Find and remove elements with ElementTree in Python

Question

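I have an XML document in which I want to search for some elements and, if they match certain criteria, delete them.

However, I cannot find a way to access the parent of an element, so I cannot delete it.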

    import json
    from xml.etree import ElementTree

    file = open('test.xml', "r")
    elem = ElementTree.parse(file)

    namespace = "{http://somens}"

    props = elem.findall('.//{0}prop'.format(namespace))
    for prop in props:
        type = prop.attrib.get('type', None)
        if type == 'json':
            value = json.loads(prop.attrib['value'])
            if value['name'] == 'Page1.Button1':
                # here I need to access the parent of prop
                # in order to delete the prop
                pass

Is there a way to do this?

Thanks

Constantinius · Accepted Answer

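Child elements are removed with the remove method, and to remove an element you have to call remove on its parent. Unfortunately, Element does not provide a reference to its parent, so it is up to you to keep track of parent/child relations (which speaks against your use of elem.findall()).

A proposed solution could look like this: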

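    root = elem.getroot()
    # Iterate over a copy so that removing a child does not skip its sibling
    for child in list(root):
        # An Element exposes its (namespace-qualified) name as .tag
        if child.tag != namespace + "prop":
            continue
        if True:  # TODO: do your check here!
            root.remove(child)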

PS: Don't use prop.attrib.get(); use prop.get() instead, as explained here.
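One way to keep track of the parent/child relations mentioned above, while still using findall(), is to build a child-to-parent map before removing anything. The following is a minimal sketch, not the answer author's code: the sample XML, the NS constant, and the helper name remove_matching_props are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample document; the namespace matches the one in the question
XML = """<root xmlns="http://somens">
  <group>
    <prop type="json" value='{"name": "Page1.Button1"}'/>
    <prop type="json" value='{"name": "Page1.Button2"}'/>
  </group>
  <prop type="text" value="plain"/>
</root>"""

NS = "{http://somens}"


def remove_matching_props(root, name):
    """Remove every JSON prop whose decoded value has the given name."""
    # Map each element to its parent, since Element keeps no back-reference
    parent_of = {child: parent for parent in root.iter() for child in parent}
    removed = 0
    # findall() returns a list, so removing inside the loop is safe
    for prop in root.findall(".//{0}prop".format(NS)):
        if prop.get("type") != "json":
            continue
        if json.loads(prop.get("value"))["name"] == name:
            parent_of[prop].remove(prop)
            removed += 1
    return removed


root = ET.fromstring(XML)
removed_count = remove_matching_props(root, "Page1.Button1")
remaining = [p.get("value") for p in root.iter(NS + "prop")]
```

The map is built once per pass, so it handles props at any nesting depth without a second search.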

kitsu.eb · Answer

You can use XPath to select an element's parent.

    file = open('test.xml', "r")
    elem = ElementTree.parse(file)

    namespace = "{http://somens}"

    props = elem.findall('.//{0}prop'.format(namespace))
    for prop in props:
        type = prop.get('type', None)
        if type == 'json':
            value = json.loads(prop.attrib['value'])
            if value['name'] == 'Page1.Button1':
                # Get parent and remove this prop
                parent = prop.find("..")
                parent.remove(prop)

http://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax

Unless that doesn't work for you: http://elmpowered.skawaii.net/?p=74

In that case you have to do the following instead:

    import json
    from xml.etree import ElementTree

    elem = ElementTree.parse('test.xml')

    namespace = "{http://somens}"
    search = './/{0}prop'.format(namespace)

    # Use XPath to get all parents of props
    prop_parents = elem.findall(search + '/..')
    for parent in prop_parents:
        # Still have to find and iterate through the child props.
        # Search direct children only: parent.remove() raises ValueError
        # for deeper descendants.
        for prop in parent.findall('{0}prop'.format(namespace)):
            prop_type = prop.get('type', None)
            if prop_type == 'json':
                value = json.loads(prop.get('value'))
                if value['name'] == 'Page1.Button1':
                    parent.remove(prop)

That is two searches and a nested loop. The inner search only runs on Elements known to contain props as direct children, but that may not mean much depending on your schema.
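The difference between the two lookups can be checked in isolation. As far as I can tell from the standard library's documented XPath support, ".." cannot reach above the element the search starts from, so it only resolves when the search begins higher up in the tree. The tiny document below is a hypothetical example, not the asker's data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for this check
root = ET.fromstring("<root><group><prop/></group></root>")
prop = root.find(".//prop")

# ".." cannot climb above the element the search starts from,
# so this lookup finds nothing
parent_from_child = prop.find("..")

# Starting from the root, the same step resolves to the enclosing <group>
parents_from_root = root.findall(".//prop/..")
parent_tags = [parent.tag for parent in parents_from_root]
```

This is why the second version searches from elem rather than from each prop.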

iceblueorbitz · Answer

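I know this is an old thread, but it kept popping up while I was trying to figure out a similar task. I did not like the accepted answer for two reasons:

1) It does not handle multiple nested levels of tags.

2) It breaks if multiple XML tags at the same level are deleted one after another. Since each element is an index into Element._children, you should not delete while iterating forward.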
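The second point can be reproduced with a small sketch. The four-item document is a hypothetical example; the exact number of survivors depends on how the iteration indexes into the children, so the code only records the counts.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with four sibling <item> elements
SOURCE = "<root><item/><item/><item/><item/></root>"

# Forward iteration over the live element: every removal shifts the
# remaining children left, so the loop steps over the next sibling
forward = ET.fromstring(SOURCE)
for child in forward:
    forward.remove(child)
left_after_forward = len(forward)

# Iterating over a snapshot of the children visits every one of them
snapshot = ET.fromstring(SOURCE)
for child in list(snapshot):
    snapshot.remove(child)
left_after_snapshot = len(snapshot)
```

Iterating over list(parent) or reversed(parent) avoids the problem because removals never shift a child that is still waiting to be visited.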

I think a better, more versatile solution is this:

    import xml.etree.ElementTree as et

    file = 'test.xml'
    tree = et.parse(file)
    root = tree.getroot()

    def iterator(parents, nested=False):
        # Walk the children in reverse so removals do not shift unvisited ones
        for child in reversed(parents):
            if nested:
                if len(child) >= 1:
                    # Pass nested on, otherwise only two levels are handled
                    iterator(child, nested=True)
            if True:  # Add your entire condition here
                parents.remove(child)

    iterator(root, nested=True)

For the OP, this should work, but I don't have the data you are working with to test whether it is perfect.

    import json
    import xml.etree.ElementTree as et

    file = 'test.xml'
    tree = et.parse(file)
    root = tree.getroot()

    namespace = "{http://somens}"
    prop_tag = '{0}prop'.format(namespace)

    def iterator(parents, nested=False):
        for child in reversed(parents):
            if nested:
                if len(child) >= 1:
                    iterator(child, nested=True)
            # Test the child itself; only elements can be removed from a parent
            if child.tag == prop_tag and child.get('type') == 'json':
                value = json.loads(child.get('value'))
                if value['name'] == 'Page1.Button1':
                    parents.remove(child)

    # Start from the root element, not from a list returned by findall()
    iterator(root, nested=True)

engineer14 · Answer

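Taking advantage of the fact that every child must have a parent, I am going to simplify @kitsu.eb's example. If you use findall to get both the children and the parents, their indices will be equivalent, provided each parent contains exactly one matching child (findall returns a parent only once, however many matching children it has).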

    import json
    from xml.etree import ElementTree

    elem = ElementTree.parse('test.xml')

    namespace = "{http://somens}"
    search = './/{0}prop'.format(namespace)

    # Use XPath to get all parents of props
    prop_parents = elem.findall(search + '/..')

    props = elem.findall(search)
    for prop in props:
        prop_type = prop.get('type', None)
        if prop_type == 'json':
            value = json.loads(prop.get('value'))
            if value['name'] == 'Page1.Button1':
                # Use the index of the current child to find
                # its parent and remove the child
                prop_parents[props.index(prop)].remove(prop)
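Before relying on the index pairing, it may be worth checking that the two lists really line up for your document. The sketch below uses a hypothetical document in which one parent holds two props; in that situation the parent list comes back shorter than the child list, and an index lookup can run off the end.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the first <group> holds two <prop> children
root = ET.fromstring(
    "<root>"
    "<group><prop id='a'/><prop id='b'/></group>"
    "<group><prop id='c'/></group>"
    "</root>"
)

props = root.findall(".//prop")
prop_parents = root.findall(".//prop/..")

# Each parent is returned once, so the two lists can differ in length
counts = (len(props), len(prop_parents))

# Looking up the parent of the last prop by index then fails
last = props[-1]
try:
    prop_parents[props.index(last)]
    lookup_failed = False
except IndexError:
    lookup_failed = True
```

When the counts differ, looping over each parent and its direct children is the safer pairing.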

chi · Answer

A solution using the lxml module:

    from lxml import etree

    # xml_str is assumed to hold the XML document as a string
    root = etree.fromstring(xml_str)
    for e in root.findall('.//{http://some.name.space}node'):
        # Unlike xml.etree.ElementTree, lxml elements know their parent
        parent = e.getparent()
        parent.remove(e)

Fredrik · Answer

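I like to use an XPath expression for this kind of filtering. Unless I know otherwise, such an expression has to be applied at the root level, which means I cannot just get a parent and apply the same expression to it. However, it seems to me there is a nice, flexible solution that should work with any supported XPath, as long as none of the sought nodes is the root. It goes something like this: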

    root = elem.getroot()
    # Find all nodes matching the filter string (flt)
    nodes = root.findall(flt)
    while len(nodes):
        # As long as there are nodes, there should be parents
        # Get the first of all parents to the found nodes
        parent = root.findall(flt + '/..')[0]
        # Use this parent to remove the first node
        parent.remove(nodes[0])
        # Find all remaining nodes
        nodes = root.findall(flt)