Checking whether an XML element has children in ElementTree

Question

I retrieve an XML document this way:

    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    root = ET.parse(urlopen(url))
    for child in root.findall("item"):
        a1 = child[0].text  # ok
        a2 = child[1].text  # ok
        a3 = child[2].text  # ok
        a4 = child[3].text  # BOOM
        # ...

The XML looks like this:

    <item>
      <a1>value1</a1>
      <a2>value2</a2>
      <a3>value3</a3>
      <a4>
        <a11>value222</a11>
        <a22>value22</a22>
      </a4>
    </item>

How can I check whether a4 (in this particular case; it could have been any other element) has children?

jlr · Accepted Answer

You can try the list function on the element:

    >>> import xml.etree.ElementTree as ET
    >>> xml = """<item>
    ...   <a1>value1</a1>
    ...   <a2>value2</a2>
    ...   <a3>value3</a3>
    ...   <a4>
    ...     <a11>value222</a11>
    ...     <a22>value22</a22>
    ...   </a4>
    ... </item>"""
    >>> root = ET.fromstring(xml)
    >>> list(root[0])
    []
    >>> list(root[3])
    [<Element 'a11' at 0x2321e10>, <Element 'a22' at 0x2321e48>]
    >>> len(list(root[3]))
    2
    >>> print("has children" if len(list(root[3])) else "no child")
    has children
    >>> print("has children" if len(list(root[2])) else "no child")
    no child
    >>> # Or simpler, without a call to list within len, it also works:
    >>> print("has children" if len(root[3]) else "no child")
    has children

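I modified your sample a little because the findall call for item on the root did not work (findall searches the direct descendants, not the current element). If you want to access the text of the subchildren afterwards in your working program, you could do: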

    for child in root.findall("item"):
        # If there are children, get their text content as well.
        if len(child):
            for subchild in child:
                subchild.text
        # Else just get the current child's text.
        else:
            child.text

This would be a good fit for a recursive function, though.
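
The recursive approach mentioned above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the helper name collect_leaf_text and the SAMPLE constant are made up for this example, and the XML is the sample from the question.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
SAMPLE = """<item>
  <a1>value1</a1>
  <a2>value2</a2>
  <a3>value3</a3>
  <a4>
    <a11>value222</a11>
    <a22>value22</a22>
  </a4>
</item>"""


def collect_leaf_text(element):
    """Return the text of every childless element at or below `element`.

    Hypothetical helper, named for this sketch only.
    """
    if len(element) == 0:
        # No children: this is a leaf, so report its own text.
        return [element.text]
    texts = []
    for child in element:
        # Has children: descend instead of reading the whitespace-only text.
        texts.extend(collect_leaf_text(child))
    return texts


root = ET.fromstring(SAMPLE)
leaf_texts = collect_leaf_text(root)
print(leaf_texts)
```

Because the function descends whenever len(element) is non-zero, it handles a4 (or any other nested element) without the caller needing to know the depth in advance.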

Mad Physicist · Answer

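The simplest way I have been able to find is to use the bool value of the element directly. This means you can use a4 in a conditional statement as-is: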

    from xml.etree.ElementTree import Element

    a4 = Element('a4')
    if a4:
        print('Has kids')
    else:
        print('No kids yet')

    a4.append(Element('x'))
    if a4:
        print('Has kids now')
    else:
        print('Still no kids')

Running this code prints:

    No kids yet
    Has kids now

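The boolean value of an element says nothing about text, tail, or attributes. It only indicates the presence or absence of children, which is what the original question was asking.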
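
The same check can be written with an explicit len() call. The ElementTree documentation recommends explicit tests such as len(elem) over truth-testing an element, and newer Python versions (3.12 onward, as far as I know) emit a DeprecationWarning for the truth test. The sketch below uses a hypothetical helper name, has_children, and also shows that text and attributes do not affect the result.

```python
import xml.etree.ElementTree as ET


def has_children(element):
    """Return True if `element` has at least one child element.

    Hypothetical helper, named for this sketch only.
    """
    # len() counts child elements only; text, tail and attributes are ignored.
    return len(element) > 0


# An element with an attribute and text, but no children yet.
a4 = ET.Element('a4', {'id': '1'})
a4.text = 'some text'
before = has_children(a4)

a4.append(ET.Element('x'))
after = has_children(a4)

print(before, after)
```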

marscher · Answer

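The Element class had a getchildren method (it was deprecated and then removed in Python 3.9; list(elem) is the equivalent). So you can use something like this to check whether there are children, and store the result in a dictionary keyed by tag name:

    result = {}
    for child in root.findall("item"):
        # list(child) replaces the removed child.getchildren().
        if list(child) == []:
            result[child.tag] = child.text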
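
As a usage sketch, here is the same idea applied to the sample XML from the question. Two assumptions are made for this example: the root element is <item> itself (as in the accepted answer), so the loop iterates over root directly, and len(child) == 0 is used as the emptiness test in place of getchildren(), which is no longer available from Python 3.9.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
SAMPLE = """<item>
  <a1>value1</a1>
  <a2>value2</a2>
  <a3>value3</a3>
  <a4>
    <a11>value222</a11>
    <a22>value22</a22>
  </a4>
</item>"""

root = ET.fromstring(SAMPLE)  # here the root element is <item> itself

result = {}
for child in root:
    # Only childless elements carry meaningful text, so skip a4.
    if len(child) == 0:
        result[child.tag] = child.text

print(result)
```

Note that a dictionary keyed by tag keeps only the last value when sibling tags repeat.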

roippi · Answer

Personally, I would recommend using an XML parser that fully supports XPath expressions. The subset supported by xml.etree is insufficient for tasks like this.

For example, in lxml I can do:

"Give me all children of the children of the <item> node":

    doc.xpath('//item/*/child::*')  # equivalent to '//item/*/*', if you're being terse
    Out[18]: [<Element a11 at 0x7f60ec1c1348>, <Element a22 at 0x7f60ec1c1888>]

Or,

"Give me all of <item>'s children that themselves have no children":

    doc.xpath('/item/*[count(child::*) = 0]')
    Out[20]:
    [<Element a1 at 0x7f60ec1c1588>,
     <Element a2 at 0x7f60ec1c15c8>,
     <Element a3 at 0x7f60ec1c1608>]

Or,

"Give me ALL of the elements that have no children":

    doc.xpath('//*[count(child::*) = 0]')
    Out[29]:
    [<Element a1 at 0x7f60ec1c1588>,
     <Element a2 at 0x7f60ec1c15c8>,
     <Element a3 at 0x7f60ec1c1608>,
     <Element a11 at 0x7f60ec1c1348>,
     <Element a22 at 0x7f60ec1c1888>]

    # and if I only care about the text from those nodes...
    doc.xpath('//*[count(child::*) = 0]/text()')
    Out[30]: ['value1', 'value2', 'value3', 'value222', 'value22']
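
For comparison, when lxml is not available, roughly the same three results can be obtained with the standard library by combining its limited path support with a len() filter in Python. This is a sketch against the question's sample XML, not a claim that xml.etree supports the count() predicate; the variable names are chosen for this example.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
SAMPLE = """<item>
  <a1>value1</a1>
  <a2>value2</a2>
  <a3>value3</a3>
  <a4>
    <a11>value222</a11>
    <a22>value22</a22>
  </a4>
</item>"""

root = ET.fromstring(SAMPLE)  # the root element is <item>

# "All children of the children of <item>": the './*/*' path is supported.
grandchildren = [e.tag for e in root.findall('./*/*')]

# "All of <item>'s children that themselves have no children":
# the count() predicate is replaced by a len() filter.
leaf_children = [e.tag for e in root if len(e) == 0]

# "All elements that have no children", plus their text.
leaves = [e for e in root.iter() if len(e) == 0]
leaf_tags = [e.tag for e in leaves]
leaf_texts = [e.text for e in leaves]

print(grandchildren)
print(leaf_children)
print(leaf_tags)
print(leaf_texts)
```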

David Córdoba Ruiz · Answer

You can use the iter method:

    import xml.etree.ElementTree as ET

    etree = ET.parse('file.xml')
    root = etree.getroot()
    a = []
    for child in root.iter():
        # Keep only elements whose text is not empty or whitespace-only.
        if child.text:
            if len(child.text.split()) > 0:
                a.append(child.text)
    print(a)