Reading XML with Python minidom and iterating over each node

Question

I have an XML structure that looks like the following, but on a much larger scale:

<root>
    <conference name='1'>
        <author> Bob </author>
        <author> Nigel </author>
    </conference>
    <conference name='2'>
        <author> Alice </author>
        <author> Mary </author>
    </conference>
</root>

For this, I used the following code:

from xml.dom.minidom import parse
dom = parse(filepath)
conference = dom.getElementsByTagName('conference')
for node in conference:
    conf_name = node.getAttribute('name')
    print(conf_name)
    alist = node.getElementsByTagName('author')
    for a in alist:
        authortext = a.nodeValue
        print(authortext)

However, the author text that gets printed is "None". I tried fiddling with variations like the one below, but that breaks my program.

authortext=a[0].nodeValue

The correct output would be:

1
Bob
Nigel
2
Alice
Mary

But what I get is:

1
None
None
2
None
None

Any suggestions on how to tackle this problem?

SilentGhost · Accepted Answer

The node you read authortext from is of type 1 (ELEMENT_NODE); normally you need a TEXT_NODE to get a string. This will work:

a.childNodes[0].nodeValue
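To make the difference between the two node types visible, here is a minimal sketch that inspects a single author element. The one-element document passed to parseString is an assumption chosen for illustration; the variable names are not from the original post.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Illustrative one-element document (not the asker's real file).
dom = parseString("<author> Bob </author>")
author = dom.documentElement

# The element itself is an ELEMENT_NODE and carries no nodeValue.
element_type = author.nodeType
element_value = author.nodeValue

# Its first child is the TEXT_NODE that actually holds the string.
text_node = author.childNodes[0]
text_type = text_node.nodeType
text_value = text_node.nodeValue

print(element_type == Node.ELEMENT_NODE, element_value)
print(text_type == Node.TEXT_NODE, repr(text_value))
```

Note that the text keeps the surrounding spaces from the XML, so calling .strip() on it may be useful before printing.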

bobince · Answer

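Element nodes don't have a nodeValue. You have to look at the Text nodes inside them. If you know there is always exactly one text node inside, you can say element.firstChild.data (data is the same as nodeValue for text nodes).

Be careful: if there is no text content, there will be no child Text nodes and element.firstChild will be None, causing the .data access to fail.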
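One way to guard against that failure is to check firstChild before touching .data. The sketch below assumes a hypothetical helper named first_text and a made-up sample document containing two empty author elements; neither comes from the original answer.

```python
from xml.dom.minidom import parseString

def first_text(element, default=""):
    """Hypothetical helper: return the first child's text, or `default`."""
    child = element.firstChild
    # An empty element has no children, so firstChild is None.
    if child is None or child.nodeType != child.TEXT_NODE:
        return default
    return child.data

# Illustrative document with one filled and two empty author elements.
dom = parseString(
    "<conference><author> Bob </author><author></author><author/></conference>"
)
names = [first_text(a).strip() for a in dom.getElementsByTagName("author")]
print(names)
```

Returning a default keeps the loop simple; raising an exception instead would be equally reasonable if an empty author should count as an error.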

A quick way to get the content of the direct child text nodes:

text= ''.join(child.data for child in element.childNodes if child.nodeType==child.TEXT_NODE)

In DOM Level 3 Core you get the textContent property, which you can use to get text recursively from inside an element, but minidom doesn't support this (some other Python DOM implementations do).
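If recursive text is needed with minidom, a small recursive walk over childNodes can approximate it. This is only a sketch: get_text is a hypothetical helper, and the nested sample markup is invented to show how it differs from the direct-children join.

```python
from xml.dom.minidom import parseString

def get_text(node):
    """Hypothetical helper: concatenate all descendant text of `node`."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            # Descend into nested elements and collect their text too.
            parts.append(get_text(child))
    return "".join(parts)

# Illustrative nested markup (not from the question).
dom = parseString("<author>Bob <nick>the <b>Builder</b></nick>!</author>")
author = dom.documentElement

# Direct child text nodes only, as in the one-liner from the answer.
direct = "".join(
    c.data for c in author.childNodes if c.nodeType == c.TEXT_NODE
)
# All descendant text, in document order.
everything = get_text(author)

print(repr(direct))
print(repr(everything))
```

For flat elements such as the author entries in the question, both approaches give the same string.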

Priyabrata · Answer

Quick access:

node.getElementsByTagName('author')[0].childNodes[0].nodeValue

Swadhikar C · Answer

Since there is always exactly one text data value per author, you can use element.firstChild.data.

from xml.dom.minidom import parseString
# document is assumed to hold the XML from the question as a string
dom = parseString(document)
conferences = dom.getElementsByTagName("conference")
# Each conference here is a node
for conference in conferences:
    conference_name = conference.getAttribute("name")
    print()
    print(conference_name.upper() + " - ")
    authors = conference.getElementsByTagName("author")
    for author in authors:
        print("  ", author.firstChild.data)
    print()

Mark Rushakoff · Answer

I played around with it a bit, and here is what I got to work:

# ...
authortext = a.childNodes[0].nodeValue
print(authortext)

Which leads to the output:

C:\temp\py>xml2.py
1
 Bob
 Nigel
2
 Alice
 Mary

I can't tell you exactly why you have to access the childNode to get the inner text, but at least that's what you were looking for.