Read an XML file and fetch its attribute values in Python

Question

I have this XML file:

    <domain type='kmc' id='007'>
      <name>virtual bug</name>
      <uuid>66523dfdf555dfd</uuid>
      <os>
        <type Arch='xintel' machine='ubuntu'>hvm</type>
        <boot dev='hd'/>
        <boot dev='cdrom'/>
      </os>
      <memory unit='KiB'>524288</memory>
      <currentMemory unit='KiB'>270336</currentMemory>
      <vcpu placement='static'>10</vcpu>

I want to parse this and fetch its attribute values. For example, say I want to fetch the uuid field. What is the proper way to fetch it in Python?

ron rothman · Accepted Answer

Here is an lxml snippet that extracts both an attribute and the element text (your question was a little ambiguous about which one you needed, so I'm including both):

    from lxml import etree

    doc = etree.parse(filename)  # filename: path to your XML file

    memoryElem = doc.find('memory')
    print(memoryElem.text)         # element text
    print(memoryElem.get('unit'))  # attribute

You asked whether minidom (2.x, 3.x) is a good alternative. Here is the equivalent code using minidom; judge for yourself which is nicer:

    import xml.dom.minidom as minidom

    doc = minidom.parse(filename)  # filename: path to your XML file

    memoryElem = doc.getElementsByTagName('memory')[0]
    print(''.join([node.data for node in memoryElem.childNodes]))  # element text
    print(memoryElem.getAttribute('unit'))                         # attribute

lxml seems like the winner to me.

d.danailov · Answer

XML:

    <data>
      <items>
        <item name="item1">item1</item>
        <item name="item2">item2</item>
        <item name="item3">item3</item>
        <item name="item4">item4</item>
      </items>
    </data>

Python:

    from xml.dom import minidom

    xmldoc = minidom.parse('items.xml')
    itemlist = xmldoc.getElementsByTagName('item')

    # Number of <item> elements, then the first one's attribute and text
    print("Len : ", len(itemlist))
    print("Attribute Name : ", itemlist[0].attributes['name'].value)
    print("Text : ", itemlist[0].firstChild.nodeValue)

    # The same two values for every <item>
    for s in itemlist:
        print("Attribute Name : ", s.attributes['name'].value)
        print("Text : ", s.firstChild.nodeValue)
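The loop above assumes every item has a name attribute and some text. Here is a minimal sketch of a more defensive version, assuming the XML is held in a string rather than in items.xml; `ITEMS_XML` and `item_summary` are names made up for this illustration, and two extra items were added to show the edge cases.

```python
from xml.dom import minidom

# Sample data (an assumption for this sketch): like the XML above, plus one
# item without a name attribute and one empty item.
ITEMS_XML = """<data>
  <items>
    <item name="item1">item1</item>
    <item name="item2">item2</item>
    <item>unnamed</item>
    <item name="empty"/>
  </items>
</data>"""

doc = minidom.parseString(ITEMS_XML)


def item_summary(node):
    """Return (name, text) for an <item> element, tolerating missing parts."""
    # hasAttribute avoids the exception that attributes['name'] would raise
    # when the attribute is absent.
    name = node.getAttribute('name') if node.hasAttribute('name') else None
    # Join all text children; an empty element simply yields ''.
    text = ''.join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )
    return name, text


summaries = [item_summary(node) for node in doc.getElementsByTagName('item')]
for name, text in summaries:
    print("Attribute Name : ", name)
    print("Text : ", text)
```

Whether a missing attribute should become None or an error depends on your data; the point is only that minidom lets you check before you index.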

Ali Afshar · Answer

Using etree from lxml:

    from lxml import etree

    root = etree.XML(MY_XML)  # MY_XML: the XML document as a string
    uuid = root.find('uuid')
    print(uuid.text)
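If installing lxml is not an option, the standard library's xml.etree.ElementTree offers a very similar interface. The following is a sketch under two assumptions: the question's document has been completed with a closing `</domain>` tag, and it is held in a string (`DOMAIN_XML` is a name chosen for this example).

```python
import xml.etree.ElementTree as ET

# The question's document, with the closing </domain> tag added so it parses.
DOMAIN_XML = """\
<domain type='kmc' id='007'>
  <name>virtual bug</name>
  <uuid>66523dfdf555dfd</uuid>
  <os>
    <type Arch='xintel' machine='ubuntu'>hvm</type>
    <boot dev='hd'/>
    <boot dev='cdrom'/>
  </os>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>270336</currentMemory>
  <vcpu placement='static'>10</vcpu>
</domain>
"""

root = ET.fromstring(DOMAIN_XML)  # root is the <domain> element

# Element text: findtext() returns the text of the first matching child.
uuid_text = root.findtext('uuid')

# Attribute values: get() reads one attribute of an element.
domain_id = root.get('id')
memory_unit = root.find('memory').get('unit')

# Repeated elements: findall() accepts a simple path below the current element.
boot_devices = [boot.get('dev') for boot in root.findall('os/boot')]

# get() takes a default for attributes that may be absent.
missing = root.get('no-such-attribute', 'n/a')

print(uuid_text, domain_id, memory_unit, boot_devices, missing)
```

Note that uuid in this document is element text, not an attribute, so it is read with `findtext()` (or `.text`), while `id` and `unit` are read with `get()`.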

Paul J. Warner · Answer

I would use lxml and parse it with the XPath //uuid
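To make the XPath suggestion concrete, here is a rough sketch. It swaps in the standard library's ElementTree, which supports only a subset of XPath, so that it runs without lxml; with lxml you would call the element's `xpath()` method, which accepts full XPath 1.0. XPath matching is case-sensitive, so for the question's document the expression needs the lowercase tag name. The shortened `SAMPLE_XML` string is an assumption for this example.

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's document (closing tag included).
SAMPLE_XML = (
    "<domain type='kmc' id='007'>"
    "<uuid>66523dfdf555dfd</uuid>"
    "<os><type Arch='xintel' machine='ubuntu'>hvm</type>"
    "<boot dev='hd'/><boot dev='cdrom'/></os>"
    "</domain>"
)

root = ET.fromstring(SAMPLE_XML)

# './/uuid' finds <uuid> at any depth below the root element.
uuids = [elem.text for elem in root.findall('.//uuid')]

# Tag names are case-sensitive: 'UUID' does not match <uuid>.
wrong_case = root.findall('.//UUID')

# A predicate selects elements by attribute value.
cdrom = root.findall(".//boot[@dev='cdrom']")

print(uuids)
print(len(wrong_case))
print([boot.get('dev') for boot in cdrom])
```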

Wai Yip Tung · Answer

Others can tell you how to do it with the Python standard library. I'd recommend my own mini-library, which makes this completely straightforward.

    >>> obj = xml2obj.xml2obj("""<domain type='kmc' id='007'>
    ...   <name>virtual bug</name>
    ...   <uuid>66523dfdf555dfd</uuid>
    ...   <os>
    ...     <type Arch='xintel' machine='ubuntu'>hvm</type>
    ...     <boot dev='hd'/>
    ...     <boot dev='cdrom'/>
    ...   </os>
    ...   <memory unit='KiB'>524288</memory>
    ...   <currentMemory unit='KiB'>270336</currentMemory>
    ...   <vcpu placement='static'>10</vcpu>
    ... </domain>""")
    >>> obj.uuid
    u'66523dfdf555dfd'

http://code.activestate.com/recipes/534109-xml-to-python-data-structure/
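For readers curious how attribute-style access like `obj.uuid` could work, here is a small sketch of the general idea. It is not the recipe's code and may behave differently from it; `to_obj` is a hypothetical helper built on the standard library's ElementTree and SimpleNamespace, and the `text` field name is this sketch's own choice.

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace


def to_obj(elem):
    """Convert an Element into a string or a SimpleNamespace (hypothetical helper)."""
    text = (elem.text or '').strip()
    # A leaf element without attributes becomes just its text.
    if len(elem) == 0 and not elem.attrib:
        return text
    # Otherwise XML attributes become fields on a namespace object.
    node = SimpleNamespace(**elem.attrib)
    if text:
        node.text = text
    for child in elem:
        value = to_obj(child)
        if hasattr(node, child.tag):
            # A repeated tag is collected into a list.
            existing = getattr(node, child.tag)
            if not isinstance(existing, list):
                existing = [existing]
                setattr(node, child.tag, existing)
            existing.append(value)
        else:
            setattr(node, child.tag, value)
    return node


obj = to_obj(ET.fromstring("""<domain type='kmc' id='007'>
  <name>virtual bug</name>
  <uuid>66523dfdf555dfd</uuid>
  <os>
    <type Arch='xintel' machine='ubuntu'>hvm</type>
    <boot dev='hd'/>
    <boot dev='cdrom'/>
  </os>
  <memory unit='KiB'>524288</memory>
</domain>"""))

print(obj.uuid)
print(obj.memory.unit, obj.memory.text)
print([boot.dev for boot in obj.os.boot])
```

One limitation of this sketch: an attribute and a child element that share a name on the same element end up in one list, so it only suits documents where that does not happen.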

Vaibhav Awasthi · Answer

The XML above is missing its closing tag.

etree parse error: Premature end of data in tag

The correct XML is:

    <domain type='kmc' id='007'>
      <name>virtual bug</name>
      <uuid>66523dfdf555dfd</uuid>
      <os>
        <type Arch='xintel' machine='ubuntu'>hvm</type>
        <boot dev='hd'/>
        <boot dev='cdrom'/>
      </os>
      <memory unit='KiB'>524288</memory>
      <currentMemory unit='KiB'>270336</currentMemory>
      <vcpu placement='static'>10</vcpu>
    </domain>
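The difference is easy to reproduce. The sketch below uses the standard library's ElementTree instead of lxml, so the error text it prints will differ from the lxml message quoted above; `try_parse` and the shortened sample strings are names and data invented for this example.

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's XML, without the closing tag.
TRUNCATED = (
    "<domain type='kmc' id='007'>"
    "<name>virtual bug</name>"
    "<uuid>66523dfdf555dfd</uuid>"
)
# The same document with the closing tag added.
FIXED = TRUNCATED + "</domain>"


def try_parse(text):
    """Return (root, None) on success or (None, error message) on failure."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        return None, str(err)


broken_root, broken_error = try_parse(TRUNCATED)
fixed_root, fixed_error = try_parse(FIXED)

print("truncated:", broken_error)
print("fixed:", fixed_root.findtext('uuid'))
```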

Edmond Sylar · Answer

You can try parsing with recover=True. You can do something like this:

    from lxml import etree

    # recover=True makes the parser try to keep going past broken XML
    parser = etree.XMLParser(recover=True)
    tree = etree.parse('your xml file', parser)

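I used this recently and it worked for me. You can give it a try, but if you need to do more complex XML data extraction, have a look at this code I wrote for some projects that handle complex XML data extraction.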