Python: Accessing nested children in an XML file parsed with ElementTree

Question

I'm new to XML parsing. This XML file has the following tree:

    FHRSEstablishment
    |--> Header
    |    |--> ...
    |--> EstablishmentCollection
    |    |--> EstablishmentDetail
    |    |    |--> ...
    |    |--> Scores
    |    |    |--> ...
    |--> EstablishmentCollection
    |    |--> EstablishmentDetail
    |    |    |--> ...
    |    |--> Scores
    |    |    |--> ...

But when I load it with ElementTree and look at the child tags and attributes,

    import xml.etree.ElementTree as ET
    import urllib2

    tree = ET.parse(urllib2.urlopen(
        'http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
    root = tree.getroot()
    for child in root:
        print child.tag, child.attrib

all I get is:

    Header {}
    EstablishmentCollection {}

I take that to mean their attributes are empty. Why is that? And how do I access the children nested inside EstablishmentDetail and Scores?

Edit

Thanks to the answers below I can now get inside the tree, but when I try to retrieve values such as the ones under Scores, this fails:

    for node in root.find('.//EstablishmentDetail/Scores'):
        rating = node.attrib.get('Hygiene')
        print rating

and it produces

    None
    None
    None

Why is that?

Keerthana Prabhakaran · Accepted Answer

Iterating over root directly only yields its immediate children. You need to call iter() on your root.

That is, root.iter() does the trick!

    import xml.etree.ElementTree as ET
    import urllib2

    tree = ET.parse(urllib2.urlopen(
        'http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
    root = tree.getroot()
    for child in root.iter():
        print child.tag, child.attrib

Output:

    FHRSEstablishment {}
    Header {}
    ExtractDate {}
    ItemCount {}
    ReturnCode {}
    EstablishmentCollection {}
    EstablishmentDetail {}
    FHRSID {}
    LocalAuthorityBusinessID {}
    ...
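To make the difference between plain iteration and iter() concrete, here is a minimal sketch written for Python 3. It runs against a small inline document rather than the real feed; the SAMPLE string and its values are invented for illustration and only mimic the tag names shown in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the feed; the real file is far larger.
SAMPLE = """
<FHRSEstablishment>
  <Header>
    <ItemCount>1</ItemCount>
  </Header>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Example Cafe</BusinessName>
      <Scores>
        <Hygiene>5</Hygiene>
      </Scores>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE)

# Iterating an element yields only its direct children.
direct_children = [child.tag for child in root]

# iter() walks the whole subtree in document order, starting with root itself.
all_tags = [elem.tag for elem in root.iter()]

print(direct_children)
print(all_tags)
```

The first list contains only Header and EstablishmentCollection, which matches what the question observed; the second contains every tag down to Hygiene.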

To get all the tags inside EstablishmentDetail, you need to find that tag and then loop over its children!

That is,

    for child in root.find('.//EstablishmentDetail'):
        print child.tag, child.attrib

Output:

    FHRSID {}
    LocalAuthorityBusinessID {}
    BusinessName {}
    BusinessType {}
    BusinessTypeID {}
    RatingValue {}
    RatingKey {}
    RatingDate {}
    LocalAuthorityCode {}
    LocalAuthorityName {}
    LocalAuthorityWebSite {}
    LocalAuthorityEmailAddress {}
    Scores {}
    SchemeType {}
    NewRatingPending {}
    Geocode {}
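Note that find() returns only the first matching element, so the loop above lists the children of the first EstablishmentDetail. A small Python 3 sketch of the contrast with findall() follows; the SAMPLE document and the business names in it are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data with two establishments, using tag names from the thread.
SAMPLE = """
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <FHRSID>1</FHRSID>
      <BusinessName>Example Cafe</BusinessName>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <FHRSID>2</FHRSID>
      <BusinessName>Example Bakery</BusinessName>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE)

# find() stops at the first match; looping over it visits that element's children.
first = root.find('.//EstablishmentDetail')
first_fields = {child.tag: child.text for child in first}

# findall() returns every match, so each establishment can be handled in turn.
all_names = [detail.findtext('BusinessName')
             for detail in root.findall('.//EstablishmentDetail')]

print(first_fields)
print(all_names)
```

If you want every establishment rather than just the first, findall() (or iter('EstablishmentDetail')) is probably the better fit.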

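To get the Hygiene score, as you mentioned in the comments:

What your code does is fetch the first Scores tag, whose children are the Hygiene, ConfidenceInManagement and Structural tags. Looping with for each in root.find('.//Scores'): rating = each.get('Hygiene') then asks each of those three children for an attribute named Hygiene. Obviously none of the three children has one, so you get None three times! Hygiene is a child element, and its value is stored as the element's text, not as an attribute.

First you need to find all the Scores tags, and then find Hygiene inside each tag that was found!

    for each in root.findall('.//Scores'):
        rating = each.find('.//Hygiene')
        print '' if rating is None else rating.text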

Output:

    5
    5
    5
    0
    5
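This also explains the empty {} printed everywhere: attrib holds XML attributes, while this file keeps its values as element text. The following Python 3 sketch shows both lookups side by side. The SAMPLE document is an invented fragment (the second, empty Scores element is an assumption used to show the missing-value case), and findtext() is used here as a compact alternative to find() plus a None check.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; values are invented, tag names follow the thread.
SAMPLE = """
<EstablishmentCollection>
  <EstablishmentDetail>
    <Scores>
      <Hygiene>5</Hygiene>
      <Structural>0</Structural>
      <ConfidenceInManagement>5</ConfidenceInManagement>
    </Scores>
  </EstablishmentDetail>
  <EstablishmentDetail>
    <Scores/>
  </EstablishmentDetail>
</EstablishmentCollection>
"""

root = ET.fromstring(SAMPLE)
first_scores = root.find('.//Scores')

# The approach from the question: look for an *attribute* named Hygiene
# on each child of Scores. No child has attributes, so every lookup is None.
attr_lookups = [node.attrib.get('Hygiene') for node in first_scores]

# The value lives in the text of the Hygiene child element instead.
hygiene = first_scores.find('Hygiene')
hygiene_attrib = hygiene.attrib
hygiene_text = hygiene.text

# findtext() returns the default when a Scores element has no Hygiene child.
ratings = [scores.findtext('Hygiene', default='')
           for scores in root.findall('.//Scores')]

print(attr_lookups)
print(hygiene_attrib, hygiene_text)
print(ratings)
```

The text comes back as a string, so convert it with int() if you need a number.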

Andrea · Answer

I hope this helps:

    import xml.etree.ElementTree as etree

    with open('filename.xml') as tmpfile:
        doc = etree.iterparse(tmpfile, events=("start", "end"))
        doc = iter(doc)
        # The first event is the start of the root element.
        event, root = next(doc)
        for event, elem in doc:
            print event, elem
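For a rough idea of how iterparse can be put to work on this kind of file, here is a Python 3 sketch that collects business names as each EstablishmentDetail finishes parsing. It reads from an in-memory io.BytesIO instead of filename.xml so that it runs on its own; the SAMPLE bytes and the names in them are invented, and clearing each element after use is one possible way to keep memory low on large files, not something this answer prescribes.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of filename.xml.
SAMPLE = b"""<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Example Cafe</BusinessName>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <BusinessName>Example Bakery</BusinessName>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

names = []
events = []

for event, elem in ET.iterparse(io.BytesIO(SAMPLE), events=("start", "end")):
    events.append((event, elem.tag))
    # An element's children are only guaranteed complete at its "end" event.
    if event == "end" and elem.tag == "EstablishmentDetail":
        names.append(elem.findtext("BusinessName"))
        elem.clear()  # drop the processed subtree's children and text

print(names)
```

Each element produces a "start" event when its opening tag is seen and an "end" event once its closing tag has been parsed, which is why the name is read on "end".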