Get the text of the second element using XPath?

Question

<span class='python'> <a>google</a> <a>chrome</a> </span>

I want to get "chrome", and I already have it working like this:

q = item.findall('.//span[@class="python"]//a')
t = q[1].text # first element = 0

I would like to combine that into a single XPath expression and get one item instead of a list.
I tried this, but it doesn't work:

t = item.findtext('.//span[@class="python"]//a[2]') # first element = 1

And the actual, unsimplified HTML looks like this:

<span class='python'> <span> <span> <img></img> <a>google</a> </span> <a>chrome</a> </span> </span>

Dimitre Novatchev · Accepted Answer

I tried this, but it doesn't work:
t = item.findtext('.//span[@class="python"]//a[2]') 

This is a FAQ about the // abbreviation.

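.//a[2] means: select all a descendants of the current node that are the second a child of their parent. So, depending on the concrete XML document, it may select more than one element, or no element at all.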

Simply put, the [] operator has higher precedence (binds more tightly) than //.
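To make the "second a child of its parent" reading concrete, here is a minimal sketch using the standard library's xml.etree.ElementTree. The wrapper <div> and the variable names are assumptions added for illustration; the two documents are the simplified and the actual HTML from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper <div> so that ".//span" has something to descend into.
simple = ET.fromstring(
    "<div><span class='python'><a>google</a><a>chrome</a></span></div>"
)
nested = ET.fromstring(
    "<div><span class='python'><span><span><img></img><a>google</a></span>"
    "<a>chrome</a></span></span></div>"
)

path = './/span[@class="python"]//a[2]'

# Both <a> elements share one parent, so "chrome" is the second <a> child.
simple_result = simple.findtext(path)

# Each <a> is the first <a> child of its own parent, so nothing matches.
nested_result = nested.findtext(path)

# Fallback that does not rely on a positional predicate: index the full list.
fallback = nested.findall('.//span[@class="python"]//a')[1].text

print(simple_result, nested_result, fallback)
```

The same expression works on the simplified markup and silently fails on the real one, which is why the simplification hid the problem.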

If you want just one (the second) of all the selected nodes, you have to use parentheses to force the evaluation order:

(.//a)[2]

This really does select the second a descendant of the current node.

For the actual expression used in the question, change it to:

(.//span[@class="python"]//a)[2]

or, to select the text node directly, to:

(.//span[@class="python"]//a)[2]/text()
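A minimal sketch of the parenthesized form, assuming lxml is available and using its xpath() method. As far as I know, find(), findall() and findtext() only accept the limited ElementPath syntax, which has no (...)[2] form, so xpath() is used here instead. The wrapper <div> and variable names are made up for illustration.

```python
from lxml import etree

# Hypothetical wrapper <div> standing in for `item` from the question.
item = etree.fromstring(
    "<div><span class='python'> <span> <span> <img></img> <a>google</a> </span>"
    " <a>chrome</a> </span> </span></div>"
)

# The predicate binds to the last step only:
# no <a> here is the second <a> child of its parent.
per_parent = item.xpath('.//span[@class="python"]//a[2]')

# Parentheses build the whole node-set first, then [2] picks its second member.
second_a = item.xpath('(.//span[@class="python"]//a)[2]')
second_text = item.xpath('(.//span[@class="python"]//a)[2]/text()')

print(per_parent, second_a[0].text, second_text)
```

Note that xpath() still returns a list, so the single match has to be taken with [0] (after checking that the list is not empty).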

MattH · Answer

I'm not sure what the problem is...

>>> d = """<span class='python'>
... <a>google</a>
... <a>chrome</a>
... </span>"""
>>> from lxml import etree
>>> d = etree.HTML(d)
>>> d.xpath('.//span[@class="python"]/a[2]/text()')
['chrome']
>>>

user357812 · Answer

From the comments:

or the simplification of the actual HTML I posted is too simple

You are right. What does .//span[@class="python"]//a[2] mean? It expands to:

self::node()
/descendant-or-self::node()
/child::span[attribute::class="python"]
/descendant-or-self::node()
/child::a[position()=2]

So the last step selects the second a child of its parent (fn:position() is counted along the child axis). As a result, nothing is selected if your document looks like this:

<span class='python'>
  <span>
    <span>
      <img></img>
      <a>google</a><!-- This is the first "a" child of its parent -->
    </span>
    <a>chrome</a><!-- This is also the first "a" child of its parent -->
  </span>
</span>

If you want the second of all the descendants, use:

descendant::span[@class="python"]/descendant::a[2]
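The three expressions can be compared side by side. This is only a sketch, assuming lxml and a made-up wrapper <div> as the context node; the variable names are illustrative.

```python
from lxml import etree

# Hypothetical wrapper <div> used as the context node.
item = etree.fromstring(
    "<div><span class='python'><span><span><img></img><a>google</a></span>"
    "<a>chrome</a></span></span></div>"
)

abbreviated = './/span[@class="python"]//a[2]'
expanded = (
    'self::node()/descendant-or-self::node()'
    '/child::span[attribute::class="python"]'
    '/descendant-or-self::node()/child::a[position()=2]'
)
by_descendant = 'descendant::span[@class="python"]/descendant::a[2]'

# position() is counted on the child axis: no parent has two <a> children.
abbreviated_hits = item.xpath(abbreviated)
expanded_hits = item.xpath(expanded)

# position() is counted on the descendant axis: second <a> in document order.
descendant_hits = [a.text for a in item.xpath(by_descendant)]

print(abbreviated_hits, expanded_hits, descendant_hits)
```

One difference from the parenthesized (...)[2] form: descendant::a[2] is evaluated once per matching span, so a document with several span elements of class "python" can yield one result per span rather than a single node.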