Getting the text directly inside a tag with Nokogiri

Question

I have some HTML that looks like this:

<dt> <a href="#">Hello</a> (2009) </dt>

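I have already loaded all of the HTML into a variable called record. I need to parse out the year, i.e. 2009, if it exists.

How can I get the text inside the dt tag but not the text inside the a tag? I used record.search("dt").inner_text, but that gives me everything.

It's a trivial question, but I couldn't figure it out.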

Casper · Accepted Answer

To get the text of the direct children only, without descending into any nested child elements, you can use XPath like this:

doc.xpath('//dt/text()')

Or, if you prefer to use search:

doc.search('dt').xpath('text()')

Phrogz · Answer

Using XPath to select exactly what you need (as suggested by @Casper) is the right answer.

def own_text(node)
  # Find the content of all child text nodes and join them together
  node.xpath('text()').text
end

Here's another fun answer :)

def own_text(node)
  # Deep-copy the node, remove its child elements from the copy, and read what is left
  node.clone(1).tap{ |copy| copy.element_children.remove }.text
end

Seen in action:

require 'nokogiri'
root = Nokogiri.XML('<r>hi <a>BOO</a> there</r>').root
puts root.text       #=> hi BOO there
puts own_text(root)  #=> hi  there (two spaces: the text on either side of <a> is joined as-is)

Chamnap · Answer

The dt element has two children, so you can get at the last one like this:

doc.search("dt").children.last.text