Extracting the content inside a tag with BeautifulSoup

Question

I want to extract the content Hello world. Note that the page also contains multiple <table> elements and other, similar <td colspan="2"> cells.

<table border="0" cellspacing="2" width="800">
  <tr>
    <td colspan="2"><b>Name: </b>Hello world</td>
  </tr>
  <tr> ...

I tried the following:

hello = soup.find(text='Name: ')
hello.findPreviousSiblings

But it returned nothing.

I am also having trouble extracting My home address from the following:

<td><b>Address:</b></td> <td>My home address</td>

I am using the same approach to search for text="Address: ", but how do I move on to the next line and extract the content of that <td>?

solvingPuzzles · Answer

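The contents attribute is a good way to pull text out of <tag>text</tag>. It returns a list of the tag's children, so the text is the first item of that list.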

Example for <td>My home address</td>:

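from bs4 import BeautifulSoup  # assumes BeautifulSoup 4

s = '<td>My home address</td>'
soup = BeautifulSoup(s, 'html.parser')
td = soup.find('td')  # <td>My home address</td>
td.contents           # ['My home address'] (a list of child nodes)
td.contents[0]        # 'My home address'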

Example for <td><b>Address:</b></td>:

s = '<td><b>Address:</b></td>'
soup = BeautifulSoup(s, 'html.parser')
b = soup.find('td').find('b')  # <b>Address:</b>
b.contents                     # ['Address:'] (a list of child nodes)
b.contents[0]                  # 'Address:'
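As a small follow-up sketch, assuming BeautifulSoup 4 (the bs4 package) and Python's built-in html.parser, neither of which the question specifies: contents hands back a list of child nodes, while string and get_text() return the text directly. The variable names here are only for illustration.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4

html = '<td><b>Address:</b></td> <td>My home address</td>'
soup = BeautifulSoup(html, 'html.parser')

cells = soup.find_all('td')

# .contents is a list of child nodes, not a bare string.
label_children = cells[0].contents  # one child: the <b> tag
value_children = cells[1].contents  # one child: the text node

# .string works when a tag has exactly one child; it follows
# single-child tags downward, so <td><b>...</b></td> still resolves.
label_text = cells[0].string

# get_text() joins all the text found below the tag.
value_text = cells[1].get_text()

print(label_text)
print(value_text)
```

If a cell mixes tags and text, as in <td colspan="2"><b>Name: </b>Hello world</td>, string is None because the tag has more than one child, so contents or get_text() is probably the safer choice there.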

Dragan Chupacabric · Answer

Use next instead:

>>> s = '<table border="0" cellspacing="2" width="800"><tr><td colspan="2"><b>Name: </b>Hello world</td></tr><tr>'
>>> soup = BeautifulSoup(s)
>>> hello = soup.find(text='Name: ')
>>> hello.next
u'Hello world'

next and previous let you move through the document's elements in the order the parser processed them, whereas the sibling methods work on the parse tree.
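To make that difference concrete, and to cover the Address case from the question, here is a minimal sketch. It assumes BeautifulSoup 4 with html.parser, and the sample markup is pieced together from the two snippets in the question rather than taken from the real page. next_element, next_sibling and find_next are the bs4 spellings; older releases name some of these differently.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4

# Hypothetical sample built from the snippets in the question.
html = (
    '<table border="0" cellspacing="2" width="800">'
    '<tr><td colspan="2"><b>Name: </b>Hello world</td></tr>'
    '<tr><td><b>Address:</b></td> <td>My home address</td></tr>'
    '</table>'
)
soup = BeautifulSoup(html, 'html.parser')

# 'Name: ' is the only child of <b>, so it has no siblings at all.
name_label = soup.find(string='Name: ')
no_sibling = name_label.next_sibling  # None

# In document order, the node parsed right after 'Name: ' is the
# text that follows </b>.
name = name_label.next_element

# For the address, walk forward in document order to the next <td>.
address_label = soup.find(string='Address:')
address = address_label.find_next('td').get_text()

print(name)
print(address)
```

Note that the search string has to match the text node exactly: the markup contains 'Address:' with no trailing space, so searching for 'Address: ' would find nothing.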