BeautifulSoup: Get the contents of a specific table

Question

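My local airport disgracefully blocks users who are not on IE, and its site looks awful. I want to write a Python script that fetches the contents of the Arrivals and Departures pages every few minutes and shows them in a more readable way.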

My tools of choice are mechanize, to trick the site into believing I use IE, and BeautifulSoup, to parse the page and get the flight data table.

Quite honestly, I got lost in the BeautifulSoup documentation, and I can't work out how to get the table (whose title I know) from the whole document, or how to get a list of rows from that table.

Any ideas?

Ofri Raviv · Accepted Answer

This is not the specific code you need, just a demo of how to work with BeautifulSoup. It finds the table whose id is "Table1" and gets all of its tr elements.

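    import urllib2  # Python 2; on Python 3 use urllib.request instead
    from bs4 import BeautifulSoup

    html = urllib2.urlopen(url).read()
    bs = BeautifulSoup(html)
    # Find the table whose id attribute is "Table1"
    table = bs.find(lambda tag: tag.name=='table' and tag.has_attr('id') and tag['id']=="Table1")
    # Collect every tr element inside that table
    rows = table.findAll(lambda tag: tag.name=='tr')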
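For readers on Python 3 with the bs4 package, the same lookup might be sketched roughly as follows. This is only an illustration under assumptions: the URL, the User-Agent string, and the sample HTML are all made up, and it uses urllib.request rather than mechanize to send a browser-like header. The id filter is passed as a keyword argument instead of a lambda, which should select the same table here.

```python
from urllib.request import Request
from bs4 import BeautifulSoup

# Hypothetical URL and User-Agent string; substitute the real ones.
URL = "http://example.com/arrivals"
IE_USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Trident/7.0; rv:11.0) like Gecko"

# A Request carrying a browser-like User-Agent header. To fetch the live
# page, pass it to urllib.request.urlopen(request).read().
request = Request(URL, headers={"User-Agent": IE_USER_AGENT})

# Stand-in for the downloaded page so the sketch runs offline.
html = """
<html><body>
<table id="Other"><tr><td>ignore me</td></tr></table>
<table id="Table1">
  <tr><th>Flight</th><th>Time</th></tr>
  <tr><td>AB123</td><td>10:45</td></tr>
  <tr><td>CD456</td><td>11:10</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="Table1")  # None if no such table exists
rows = table.find_all("tr") if table is not None else []
print(len(rows))
```

Checking for None before calling find_all avoids an AttributeError when the page layout changes and the table is no longer found.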

goggin13 · Answer

    soup = BeautifulSoup(HTML)

    # the first argument to find tells it what tag to search for
    # the second you can pass a dict of attr->value pairs to filter
    # results that match the first tag
    table = soup.find( "table", {"title":"TheTitle"} )

    rows=list()
    for row in table.findAll("tr"):
        rows.append(row)

    # now rows contains each tr in the table (as a BeautifulSoup object)
    # and you can search them to pull out the times
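To go one step further and pull the text out of each row, something like the following might work. It is a sketch that assumes Python 3 and bs4; the table title, the column layout, and the flight data are invented for the example and will not match the real airport page.

```python
from bs4 import BeautifulSoup

# Invented sample markup; the real table's title and columns will differ.
html = """
<table title="TheTitle">
  <tr><th>Flight</th><th>From</th><th>Time</th></tr>
  <tr><td>AB123</td><td>London</td><td> 10:45 </td></tr>
  <tr><td>CD456</td><td>Paris</td><td>11:10</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"title": "TheTitle"})

# One list of stripped cell strings per row; header cells (th) are included.
data = [
    [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
    for row in table.find_all("tr")
]

# Treat the first row as column names and the rest as flights.
header, flights = data[0], data[1:]
for flight in flights:
    print(dict(zip(header, flight)))
```

Pairing each row with the header via zip gives one dict per flight, which is easier to reformat for display than raw tag objects.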