BeautifulSoup: find_all on a bs4.element.ResultSet object or a list?

Question

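Hi, when I apply find_all to a BeautifulSoup object, I get back something that is a bs4.element.ResultSet object or a list.

I want to run find_all again on that result, but this is not allowed on a bs4.element.ResultSet object. I can loop over each element of the bs4.element.ResultSet object and call find_all on it. But can I avoid the loop and turn the result back into a BeautifulSoup object?

Please see the code for details. Thanks.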

    from bs4 import BeautifulSoup

    html_1 = """
    <table>
      <thead>
        <tr class="myClass">
          <th>A</th>
          <th>B</th>
          <th>C</th>
          <th>D</th>
        </tr>
      </thead>
    </table>
    """

    soup = BeautifulSoup(html_1, 'html.parser')
    type(soup)  # bs4.BeautifulSoup

    # do find_all on the BeautifulSoup object
    th_all = soup.find_all('th')

    # the result is of type bs4.element.ResultSet, or similarly a list
    type(th_all)       # bs4.element.ResultSet
    type(th_all[0:1])  # list

    # now I want to do a further find_all
    th_all.find_all(text='A')  # does not work: raises AttributeError

    # can I avoid the need for this loop?
    for th in th_all:
        th.find_all(text='A')  # works

alecxe · Accepted Answer

The ResultSet class is a subclass of list, not of the Tag class, which is where the find* methods are defined. Looping over the results of find_all() is the most common approach:

    th_all = soup.find_all('th')
    result = []
    for th in th_all:
        result.extend(th.find_all(text='A'))
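To make the list-versus-Tag distinction concrete, here is a minimal sketch that checks both claims and wraps the loop in a small helper. The helper name `find_all_in` is hypothetical (it is not part of bs4), and the sketch uses the `string=` keyword, which newer Beautiful Soup releases accept as the name for the `text=` argument.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html_1 = """
<table>
  <thead>
    <tr class="myClass">
      <th>A</th> <th>B</th> <th>C</th> <th>D</th>
    </tr>
  </thead>
</table>
"""

soup = BeautifulSoup(html_1, "html.parser")
th_all = soup.find_all("th")

# ResultSet behaves like a list; the find* methods live on Tag instead.
is_list = isinstance(th_all, list)
tag_has_find_all = hasattr(Tag, "find_all")


def find_all_in(elements, *args, **kwargs):
    """Hypothetical helper: call find_all() on each element and flatten the results."""
    return [match for el in elements for match in el.find_all(*args, **kwargs)]


matches = find_all_in(th_all, string="A")
print(matches)
```

The loop is still there, only hidden inside a comprehension, so this is a convenience rather than a way around iterating.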

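Usually, CSS selectors can help solve this in one go, except that not everything you can do with find_all() is possible with the select() method. For instance, the standard CSS selectors that bs4 supports have no "text" search. But if, for instance, you had to find all b elements inside th elements, you could do:

    soup.select("th b")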
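As a rough illustration on the table from the question, the sketch below uses a descendant selector to reach the th elements in a single call. The selector `tr.myClass th` is simply derived from the question's markup. For the text match, it falls back to find_all() with a tag name plus `string=`, which returns the matching Tag objects rather than the strings themselves. This assumes a Beautiful Soup version recent enough to accept `string=`.

```python
from bs4 import BeautifulSoup

html_1 = """
<table>
  <thead>
    <tr class="myClass">
      <th>A</th> <th>B</th> <th>C</th> <th>D</th>
    </tr>
  </thead>
</table>
"""

soup = BeautifulSoup(html_1, "html.parser")

# Descendant combinator: every <th> nested anywhere inside <tr class="myClass">.
headers = soup.select("tr.myClass th")
header_texts = [th.get_text() for th in headers]

# Text filtering: name + string= narrows to <th> tags whose only
# content is the string "A", in one call and without an explicit loop.
th_a = soup.find_all("th", string="A")

print(header_texts)
print(th_a)
```

Whether this fits depends on what you need back: select() and the combined find_all() both return tags, while the loop in the answer collects the text nodes.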