How to find all comments with Beautiful Soup

Question

This question was asked four years ago, but its answers are now out of date for BS4.

I want to delete all comments in my HTML file using Beautiful Soup. Since BS4 makes each comment a special type of navigable string, I thought this code would work:

for comments in soup.find_all('comment'):
    comments.decompose()

That didn't work... How can I find all comments using BS4?

Flickerlight · Accepted Answer

You can pass a function to find_all() that checks whether each string is a Comment.

For example, I have the HTML below:

<body>
  <!-- Branding and main navigation -->
  <div class="Branding">The Science &amp; Safety Behind Your Favorite Products</div>
  <div class="l-branding">
    <p>Just a brand</p>
  </div>
  <!-- test comment here -->
  <div class="block_content">
    <a href="https://www.google.com">Google</a>
  </div>
</body>

Code:

from bs4 import BeautifulSoup as BS
from bs4 import Comment
....
soup = BS(html, 'html.parser')
# Keep only the strings that are Comment instances
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for c in comments:
    print(c)
    print("===========")
    c.extract()  # remove the comment from the tree

The output will be:

Branding and main navigation
===========
test comment here
===========
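The snippet above leaves out the setup. A minimal self-contained sketch of the same approach follows; the helper name `strip_comments` and the shortened sample markup are assumptions made for illustration, not part of the answer.

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(html: str) -> str:
    """Return the markup with every comment node removed (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # The function passed as string= is called on each string node.
    # Comment is a NavigableString subclass, so isinstance picks out comments.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        comment.extract()  # detach the comment from the tree
    return str(soup)


sample = "<body><!-- nav --><p>Just a brand</p><!-- test comment here --></body>"
cleaned = strip_comments(sample)
print(cleaned)
```

Returning the serialized string keeps the sketch short; if you need to keep working with the tree, you could return the soup object instead.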

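By the way, I think the reason find_all('Comment') doesn't work is this (from the Beautiful Soup documentation):

Pass in a value for name and you'll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names don't match.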
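To see the difference in practice, here is a small sketch with made-up markup (the `<comment>` tag is an invented element, included only to show what a name filter does match). A name filter looks at tag names only, while a string filter is applied to string nodes, including comments.

```python
from bs4 import BeautifulSoup, Comment

# Made-up markup: one real comment and one tag that happens to be named "comment".
html = "<div><!-- hidden note --><comment>a real tag</comment><p>text</p></div>"
soup = BeautifulSoup(html, "html.parser")

# Name filter: matches tags called "comment", never comment nodes.
by_name = soup.find_all("comment")

# String filter: the function is applied to string nodes, so comments are found.
by_string = soup.find_all(string=lambda text: isinstance(text, Comment))

print([tag.name for tag in by_name])
print([str(c) for c in by_string])
```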

Joseph · Answer

Two things I needed to do:

First, when importing Beautiful Soup

from bs4 import BeautifulSoup, Comment

Second, here is the code to extract the comments

# find_all and string= are the current spellings of the older findAll and text=
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    comment.extract()
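Since the question is about an HTML file, the same loop can be wrapped in a small read-and-write helper. This is only a sketch: the function name `remove_comments_from_file`, the file names, and the choice of `html.parser` and UTF-8 are assumptions for illustration.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup, Comment


def remove_comments_from_file(src: Path, dst: Path) -> int:
    """Write a comment-free copy of src to dst; return the number of comments removed."""
    soup = BeautifulSoup(src.read_text(encoding="utf-8"), "html.parser")
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    for comment in comments:
        comment.extract()
    dst.write_text(str(soup), encoding="utf-8")
    return len(comments)


# Demo on a throwaway file so the sketch runs without any existing input.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "page.html"
    dst = Path(tmp) / "page.clean.html"
    src.write_text("<p>keep<!-- drop --></p>", encoding="utf-8")
    removed = remove_comments_from_file(src, dst)
    result = dst.read_text(encoding="utf-8")

print(removed, result)
```

Writing to a separate destination file leaves the original untouched, which makes it easy to compare the two before overwriting anything.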