Can I remove script tags with BeautifulSoup?

Question

Can script tags and all of their contents be removed from HTML using BeautifulSoup, or do I have to use regular expressions or something else?

Fábio Diniz · Accepted Answer

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<script>a</script>baba<script>b</script>', 'lxml')
>>> [s.extract() for s in soup('script')]
>>> soup
baba

Abhishek Dujari · Answer

Updated answer for anyone who may need this for future reference: the correct answer is decompose(). There are other ways to do it, but decompose works in place.

Example usage:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>This is a slimy text and <i> I am slimer</i></p>', 'html.parser')
soup.i.decompose()  # remove the <i> tag and its contents in place
print(str(soup))  # prints '<p>This is a slimy text and </p>'

Quite handy for getting rid of debris like 'script', 'img', and so on.
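To tie this back to the original question, here is a minimal sketch of removing every script and img tag with decompose(). The sample HTML and the variable names (html, cleaned) are made up for illustration, and it assumes bs4 with Python's built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for illustration.
html = (
    "<html><body><script>var a = 1;</script>"
    "<p>baba</p><img src='x.png'/><script>b</script></body></html>"
)

# html.parser ships with Python, so no extra parser install is assumed.
soup = BeautifulSoup(html, "html.parser")

# Calling soup(...) is shorthand for soup.find_all(...); a list of names
# matches any of them. The result is a list, so removing while looping is safe.
for tag in soup(["script", "img"]):
    tag.decompose()  # drops the tag and everything inside it, in place

cleaned = str(soup)
print(cleaned)
```

The loop is used instead of a list comprehension because decompose() returns nothing useful, so there is no list worth building.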

Santiago Alessandri · Answer

As stated in the official documentation, you can use the extract method to remove all the subtrees that match the search.

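# BeautifulSoup 3 import style; with bs4, use "from bs4 import BeautifulSoup" instead
import BeautifulSoup
a = BeautifulSoup.BeautifulSoup("<html><body><script>aaa</script></body></html>")
# extract() removes each matching <script> subtree from the tree
[x.extract() for x in a.findAll('script')]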
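For comparison with decompose(), here is a rough bs4 sketch of the same extract() approach. It relies on extract() handing back the removed tag, so the script contents can still be inspected afterwards. The sample HTML and the names removed, removed_text, and remaining are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for illustration.
soup = BeautifulSoup(
    "<html><body><script>aaa</script><p>kept</p></body></html>",
    "html.parser",
)

# extract() detaches each tag from the tree and returns it,
# so the removed subtrees can be collected in a list.
removed = [tag.extract() for tag in soup.find_all("script")]

# The extracted tags are still usable objects: read their text content.
removed_text = [str(tag.string) for tag in removed]
remaining = str(soup)

print(removed_text)
print(remaining)
```

If the removed tags are never needed again, decompose() is probably the simpler choice; extract() seems more useful when the script bodies have to be logged or moved elsewhere.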