Getting all HTML tags with Beautiful Soup

Question

I am trying to get a list of all the HTML tags in a document using Beautiful Soup.

I see find_all, but it seems I have to know the name of the tag before I search.

If I have text like

html = """<div>something</div> <div>something else</div> <div class='magical'>hi there</div> <p>ok</p>"""

how would I get a list like

list_of_tags = ["<div>", "<div>", "<div class='magical'>", "<p>"]

I know how to do this with a regex, but I am trying to learn BS4.

alecxe · Accepted Answer

You don't have to pass any arguments to find_all(). With no arguments, BeautifulSoup finds every tag in the tree, recursively. Sample:

>>> from bs4 import BeautifulSoup
>>>
>>> html = """<div>something</div>
... <div>something else</div>
... <div class='magical'>hi there</div>
... <p>ok</p>"""
>>> soup = BeautifulSoup(html, "html.parser")
>>> [tag.name for tag in soup.find_all()]
[u'div', u'div', u'div', u'p']
>>> [str(tag) for tag in soup.find_all()]
['<div>something</div>', '<div>something else</div>', '<div class="magical">hi there</div>', '<p>ok</p>']
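
Note that str(tag) includes each tag's contents and closing tag, while the list in the question holds only the opening tags. One possible way to get closer to that is to rebuild the opening tag from tag.name and tag.attrs. The following is a minimal sketch, not a built-in bs4 feature: opening_tag is a hypothetical helper name, it always uses single quotes, and it does not escape attribute values.

```python
from bs4 import BeautifulSoup

html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>"""


def opening_tag(tag):
    """Rebuild only the opening tag, e.g. <div class='magical'> (hypothetical helper)."""
    parts = [tag.name]
    for key, value in tag.attrs.items():
        # Multi-valued attributes such as class are parsed into lists.
        if isinstance(value, list):
            value = " ".join(value)
        # Assumption: values contain no single quotes; no escaping is done here.
        parts.append(f"{key}='{value}'")
    return "<" + " ".join(parts) + ">"


soup = BeautifulSoup(html, "html.parser")
# find_all() with no arguments walks every tag in the tree.
list_of_tags = [opening_tag(tag) for tag in soup.find_all()]
print(list_of_tags)
```

For the sample HTML above, list_of_tags should match the list asked for in the question. Nested tags would also be included, in document order, because find_all() searches recursively.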