Test whether an attribute is present on a tag in BeautifulSoup

Question

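I would like to get all the <script> tags in a document and then process each one based on the presence (or absence) of certain attributes.

For example, for each <script> tag, if the attribute for is present, do something; else if the attribute bar is present, do something else.

Here is what I am doing currently: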

outputDoc = BeautifulSoup(''.join(output))
scriptTags = outputDoc.findAll('script', attrs = {'for' : True})

But this way I filter the <script> tags down to those that have the for attribute, and I lose the other ones (those without the for attribute).

Lucas S. · Accepted Answer

If I understand correctly, you want all the script tags, and then want to check for some attributes on each of them?

scriptTags = outputDoc.findAll('script')
for script in scriptTags:
    if script.has_attr('some_attribute'):
        do_something()
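
To make the branching from the question concrete, here is a minimal sketch of the same loop with both checks. The sample markup and the `handled` list are made up for illustration, and `html.parser` is just one parser choice that needs no extra install.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
html = """
<script for="window" event="onload">a()</script>
<script bar="1">b()</script>
<script src="app.js"></script>
"""

soup = BeautifulSoup(html, "html.parser")

handled = []  # records which branch each <script> tag took
for script in soup.find_all("script"):
    if script.has_attr("for"):
        handled.append("for")
    elif script.has_attr("bar"):
        handled.append("bar")
    else:
        handled.append("neither")

print(handled)  # ['for', 'bar', 'neither']
```

Because every <script> tag is fetched first and the attribute test happens inside the loop, tags without the attribute are not lost.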

miah · Answer

For future reference, has_key has been deprecated in beautifulsoup4. You need to use has_attr instead.

scriptTags = outputDoc.findAll('script')
for script in scriptTags:
    if script.has_attr('some_attribute'):
        do_something()

Padraic Cunningham · Answer

You don't need any lambdas to filter by attribute; you can simply pass src=True and so on.

import bs4
soup = bs4.BeautifulSoup(html)
# Find all tags with a specific attribute.
tags = soup.find_all(src=True)
tags = soup.select("[src]")
# Find all meta tags with either a name or an http-equiv attribute.
soup.select("meta[name],meta[http-equiv]")
# Find any tags with a name or a src attribute.
soup.select("[name], [src]")
# Find the first script with a src attribute.
tag = soup.find('script', src=True)
tag = soup.select_one("script[src]")
# Find all tags with a name attribute beginning with foo
# or a src beginning with /path (quoted, since /path is not a bare CSS identifier).
soup.select('[name^=foo], [src^="/path"]')
# Find all tags with a name attribute that contains foo
# or a src containing whatever.
soup.select("[name*=foo], [src*=whatever]")
# Find all tags with a name attribute that ends with foo
# or a src that ends with whatever.
soup.select("[name$=foo], [src$=whatever]")

You can also use re with find/find_all and so on:

import re
# starts with
soup.find_all("script", src=re.compile("^whatever"))
# contains
soup.find_all("script", src=re.compile("whatever"))
# ends with
soup.find_all("script", src=re.compile("whatever$"))
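
As a rough end-to-end check of the three styles above (keyword filter, CSS selector, compiled regex), the following sketch runs them against a small document. The markup, the URLs in it, and the variable names are invented for illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
html = """
<script src="/static/app.js"></script>
<script src="https://cdn.example.com/lib.js"></script>
<script>inline()</script>
<img src="/static/logo.png">
"""
soup = BeautifulSoup(html, "html.parser")

# Keyword filter: <script> tags that have a src attribute at all.
with_src = soup.find_all("script", src=True)

# The same presence test written as a CSS attribute selector.
via_css = soup.select("script[src]")

# Regex filter: <script> tags whose src starts with /static/.
local = soup.find_all("script", src=re.compile(r"^/static/"))

# No tag name given: any tag with a src attribute, including <img>.
any_src = soup.find_all(src=True)

print(len(with_src), len(via_css), len(local), len(any_src))  # 2 2 1 3
```

The keyword and selector forms only test for presence, while the regex form also constrains the attribute value.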

SomeGuest · Answer

If you only need to get the tag(s) that have a given attribute, you can use a lambda:

soup = bs4.BeautifulSoup(YOUR_CONTENT)

Tags with the attribute

tags = soup.find_all(lambda tag: 'src' in tag.attrs)

OR

tags = soup.find_all(lambda tag: tag.has_attr('src'))

A specific tag with the attribute

tag = soup.find(lambda tag: tag.name == 'script' and 'src' in tag.attrs)

etc...

Thought it might be useful.
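
A function filter is also one way to select on the absence of an attribute, which the question mentions. This is a small sketch under assumptions: the markup is made up, and `script_without_src` is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
html = """
<script src="app.js"></script>
<script>inline()</script>
<script for="window">handler()</script>
"""
soup = BeautifulSoup(html, "html.parser")


def script_without_src(tag):
    """Match <script> tags that do not carry a src attribute."""
    return tag.name == "script" and not tag.has_attr("src")


no_src = soup.find_all(script_without_src)
print([tag.attrs for tag in no_src])  # [{}, {'for': 'window'}]
```

A named function behaves the same as the lambdas above; it is just easier to reuse and to read when the condition grows.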

Charles Ma · Answer

You can check whether some attribute is present:

scriptTags = outputDoc.findAll('script', some_attribute=True)
for script in scriptTags:
    do_something()
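
One caveat with the keyword form: the attribute from the question is named for, which is a Python keyword, so it cannot be written as a keyword argument. The sketch below (with made-up markup) passes the filter through attrs instead, as the question's own code does, and shows a CSS selector as an alternative.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
html = """
<script for="window" event="onload">init()</script>
<script src="app.js"></script>
"""
soup = BeautifulSoup(html, "html.parser")

# soup.find_all("script", for=True) would be a SyntaxError, because
# `for` is a Python keyword. Passing the filter via attrs avoids that.
with_for = soup.find_all("script", attrs={"for": True})

# A CSS attribute selector expresses the same presence test.
with_for_css = soup.select("script[for]")

print(len(with_for), len(with_for_css))  # 1 1
```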

Adam Salma · Answer

Using the pprint module, you can examine the contents of an element.

from pprint import pprint
pprint(vars(element))

Using this on a bs4 element will print something similar to this:

{'attrs': {u'class': [u'pie-productname', u'size-3', u'name', u'global-name']}, 'can_be_empty_element': False, 'contents': [u'
				NESNA
	'], 'hidden': False, 'name': u'span', 'namespace': None, 'next_element': u'
				NESNA
	', 'next_sibling': u'
', 'parent': <h1 class="pie-compoundheader" itemprop="name">
<span class="pie-description">Bedside table</span>
<span class="pie-productname size-3 name global-name">
				NESNA
	</span>
</h1>, 'parser_class': <class 'bs4.BeautifulSoup'>, 'prefix': None, 'previous_element': u'
', 'previous_sibling': u'
'}

To access an attribute, say the class list, use the following:

class_list = element.attrs.get('class', [])

You can filter elements using this approach:

for script in soup.find_all('script'):
    if script.attrs.get('for'):
        # ... has a non-empty 'for' attribute
        pass
    elif "myClass" in script.attrs.get('class', []):
        # ... has class "myClass"
        pass
    else:
        # ... do something else
        pass
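
One thing to keep in mind with attrs.get is that it tests the attribute's value, not its presence, so an attribute with an empty value is treated as missing. A minimal sketch of the difference, using a made-up one-tag document:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the attribute is present but its value is empty.
soup = BeautifulSoup('<script for=""></script>', "html.parser")
script = soup.find("script")

present = script.has_attr("for")        # presence test
truthy = bool(script.attrs.get("for"))  # value test: '' is falsy

print(present, truthy)  # True False
```

If only presence matters, has_attr (or 'for' in script.attrs) is likely the safer check.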