Extracting an attribute value with BeautifulSoup

Question

I am trying to extract the content of a single "value" attribute of a specific "input" tag on a web page. I use the following code:

import urllib
f = urllib.urlopen("http://58.68.130.147")
s = f.read()
f.close()
from BeautifulSoup import BeautifulStoneSoup
soup = BeautifulStoneSoup(s)
inputTag = soup.findAll(attrs={"name" : "stainfo"})
output = inputTag['value']
print str(output)

I get TypeError: list indices must be integers, not str

From the BeautifulSoup documentation I understand that strings should not be a problem here, but I am no specialist and may have misunderstood.

Any suggestion is greatly appreciated! Thanks in advance.

Łukasz · Accepted Answer

.findAll() returns a list of all matching elements, so after

inputTag = soup.findAll(attrs={"name" : "stainfo"})

inputTag is a list (probably containing only one element). Depending on exactly what you want, you should either index into the list first:

 output = inputTag[0]['value']

or use the .find() method, which returns only one element (the first match):

inputTag = soup.find(attrs={"name": "stainfo"})
output = inputTag['value']
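The difference between the two calls can be reproduced on a small inline document. The following is a minimal sketch, assuming the newer bs4 package (where findAll is spelled find_all) and an invented HTML snippet standing in for the real page, whose markup is not shown in the question:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page in the question.
html = '<form><input name="stainfo" value="station-42"></form>'
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like ResultSet, so a string index fails
# in the same way as in the question.
matches = soup.find_all(attrs={"name": "stainfo"})
try:
    matches["value"]
    string_index_failed = False
except TypeError:
    string_index_failed = True

# Index the list first, then read the attribute from the Tag.
first_value = matches[0]["value"]

# find() returns the first matching Tag directly, or None if nothing matches.
tag = soup.find(attrs={"name": "stainfo"})
value = tag["value"] if tag is not None else None

print(string_index_failed, first_value, value)
```

Checking for None before indexing matters with find(), because a page without the expected tag would otherwise fail with a different TypeError.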

amphibient · Answer

In Python 3.x, simply call get(attr_name) on each tag object returned by find_all:

from bs4 import BeautifulSoup
# Read the whole XML file into a string.
with open('conf//test1.xml', 'r') as xmlFile:
    xmlData = xmlFile.read()
# html.parser lowercases tag names, hence 'repeatingelement' below.
xmlSoup = BeautifulSoup(xmlData, 'html.parser')
repElemList = xmlSoup.find_all('repeatingelement')
for repElem in repElemList:
    print("Processing repElem...")
    # get() returns None when the attribute is missing.
    repElemID = repElem.get('id')
    repElemName = repElem.get('name')
    print("Attribute id = %s" % repElemID)
    print("Attribute name = %s" % repElemName)

Against an XML file conf//test1.xml that looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
    <singleElement>
        <subElementX>XYZ</subElementX>
    </singleElement>
    <repeatingElement id="11" name="Joe"/>
    <repeatingElement id="12" name="Mary"/>
</root>

it prints:

Processing repElem...
Attribute id = 11
Attribute name = Joe
Processing repElem...
Attribute id = 12
Attribute name = Mary
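One practical reason to prefer get() over square-bracket indexing is its behavior when an attribute is absent: get() returns None (or a default you supply), while indexing raises KeyError. A short sketch of that difference, assuming bs4 and an invented snippet in which the second element has no name attribute:

```python
from bs4 import BeautifulSoup

# Invented markup: the second <item> deliberately lacks a "name" attribute.
xml = '<root><item id="11" name="Joe"/><item id="12"/></root>'
soup = BeautifulSoup(xml, "html.parser")
items = soup.find_all("item")

# get() falls back to the supplied default instead of raising.
names = [item.get("name", "unknown") for item in items]

# Square-bracket access raises KeyError for the missing attribute.
try:
    items[1]["name"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(names, raised_key_error)
```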

Margath · Answer

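If you want to retrieve the attribute values of several matching tags from the source above, you can use findAll with a list comprehension to collect everything you need: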

import urllib
f = urllib.urlopen("http://58.68.130.147")
s = f.read()
f.close()
from BeautifulSoup import BeautifulStoneSoup
soup = BeautifulStoneSoup(s)
# You may be able to do findAll("input", attrs={"name" : "stainfo"})
inputTags = soup.findAll(attrs={"name" : "stainfo"})
# "stainfo" is the value of the name attribute; the data is in "value".
output = [x["value"] for x in inputTags]
print output  # This will print a list of the values.
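For readers on Python 3, the same idea can be sketched with bs4 on an inline document, so it runs without network access. The markup and the values in it are made up for illustration; only the tag name "input" and the name "stainfo" come from the question:

```python
from bs4 import BeautifulSoup

# Made-up form with two matching inputs and one that should be ignored.
html = """
<form>
  <input name="stainfo" value="alpha">
  <input name="stainfo" value="beta">
  <input name="other" value="gamma">
</form>
"""
soup = BeautifulSoup(html, "html.parser")

# Restrict the search to <input> tags whose name attribute is "stainfo".
input_tags = soup.find_all("input", attrs={"name": "stainfo"})

# Collect the "value" attribute of every match.
values = [tag["value"] for tag in input_tags]
print(values)
```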

Mr.Bones · Answer

You can also use this:

import requests
from bs4 import BeautifulSoup
url = "http://58.68.130.147/"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")
get_details = soup.find_all("input", attrs={"name":"stainfo"})
for val in get_details:
    get_val = val["value"]
    print(get_val)

b1tchacked · Answer

Assuming you know which kind of tag carries the attribute, I would suggest a time-saving way of doing this.

Suppose a tag xyz has an attribute named "staininfo".

full_tag = soup.findAll("xyz")

Keep in mind that full_tag is a list, so iterate over it:

for each_tag in full_tag:
    staininfo_attrb_value = each_tag["staininfo"]
    print staininfo_attrb_value

This way you get the staininfo attribute value of every xyz tag.
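The loop above raises KeyError as soon as one xyz tag lacks the attribute. One way around that, sketched here with bs4 on invented markup, is to pass staininfo=True so that find_all only returns tags that actually have the attribute:

```python
from bs4 import BeautifulSoup

# Invented markup: the middle <xyz> tag has no "staininfo" attribute.
html = '<xyz staininfo="a"></xyz><xyz></xyz><xyz staininfo="b"></xyz>'
soup = BeautifulSoup(html, "html.parser")

# A filter value of True matches any tag that has the attribute at all.
tagged = soup.find_all("xyz", staininfo=True)

staininfo_values = [tag["staininfo"] for tag in tagged]
print(staininfo_values)
```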