How do I use BeautifulSoup with CSS selectors to get specific links inside a specific class?

Question

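I'm new to Python and am learning BeautifulSoup for scraping, with the goal of collecting links (i.e. the href of 'a' tags). I am trying to collect the links under the "UPCOMING EVENTS" tab of the site http://allevents.in/lahore/. I use Firebug to inspect the element and get its CSS path, but this code returns nothing. I'm looking for a fix, and also for some suggestions on how to choose proper CSS selectors to retrieve the desired links from any site.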

    from bs4 import BeautifulSoup
    import requests

    url = "http://allevents.in/lahore/"
    r = requests.get(url)
    data = r.text
    soup = BeautifulSoup(data)

    # Full CSS path as copied from Firebug
    for link in soup.select('html body div.non-overlay.gray-trans-back div.container div.row div.span8 div#eh-1748056798.events-horizontal div.eh-container.row ul.eh-slider li.h-item div.h-meta div.title a[href]'):
        print(link.get('href'))

Martijn Pieters · Accepted Answer

The page is not the friendliest in its use of classes and markup, but even so, your CSS selector is far too specific to be useful here.

If you want the upcoming events, take just the first <div class="events-horizontal">, then grab the <div class="title"><a href="..."></div> tags inside it, i.e. the links on the titles:

    # events-horizontal is a class, so select it with "." rather than "#"
    upcoming_events_div = soup.select_one('div.events-horizontal')
    for link in upcoming_events_div.select('div.title a[href]'):
        print(link['href'])
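The same idea can be tried offline against a small hand-written document. This is only a sketch: the markup in SAMPLE_HTML is made up to imitate the structure described above and is not the real page, and "html.parser" is used only so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure described above;
# the real page will differ in detail.
SAMPLE_HTML = """
<div id="eh-1" class="events-horizontal">
  <div class="title"><a href="/e/one">One</a></div>
  <div class="title"><a href="/e/two">Two</a></div>
</div>
<div id="eh-2" class="events-horizontal">
  <div class="title"><a href="/e/three">Three</a></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select_one() returns only the first match, i.e. the first events block.
first_block = soup.select_one("div.events-horizontal")
upcoming_links = [a["href"] for a in first_block.select("div.title a[href]")]

# A selector that also demands an id absent from the document matches nothing.
too_specific = soup.select("div#eh-1748056798.events-horizontal div.title a[href]")

print(upcoming_links)
print(len(too_specific))
```

The short selector depends only on the two class names, so it keeps working when ids or the surrounding wrapper elements differ, which is the practical argument for preferring the shortest selector that still identifies the elements you want.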

Note that you should not use r.text; use r.content and leave the decoding to Unicode to BeautifulSoup. See "Encoding issue of a character in utf-8".
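To illustrate why handing bytes to BeautifulSoup can matter, here is a minimal sketch that needs no network access. The byte string raw stands in for r.content and is invented for the example; the latin-1 decode simulates an HTTP client guessing the wrong charset, which is an assumption about what may go wrong and not a claim about what this particular site returns.

```python
from bs4 import BeautifulSoup

# Hypothetical page bytes (a stand-in for r.content): UTF-8 encoded,
# with the charset declared in a <meta> tag.
raw = (
    '<html><head><meta charset="utf-8"></head>'
    '<body><a href="/e/1">Café</a></body></html>'
).encode("utf-8")

# Given bytes, BeautifulSoup works out the encoding itself.
soup = BeautifulSoup(raw, "html.parser")
good_text = soup.a.get_text()

# Decoding with the wrong charset first garbles the text before
# BeautifulSoup ever sees it.
wrong = BeautifulSoup(raw.decode("latin-1"), "html.parser")
bad_text = wrong.a.get_text()

print(good_text)
print(bad_text)
```

With real responses, the equivalent call would be BeautifulSoup(r.content, "html.parser").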

Anuj Saraswat · Answer

    import bs4
    import requests

    res = requests.get("http://allevents.in/lahore/")
    soup = bs4.BeautifulSoup(res.text)

    # Match <a> tags whose property attribute is exactly "schema:url"
    for link in soup.select('a[property="schema:url"]'):
        print(link.get('href'))

This code works fine!!
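For readers unfamiliar with attribute selectors, a small offline sketch of what a[property="schema:url"] matches follows. The markup is invented for the illustration; whether the real page marks every event link this way is something to check in the browser's inspector.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the second link carries property="schema:url".
SAMPLE_HTML = """
<ul>
  <li><a href="/about">About</a></li>
  <li><a property="schema:url" href="/e/concert">Concert</a></li>
  <li><a property="schema:name" href="/e/other">Other</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# [attr="value"] matches only elements whose attribute equals the value exactly.
event_links = [a.get("href") for a in soup.select('a[property="schema:url"]')]

print(event_links)
```

Unlike the accepted answer, this selector is not limited to the first events block, so on a real page it may return links from every section that uses the same attribute.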