Scraping the contents of all div tags with a specific class

Question

I am scraping all the text on a website that appears inside divs of a specific class. In the following example, I want to extract everything inside the divs with class "a".

site <- "<div class='a'>Hello, world</div> <div class='b'>Good morning, world</div> <div class='a'>Good afternoon, world</div>"

My desired output is...

"Hello, world" "Good afternoon, world"

The code below extracts the text from every div, but I can't figure out how to restrict it to class = "a".

library(tidyverse)
library(rvest)

site %>% read_html() %>% html_nodes("div") %>% html_text()
# [1] "Hello, world" "Good morning, world" "Good afternoon, world"

With Python's BeautifulSoup, this would be something like site.find_all("div", class_="a").

neilfws · Accepted Answer

The CSS selector for a div with class = "a" is div.a.

site %>% read_html() %>% html_nodes("div.a") %>% html_text()
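If the tag and class vary, the selector can be built from strings, which gets close to the find_all("div", class_="a") call from the question. This is only a sketch: text_by_class is a hypothetical helper name, not something rvest provides, and it assumes the class name needs no CSS escaping.

```r
library(rvest)

# Hypothetical helper (not part of rvest): return the text of every
# <tag> element that carries the class `cls`.
text_by_class <- function(html, tag, cls) {
  selector <- paste0(tag, ".", cls)  # e.g. "div.a"
  html %>%
    read_html() %>%
    html_nodes(selector) %>%
    html_text()
}

site <- "<div class='a'>Hello, world</div> <div class='b'>Good morning, world</div> <div class='a'>Good afternoon, world</div>"

result <- text_by_class(site, "div", "a")
print(result)
# [1] "Hello, world"          "Good afternoon, world"
```

In rvest 1.0.0 and later, html_elements() is the newer name for html_nodes(); both accept the same selector.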

Alternatively, you can use XPath:

html_nodes(xpath = "//div[@class='a']")
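One difference worth knowing: the two forms are not strictly equivalent. div.a matches any div whose class list contains "a", while @class='a' compares the whole attribute string, so it misses an element such as class='a big'. The markup below is made up for illustration, and the third query is the usual XPath 1.0 idiom for matching a single class token.

```r
library(rvest)

# Hypothetical markup: the second div carries two classes, and the
# third has a class name that merely starts with "a".
page <- read_html(
  "<div class='a'>one</div><div class='a big'>two</div><div class='ab'>three</div>"
)

# CSS: matches on the class token, so both 'a' and 'a big' qualify.
css_hits <- page %>% html_nodes("div.a") %>% html_text()

# XPath exact comparison: only the div whose class attribute is exactly "a".
exact_hits <- page %>% html_nodes(xpath = "//div[@class='a']") %>% html_text()

# XPath token match: pad the class list with spaces and look for " a ".
token_xpath <- "//div[contains(concat(' ', normalize-space(@class), ' '), ' a ')]"
token_hits <- page %>% html_nodes(xpath = token_xpath) %>% html_text()

print(css_hits)    # [1] "one" "two"
print(exact_hits)  # [1] "one"
print(token_hits)  # [1] "one" "two"
```

For the example in the question every div has a single class, so all three queries return the same result there.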

DJack · Answer

site %>% read_html() %>% html_nodes(xpath = '//*[@class="a"]') %>% html_text()
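Note that //* matches elements of any tag, so this query also returns non-div elements whose class is "a". That makes no difference for the sample in the question, but it can on a real page. A small check, using made-up markup with an extra span:

```r
library(rvest)

# Hypothetical markup: a span shares the class "a" with a div.
page <- read_html(
  "<div class='a'>Hello, world</div><span class='a'>not a div</span><div class='b'>Good morning, world</div>"
)

# Any element with class "a", regardless of tag.
any_tag <- page %>% html_nodes(xpath = '//*[@class="a"]') %>% html_text()

# Only div elements with class "a".
div_only <- page %>% html_nodes(xpath = '//div[@class="a"]') %>% html_text()

print(any_tag)   # [1] "Hello, world" "not a div"
print(div_only)  # [1] "Hello, world"
```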