"SSL: CERTIFICATE_VERIFY_FAILED" error when scraping https://www.thenewboston.com/

Question

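I recently started learning Python using "The New Boston's" videos on YouTube. Everything went fine until I reached his tutorial on building a simple web crawler. I understood it without trouble, but when I run the code I get errors that all seem to stem from "SSL: CERTIFICATE_VERIFY_FAILED". I have been searching for an answer since last night, but nobody in the video's comments or on his website seems to have the same problem as me, and I get the same result even when I use someone else's code from his website. I'll post the code I took from the website, since it gives me the same error and the one I wrote myself is a mess right now.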

import requests
from bs4 import BeautifulSoup

def trade_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://www.thenewboston.com/forum/category.php?id=15&orderby=recent&page=" + str(page) #this is page of popular posts
        source_code = requests.get(url)
        # just get the code, no headers or anything
        plain_text = source_code.text
        # BeautifulSoup objects can be sorted through easy
        for link in soup.findAll('a', {'class': 'index_singleListingTitles'}): #all links, which contains "" class='index_singleListingTitles' "" in it.
            href = "https://www.thenewboston.com/" + link.get('href')
            title = link.string # just the text, not the HTML
            print(href)
            print(title)
            # get_single_item_data(href)
        page += 1

trade_spider(1)

Full error: ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)

I apologize if this is a dumb question. I am still new to programming, but I seriously cannot figure this out.

Steffen Ullrich · Accepted Answer

The problem is not in your code but in the website you are trying to access. If you look at the analysis by SSLLabs, you will notice:

This server's certificate chain is incomplete. Grade capped to B.

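This means that the server is misconfigured, and that not only Python but several other clients will have problems with this site. Some desktop browsers work around this kind of misconfiguration by fetching the missing certificates or filling them in from cached certificates, but other browsers and applications will fail just as Python does.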

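To work around the broken server configuration, you can explicitly extract the missing certificates and add them to your trust store. Alternatively, you can pass the certificate as trusted through the verify argument. From the documentation: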

You can pass verify the path to a CA_BUNDLE file or directory with certificates of trusted CAs:
>>> requests.get('https://github.com', verify='/path/to/certfile')
This list of trusted CAs can also be specified through the REQUESTS_CA_BUNDLE environment variable.
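
As a rough sketch of the first option, one way to add the missing certificate to a trust store without touching the system store is to build a private bundle file and hand that to verify. The helper name build_ca_bundle and all file paths are hypothetical, and the sketch assumes you have already saved the missing intermediate certificate in PEM format.

```python
from pathlib import Path


def build_ca_bundle(base_bundle, extra_pem, out_path):
    """Write base_bundle followed by extra_pem into out_path and return out_path.

    base_bundle: an existing PEM bundle of trusted CAs (hypothetical path).
    extra_pem:   the missing intermediate certificate, saved as PEM.
    """
    base = Path(base_bundle).read_text()
    extra = Path(extra_pem).read_text()
    if "-----BEGIN CERTIFICATE-----" not in extra:
        raise ValueError("extra_pem does not look like a PEM certificate")
    # Separate the two files with exactly one newline and end with a newline.
    combined = base.rstrip("\n") + "\n" + extra.strip() + "\n"
    Path(out_path).write_text(combined)
    return str(out_path)
```

The returned path could then be used as requests.get(url, verify=bundle_path) or exported through REQUESTS_CA_BUNDLE. If certifi is installed, certifi.where() is one possible source for base_bundle.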

mattexx · Answer

You can tell requests not to verify the SSL certificate:

>>> url = "https://www.thenewboston.com/forum/category.php?id=15&orderby=recent&page=1"
>>> response = requests.get(url, verify=False)
>>> response.status_code
200

See the requests documentation for more details.

markhor · Answer

You may be missing the stock certificates on your system. For example, if you are running Ubuntu, check that the ca-certificates package is installed.
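
To see whether Python can find a CA store on your machine at all, a small diagnostic using the standard ssl module may help. The function name describe_ca_paths is made up for this example. ssl.get_default_verify_paths() only reports where OpenSSL will look, so the sketch also checks that those locations exist. Keep in mind that requests usually relies on its own bundle (certifi) unless your distribution patches it, so treat this as a check of the system store only.

```python
import os
import ssl


def describe_ca_paths():
    """Return the default CA file/directory locations and whether each exists."""
    paths = ssl.get_default_verify_paths()
    report = {}
    for name in ("cafile", "capath", "openssl_cafile", "openssl_capath"):
        location = getattr(paths, name)
        report[name] = {
            "path": location,
            # A location of None means nothing usable was found for that entry.
            "exists": bool(location) and os.path.exists(location),
        }
    return report


if __name__ == "__main__":
    for name, info in describe_ca_paths().items():
        print(f"{name}: {info['path']} (exists: {info['exists']})")
```

If none of the reported locations exist, that would point toward a missing or uninstalled certificate package.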

amitnair92 · Answer

If you used the Python dmg installer, you need to read the Python 3 ReadMe and run the bash command to get new certificates.

Try running

/Applications/Python\ 3.6/Install\ Certificates.command

NuclearPeon · Answer

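I'm posting this as an answer because I got past your immediate problem, but there are still issues in your code (I can update this once they are fixed).

In short, you may be using an old version of requests, or the SSL certificate may be invalid. There is more information in this SO question: Python requests "certificate verify failed"

I updated your code into my own bsoup.py file: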

#!/usr/bin/env python3
import requests
from bs4 import BeautifulSoup

def trade_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://www.thenewboston.com/forum/category.php?id=15&orderby=recent&page=" + str(page) #this is page of popular posts
        source_code = requests.get(url, timeout=5, verify=False)
        # just get the code, no headers or anything
        plain_text = source_code.text
        # BeautifulSoup objects can be sorted through easy
        for link in BeautifulSoup.findAll('a', {'class': 'index_singleListingTitles'}): #all links, which contains "" class='index_singleListingTitles' "" in it.
            href = "https://www.thenewboston.com/" + link.get('href')
            title = link.string # just the text, not the HTML
            print(href)
            print(title)
            # get_single_item_data(href)
        page += 1

if __name__ == "__main__":
    trade_spider(1)

When I run the script, I get this error:

https://www.thenewboston.com/forum/category.php?id=15&orderby=recent&page=1
Traceback (most recent call last):
  File "./bsoup.py", line 26, in <module>
    trade_spider(1)
  File "./bsoup.py", line 16, in trade_spider
    for link in BeautifulSoup.findAll('a', {'class': 'index_singleListingTitles'}): #all links, which contains "" class='index_singleListingTitles' "" in it.
  File "/usr/local/lib/python3.4/dist-packages/bs4/element.py", line 1256, in find_all
    generator = self.descendants
AttributeError: 'str' object has no attribute 'descendants'

There is a problem with your findAll call. I tried both python3 and python2, and python2 reports this:

TypeError: unbound method find_all() must be called with BeautifulSoup instance as first argument (got str instance instead)

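So it looks like you need to fix that call before you can continue: findAll has to be called on a BeautifulSoup instance, not on the class itself.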
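
A possible fix, sketched under the assumption that the forum markup still uses the index_singleListingTitles class: build a BeautifulSoup instance from the page text first, then search that instance. The helper name extract_links and the sample HTML are invented for this example, and html.parser is simply the parser that ships with Python.

```python
from bs4 import BeautifulSoup

BASE_URL = "https://www.thenewboston.com/"


def extract_links(plain_text):
    """Return (href, title) pairs for every listing link in the HTML text."""
    # Parse the text first; find_all must be called on the parsed instance.
    soup = BeautifulSoup(plain_text, "html.parser")
    results = []
    for link in soup.find_all("a", {"class": "index_singleListingTitles"}):
        href = BASE_URL + link.get("href")
        title = link.string  # just the text, not the HTML
        results.append((href, title))
    return results


if __name__ == "__main__":
    # Made-up sample markup standing in for source_code.text
    sample = '<a class="index_singleListingTitles" href="forum/topic.php?id=1">Hello</a>'
    for href, title in extract_links(sample):
        print(href)
        print(title)
```

Inside trade_spider, the for loop could then iterate over extract_links(source_code.text) instead of calling findAll on the class.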