Why can't I download images from Google with Python?

Question

This code helped me download a large number of images from Google. It worked until a few days ago, but now it has suddenly stopped working.

Code:

    # importing the google_images_download module
    from google_images_download import google_images_download

    # creating the downloader object
    response = google_images_download.googleimagesdownload()

    search_queries = ['Apple', 'Orange', 'Grapes', 'water melon']


    def downloadimages(query):
        # keywords is the search query
        # format is the image file format
        # limit is the number of images to be downloaded
        # print_urls prints the image file URLs
        # size is the image size, which can be specified manually
        # ("large, medium, icon")
        # aspect_ratio denotes the height/width ratio of the images
        # to download ("tall, square, wide, panoramic")
        arguments = {"keywords": query,
                     "format": "jpg",
                     "limit": 4,
                     "print_urls": True,
                     "size": "medium",
                     "aspect_ratio": "panoramic"}
        try:
            response.download(arguments)

        # Handling FileNotFoundError
        except FileNotFoundError:
            # Providing arguments for the searched query,
            # this time without the aspect_ratio filter
            arguments = {"keywords": query,
                         "format": "jpg",
                         "limit": 4,
                         "print_urls": True,
                         "size": "medium"}
            try:
                # Downloading the photos based on the given arguments
                response.download(arguments)
            except:
                pass


    # Driver code
    for query in search_queries:
        downloadimages(query)
        print()

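Output log:

Item no.: 1 --> Item name = Apple
Evaluating...
Starting Download...

Unfortunately all 4 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!

Errors: 0

Item no.: 1 --> Item name = Orange
Evaluating...
Starting Download...

Unfortunately all 4 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!

Errors: 0

Item no.: 1 --> Item name = Grapes
Evaluating...
Starting Download...

Unfortunately all 4 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!

Errors: 0

Item no.: 1 --> Item name = water melon
Evaluating...
Starting Download...

Unfortunately all 4 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!

Errors: 0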

It does create the folders, but it does not put any images in them.

nguyentran · Answer

I think Google has changed the DOM. The element with class="rg_meta notranslate" no longer exists; it has been changed to class="rg_i ...".

    import sys
    import urllib.request  # urllib2 exists only in Python 2

    import wget
    from bs4 import BeautifulSoup


    def get_soup(url, header):
        # Fetch the page with a browser-like User-Agent and parse it
        request = urllib.request.Request(url, headers=header)
        return BeautifulSoup(urllib.request.urlopen(request), 'html.parser')


    def main(args):
        query = "typical face"
        query = '+'.join(query.split())
        url = "https://www.google.co.in/search?q=" + query + "&source=lnms&tbm=isch"
        headers = {}
        headers['User-Agent'] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
        soup = get_soup(url, headers)
        # Look for the "rg_i" class instead of "rg_meta notranslate"
        for a in soup.find_all("img", {"class": "rg_i"}):
            wget.download(a.attrs["data-iurl"], a.attrs["data-iid"])


    if __name__ == '__main__':
        try:
            main(sys.argv)
        except KeyboardInterrupt:
            pass
        sys.exit()
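To make the class change concrete without hitting the network, here is a minimal sketch that runs the same selection logic on a small, made-up HTML snippet using only the standard library. The sample markup, the `ThumbnailCollector` class and the `extract_thumbnail_urls` helper are illustrative assumptions; the real result page is different and may change again at any time.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; real Google result pages differ and change often.
SAMPLE_HTML = """
<div>
  <img class="rg_i foo" data-iid="1" data-iurl="https://example.com/a.jpg">
  <img class="other" data-iurl="https://example.com/skip.jpg">
  <img class="rg_i foo" data-iid="2" data-iurl="https://example.com/b.jpg">
  <img class="rg_i foo" data-iid="3">
</div>
"""


class ThumbnailCollector(HTMLParser):
    """Collect data-iurl values from <img> tags whose class list contains rg_i."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        url = attributes.get("data-iurl")
        # Skip images without the rg_i class or without a data-iurl attribute
        if "rg_i" in classes and url:
            self.urls.append(url)


def extract_thumbnail_urls(html):
    collector = ThumbnailCollector()
    collector.feed(html)
    return collector.urls


print(extract_thumbnail_urls(SAMPLE_HTML))
```

Checking that `data-iurl` is present before using it avoids a `KeyError` on images that lack the attribute, which direct indexing into `a.attrs` would raise.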

funnydman · Answer

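This problem appeared only recently, but there are already many similar issues on GitHub.

Unfortunately, there is no official solution. For now, you can use the temporary solutions offered in those discussions.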

Eamonn Kenny · Answer

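The reason this doesn't work is that Google has changed how everything works, so that an api_key must now be included in the search string. As a result, packages such as google-images-download no longer work, even with version 2.8.0, because they have no placeholder for inserting the api_key string, which you have to register with Google to obtain in order to get 2,500 free downloads per day.

If you are willing to pay $50 a month or more for access to a service from serpapi.com, one way to do this is to use the pip package google-search-results and supply your api_key as part of the query parameters.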

    params = {
        "engine" : "google",
        ...
        "api_key" : "secret_api_key"
    }

Supply your own API key, then call:

    client = GoogleSearchResults(params)
    results = client.get_dict()

This returns the JSON results, which contain links to the URLs of all the images; you then simply download them directly.
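The final "download them directly" step could look roughly like the sketch below. It assumes the dictionary returned by `client.get_dict()` holds a list under `"images_results"` whose entries carry the full-size link under `"original"`; those key names, the `.jpg` file naming, and the helper functions `extract_image_urls` and `download_all` are assumptions for illustration, so check them against the response you actually receive.

```python
import os
import urllib.request


def extract_image_urls(results, list_key="images_results", url_key="original"):
    """Return the image URLs found in a parsed JSON response.

    list_key and url_key are assumed names; adjust them to the real response.
    """
    urls = []
    for item in results.get(list_key, []):
        url = item.get(url_key)
        if url:
            urls.append(url)
    return urls


def download_all(urls, folder="downloads"):
    """Download each URL into folder, skipping the ones that fail."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start=1):
        # The .jpg extension is a simplification; real files may be other types
        target = os.path.join(folder, f"image_{index}.jpg")
        try:
            urllib.request.urlretrieve(url, target)
        except OSError as error:  # URLError and HTTPError are OSError subclasses
            print(f"Skipping {url}: {error}")
            continue
        saved.append(target)
    return saved


# Hypothetical response, shaped like the assumed result dictionary
sample_results = {
    "images_results": [
        {"original": "https://example.com/apple1.jpg"},
        {"thumbnail": "https://example.com/thumb.jpg"},
        {"original": "https://example.com/apple2.jpg"},
    ]
}

print(extract_image_urls(sample_results))
```

With a real response you would call `download_all(extract_image_urls(results))`. Failed downloads are reported and skipped rather than silently swallowed, which makes it easier to see why a folder ends up empty.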