How do I save an image locally using Python when I already know its URL?

Question

I know the URL of an image on the Internet.

For example, http://www.digimouth.com/news/media/2011/09/google-logo.jpg, which contains the Google logo.

How can I download this image using Python, without actually opening the URL in a browser and saving the file manually?

Liquid_Fire · Accepted Answer

Python 2

If all you want to do is save it as a file, there is a simpler way:

    import urllib

    urllib.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")

The second argument is the local path where the file should be saved.

Python 3

As SergO suggested, the code below should work with Python 3.

    import urllib.request

    urllib.request.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")
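As a minimal sketch of the same Python 3 call: urlretrieve returns a (filename, headers) tuple, so a small wrapper can hand back the path it wrote. The helper name download_image is hypothetical and is not part of urllib.

```python
from urllib.request import urlretrieve


def download_image(url, local_path):
    """Save the resource at `url` to `local_path` and return the saved path.

    `download_image` is a hypothetical helper name used for illustration.
    """
    # urlretrieve returns a (filename, headers) tuple; only the path is used here.
    saved_path, _headers = urlretrieve(url, local_path)
    return saved_path
```

Calling download_image("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg") would save the image and return "local-filename.jpg", assuming the URL is still reachable.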

Noufal Ibrahim · Answer

    import urllib

    resource = urllib.urlopen("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")
    output = open("file01.jpg", "wb")
    output.write(resource.read())
    output.close()

file01.jpg will then contain the image.
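The open/read/write/close pattern can also be written with context managers. The following is a sketch for Python 3, not the answer's original code; save_url is a hypothetical name, and shutil.copyfileobj is swapped in so the body is copied in pieces rather than read into memory at once.

```python
import shutil
from urllib.request import urlopen


def save_url(url, path):
    """Copy the response body of `url` into the file at `path` (hypothetical helper)."""
    # Both the response and the output file are closed automatically,
    # even if an exception is raised part-way through the copy.
    with urlopen(url) as resource, open(path, "wb") as output:
        shutil.copyfileobj(resource, output)
```

For example, save_url("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "file01.jpg") would write the same file01.jpg as above.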

Yup. · Answer

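I wrote a script that does just this, and it is available on my GitHub.

I used BeautifulSoup so that the script can parse any website for images. If you do a lot of web scraping (or plan to use my tool), I recommend running sudo pip install BeautifulSoup. Note that the bs4 import used in the code below is provided by the beautifulsoup4 package. More information on BeautifulSoup is available in its official documentation.

For convenience, here is my code: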

    from bs4 import BeautifulSoup
    from urllib2 import urlopen
    import urllib

    # use this image scraper from the location that
    # you want to save scraped images to

    def make_soup(url):
        html = urlopen(url).read()
        return BeautifulSoup(html)

    def get_images(url):
        soup = make_soup(url)
        # this makes a list of bs4 element tags
        images = [img for img in soup.findAll('img')]
        print (str(len(images)) + " images found.")
        print 'Downloading images to current working directory.'
        # compile our unicode list of image links
        image_links = [each.get('src') for each in images]
        for each in image_links:
            filename = each.split('/')[-1]
            urllib.urlretrieve(each, filename)
        return image_links

    # a standard call looks like this
    # get_images('http://www.wookmark.com')

Martin Thoma · Answer

A solution that works with both Python 2 and Python 3:

    try:
        from urllib.request import urlretrieve  # Python 3
    except ImportError:
        from urllib import urlretrieve  # Python 2

    url = "http://www.digimouth.com/news/media/2011/09/google-logo.jpg"
    urlretrieve(url, "local-filename.jpg")

Or, if the additional dependency on requests is acceptable and the URL is an http(s) URL:

    def load_requests(source_url, sink_path):
        """
        Load a file from an URL (e.g. http).

        Parameters
        ----------
        source_url : str
            Where to load the file from.
        sink_path : str
            Where the loaded file is stored.
        """
        import requests
        r = requests.get(source_url, stream=True)
        if r.status_code == 200:
            with open(sink_path, 'wb') as f:
                for chunk in r:
                    f.write(chunk)
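load_requests does nothing when the status code is not 200. One possible variant, sketched here under assumptions, raises on HTTP errors and streams with an explicit chunk size. The names write_chunks and load_requests_checked, the 8192-byte chunk size, and the 10-second timeout are all illustrative choices, not part of the original answer.

```python
def write_chunks(chunks, sink_path):
    """Write an iterable of byte chunks to `sink_path`; return the bytes written."""
    written = 0
    with open(sink_path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


def load_requests_checked(source_url, sink_path, chunk_size=8192, timeout=10):
    """Stream `source_url` to `sink_path`, raising on HTTP error responses."""
    import requests  # third-party; imported lazily so write_chunks works without it

    with requests.get(source_url, stream=True, timeout=timeout) as r:
        r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
        return write_chunks(r.iter_content(chunk_size=chunk_size), sink_path)
```

Keeping the file-writing loop in its own function means it can be exercised without network access.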

SergO · Answer

Python 3

urllib.request: Extensible library for opening URLs

    from urllib.error import HTTPError
    from urllib.request import urlretrieve

    # image_url and image_local_path are assumed to be defined beforehand
    try:
        urlretrieve(image_url, image_local_path)
    except FileNotFoundError as err:
        print(err)   # something wrong with local path
    except HTTPError as err:
        print(err)  # something wrong with url
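HTTPError is a subclass of URLError, so catching URLError also covers failures such as an unreachable host. The sketch below wraps that idea in a function that reports success as a boolean. The name try_retrieve and the printed messages are hypothetical.

```python
from urllib.error import URLError
from urllib.request import urlretrieve


def try_retrieve(image_url, image_local_path):
    """Return True if the download succeeded, False otherwise (hypothetical helper)."""
    try:
        urlretrieve(image_url, image_local_path)
    except URLError as err:
        # HTTPError is a subclass of URLError, so HTTP status errors land here too.
        print("URL problem:", err)
        return False
    except OSError as err:
        # FileNotFoundError (e.g. a missing local directory) is a subclass of OSError.
        print("Local path problem:", err)
        return False
    return True
```

URLError must be listed before OSError because URLError is itself a subclass of OSError in Python 3.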

madprops · Answer

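I made a script that expands on Yup.'s script. I fixed a few things: it now bypasses 403: Forbidden problems, it does not crash when an image fails to be retrieved, it tries to avoid corrupted previews, it builds the correct absolute URLs, it prints more information, and it can be run with an argument from the command line.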

    # getem.py
    # python2 script to download all images in a given url
    # use: python getem.py http://url.where.images.are

    from bs4 import BeautifulSoup
    import urllib2
    import shutil
    import requests
    from urlparse import urljoin
    import sys
    import time

    def make_soup(url):
        req = urllib2.Request(url, headers={'User-Agent' : "Magic Browser"})
        html = urllib2.urlopen(req)
        return BeautifulSoup(html, 'html.parser')

    def get_images(url):
        soup = make_soup(url)
        images = [img for img in soup.findAll('img')]
        print (str(len(images)) + " images found.")
        print 'Downloading images to current working directory.'
        image_links = [each.get('src') for each in images]
        for each in image_links:
            try:
                filename = each.strip().split('/')[-1].strip()
                src = urljoin(url, each)
                print 'Getting: ' + filename
                response = requests.get(src, stream=True)
                # delay to avoid corrupted previews
                time.sleep(1)
                with open(filename, 'wb') as out_file:
                    shutil.copyfileobj(response.raw, out_file)
            except:
                print '  An error occurred. Continuing.'
        print 'Done.'

    if __name__ == '__main__':
        url = sys.argv[1]
        get_images(url)
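The "correct absolute URLs" step relies on urljoin, which resolves each img src against the page URL. A small sketch of how that resolution behaves, using the Python 3 import location (in Python 2 it is from urlparse import urljoin). The helper name absolute_image_urls and the sample paths are made up for illustration.

```python
from urllib.parse import urljoin


def absolute_image_urls(page_url, srcs):
    """Resolve each src against `page_url`, skipping missing (None or empty) values."""
    return [urljoin(page_url, src.strip()) for src in srcs if src]


if __name__ == "__main__":
    page = "http://www.wookmark.com/gallery/index.html"  # hypothetical page path
    for resolved in absolute_image_urls(page, ["thumbs/a.jpg", "/images/b.png", None]):
        print(resolved)
```

Relative paths are resolved against the page's directory, root-relative paths against the host, and already absolute URLs are returned unchanged.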

AlexG · Answer

This can be done with requests. Load the page and dump the binary content to a file.

    import os
    import requests

    url = 'https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg'
    page = requests.get(url)

    f_ext = os.path.splitext(url)[-1]
    f_name = 'img{}'.format(f_ext)
    with open(f_name, 'wb') as f:
        f.write(page.content)
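Applying os.path.splitext to the whole URL works for the URL above, but a query string would end up inside the extension. A hedged sketch that takes the extension from the URL's path component instead. The name extension_from_url and the ".jpg" fallback are assumptions made for illustration.

```python
import os
from urllib.parse import urlparse


def extension_from_url(url, default=".jpg"):
    """Return the file extension of the URL's path, ignoring any query string."""
    path = urlparse(url).path  # e.g. "/apod/image/1701/potw1636aN159_HST_2048.jpg"
    ext = os.path.splitext(path)[-1]
    # Fall back to an assumed default when the path has no extension.
    return ext or default
```

The local name can then be built as before, for example 'img{}'.format(extension_from_url(url)).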

Giovanni Gianni · Answer

Version for Python 3

I adjusted @madprops's code for Python 3:

    # getem.py
    # python3 script to download all images in a given url
    # use: python3 getem.py (the page URL is hard-coded at the bottom)

    from bs4 import BeautifulSoup
    import urllib.request
    import shutil
    import requests
    from urllib.parse import urljoin
    import sys
    import time

    def make_soup(url):
        req = urllib.request.Request(url, headers={'User-Agent' : "Magic Browser"})
        html = urllib.request.urlopen(req)
        return BeautifulSoup(html, 'html.parser')

    def get_images(url):
        soup = make_soup(url)
        images = [img for img in soup.findAll('img')]
        print (str(len(images)) + " images found.")
        print('Downloading images to current working directory.')
        image_links = [each.get('src') for each in images]
        for each in image_links:
            try:
                filename = each.strip().split('/')[-1].strip()
                src = urljoin(url, each)
                print('Getting: ' + filename)
                response = requests.get(src, stream=True)
                # delay to avoid corrupted previews
                time.sleep(1)
                with open(filename, 'wb') as out_file:
                    shutil.copyfileobj(response.raw, out_file)
            except:
                print('  An error occurred. Continuing.')
        print('Done.')

    if __name__ == '__main__':
        get_images('http://www.wookmark.com')
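The make_soup function sends a User-Agent header, which is presumably how the script gets past servers that answer 403: Forbidden to urllib's default agent. A minimal sketch of building such a request on its own, without performing any network access. The helper name browser_like_request is hypothetical.

```python
from urllib.request import Request


def browser_like_request(url, user_agent="Magic Browser"):
    """Build (but do not send) a Request carrying a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = browser_like_request("http://www.wookmark.com")
    # Request stores header names capitalized as "User-agent".
    print(req.get_header("User-agent"))
```

The resulting object can be passed to urllib.request.urlopen in place of a plain URL string, as make_soup does.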

OO7 · Answer

This is a very short answer.

    import urllib

    urllib.urlretrieve("http://photogallery.sandesh.com/Picture.aspx?AlubumId=422040", "Abc.jpg")