How to download a file using Python in a "smart" way?

Question

I need to download several files over HTTP in Python.

The most obvious way to do it is just to use urllib2:

    import urllib2

    u = urllib2.urlopen('http://server.com/file.html')
    localFile = open('file.html', 'w')
    localFile.write(u.read())
    localFile.close()

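But I also have to deal with URLs that are nasty in some way, for example: http://server.com/!Run.aspx/someoddtext/somemore?id=121&m=pdf. When downloaded through a browser, the file gets a human-readable name, e.g. accounts.pdf.

Is there any way to handle that in Python, so that I don't need to know the file names and hardcode them in my script?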

Oli · Accepted Answer

Download scripts like that tend to push a header that tells the user agent what to name the file:

Content-Disposition: attachment; filename="the filename.ext"

If you can grab that header, you can get the proper filename.

There's another thread with a little bit of code for grabbing Content-Disposition:

    import urllib2

    remotefile = urllib2.urlopen('http://example.com/somefile.zip')
    remotefile.info()['Content-Disposition']
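The header value still has to be parsed to get at the name. A minimal sketch, assuming Python 3 (where urllib2 became urllib.request) and using the standard library's email.message.Message to do the parsing instead of splitting the string by hand. The helper name filename_from_header is made up for illustration:

```python
from email.message import Message


def filename_from_header(content_disposition):
    """Return the filename parameter of a Content-Disposition value, or None.

    Hypothetical helper, not taken from the answer above; assumes Python 3.
    """
    if not content_disposition:
        return None
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    # get_filename() removes surrounding double quotes and returns None
    # when the header carries no filename parameter.
    return msg.get_filename()


print(filename_from_header('attachment; filename="the filename.ext"'))
```

With a real response, the value would come from something like urllib.request.urlopen(url).headers.get('Content-Disposition').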

kender · Answer

Based on the comments and @Oli's answer, I made a solution like this:

    from os.path import basename
    from urlparse import urlsplit
    import urllib2

    def url2name(url):
        return basename(urlsplit(url)[2])

    def download(url, localFileName = None):
        localName = url2name(url)
        req = urllib2.Request(url)
        r = urllib2.urlopen(req)
        if r.info().has_key('Content-Disposition'):
            # If the response has Content-Disposition, we take file name from it
            localName = r.info()['Content-Disposition'].split('filename=')[1]
            if localName[0] == '"' or localName[0] == "'":
                localName = localName[1:-1]
        elif r.url != url:
            # if we were redirected, the real file name we take from the final URL
            localName = url2name(r.url)
        if localFileName:
            # we can force to save the file as specified name
            localName = localFileName
        f = open(localName, 'wb')
        f.write(r.read())
        f.close()

It takes the file name from Content-Disposition; if that header is not present, it uses the file name from the URL (if a redirect happened, the final URL is taken into account).
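For readers on Python 3, the same precedence (explicit name, then Content-Disposition, then the final URL) might be sketched as follows. The function names are illustrative. The basename step and the "download.bin" default are extra precautions assumed here, not part of the answer above:

```python
import os
import urllib.request
from email.message import Message
from urllib.parse import unquote, urlsplit


def url2name(url):
    """Last path segment of a URL, ignoring the query string."""
    return os.path.basename(unquote(urlsplit(url).path))


def pick_name(url, final_url, content_disposition, forced=None):
    """Choose a local file name; hypothetical helper for illustration."""
    if forced:
        # An explicitly requested name always wins.
        return forced
    name = None
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
    if name:
        # Keep only the last path component so the header cannot point
        # outside the current directory (assumed precaution).
        name = os.path.basename(name.replace("\\", "/"))
    if not name:
        # The final URL differs from `url` when a redirect was followed.
        name = url2name(final_url or url)
    return name or "download.bin"  # assumed default when nothing usable is found


def download(url, local_file_name=None):
    """Fetch `url` and save it under the name chosen by pick_name()."""
    with urllib.request.urlopen(url) as r:
        name = pick_name(url, r.url, r.headers.get("Content-Disposition"),
                         local_file_name)
        with open(name, "wb") as f:
            f.write(r.read())
    return name
```

Keeping the name selection in a separate function from the network call makes the precedence easy to check without hitting a server.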

lostlogic · Answer

Combining much of the above, here is a more Pythonic solution:

    import urllib2
    import shutil
    import urlparse
    import os

    def download(url, fileName=None):
        def getFileName(url,openUrl):
            if 'Content-Disposition' in openUrl.info():
                # If the response has Content-Disposition, try to get filename from it
                # (split on the first '=' only, so values containing '=' still parse)
                cd = dict(map(
                    lambda x: x.strip().split('=', 1) if '=' in x else (x.strip(),''),
                    openUrl.info()['Content-Disposition'].split(';')))
                if 'filename' in cd:
                    filename = cd['filename'].strip("\"'")
                    if filename: return filename
            # if no filename was found above, parse it out of the final URL.
            return os.path.basename(urlparse.urlsplit(openUrl.url)[2])

        r = urllib2.urlopen(urllib2.Request(url))
        try:
            fileName = fileName or getFileName(url,r)
            with open(fileName, 'wb') as f:
                shutil.copyfileobj(r,f)
        finally:
            r.close()

Denis Barmenkov · Answer

To kender:

    if localName[0] == '"' or localName[0] == "'":
        localName = localName[1:-1]

This is not safe: the web server can pass a malformed name such as ["file.ext] or [file.ext'], or even an empty one, in which case localName[0] raises an exception. Correct code can look like this:

    localName = localName.replace('"', '').replace("'", "")
    if localName == '':
        localName = SOME_DEFAULT_FILE_NAME
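Wrapped in a function, the same idea can be tried against the malformed inputs mentioned above. This is only a sketch: the name clean_name and the "download.bin" default (standing in for SOME_DEFAULT_FILE_NAME) are assumptions, and the whitespace stripping is an addition:

```python
def clean_name(raw, default="download.bin"):
    """Drop quote characters and outer whitespace; use `default` if nothing is left.

    Hypothetical helper; `default` stands in for SOME_DEFAULT_FILE_NAME.
    """
    name = (raw or "").replace('"', "").replace("'", "").strip()
    return name or default


# Malformed or empty values no longer raise an exception.
for value in ['"file.ext', "file.ext'", '""', ""]:
    print(repr(value), "->", clean_name(value))
```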

Jaydev · Answer

Using the wget module:

    import wget

    custom_file_name = "/custom/path/custom_name.ext"
    wget.download(url, custom_file_name)

Using urlretrieve:

    import urllib

    urllib.urlretrieve(url, custom_file_name)

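Note that urlretrieve opens the given path for writing directly, so the target directory must already exist; create it first (for example with os.makedirs) if it might be missing.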
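A small sketch for Python 3, where the function lives at urllib.request.urlretrieve. It creates the parent directory explicitly before downloading, so it does not depend on the directory already being there. The wrapper name fetch_to is made up for illustration:

```python
import os
import urllib.request


def fetch_to(url, dest_path):
    """Download `url` to `dest_path`, creating parent directories first.

    Hypothetical wrapper; assumes Python 3.
    """
    parent = os.path.dirname(dest_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # urlretrieve returns a (local_filename, headers) tuple.
    local_path, _headers = urllib.request.urlretrieve(url, dest_path)
    return local_path
```

The Python 3 documentation lists urlretrieve as a legacy interface, so for new code urlopen combined with shutil.copyfileobj may be the more future-proof choice.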