urllib2 file name

Question

If I open a file using urllib2, like so:

remotefile = urllib2.urlopen('http://example.com/somefile.zip')

is there an easy way to get the file name other than parsing the original URL?

EDIT: changed openfile to urlopen... not sure how that happened.

EDIT2: I ended up using:

filename = url.split('/')[-1].split('#')[0].split('?')[0]

Unless I'm mistaken, this should also strip off any query string or fragment.
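As a quick check, the one-liner can be wrapped in a function and run on a few sample URLs. This is a minimal sketch; the function name `filename_from_url` and the sample URLs are illustrative, not from the original post. The last call shows the limit of the approach: because the URL is split on `/` before the query is removed, a slash inside the query string selects the wrong segment.

```python
def filename_from_url(url):
    """Same steps as the one-liner: last '/'-segment, then drop '#...' and '?...'."""
    last_segment = url.split('/')[-1]
    without_fragment = last_segment.split('#')[0]
    return without_fragment.split('?')[0]


print(filename_from_url('http://example.com/somedir/somefile.zip?foo=bar'))  # somefile.zip
print(filename_from_url('http://example.com/somefile.zip#section'))          # somefile.zip
# Limitation: a '/' inside the query string wins the first split.
print(filename_from_url('http://example.com/get.zip?next=/a/b'))             # b
```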

Jonny Buchanan · Accepted Answer

Did you mean urllib2.urlopen?

You could potentially lift the intended file name if the server was sending a Content-Disposition header, by checking remotefile.info()['Content-Disposition'], but as it is I think you'll just have to parse the URL.
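When the header is present, its value looks like `attachment; filename="somefile.zip"`, so it still has to be parsed. The sketch below is a minimal Python 3 version that uses the standard library's `email.message.Message.get_filename()` instead of hand-splitting the string; the helper name `filename_from_content_disposition` is hypothetical. In Python 3, the `headers` object returned by `urllib.request.urlopen()` is an `http.client.HTTPMessage`, a subclass of `email.message.Message`, so `resp.headers.get_filename()` can be called on it directly.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None if absent."""
    msg = Message()
    msg['Content-Disposition'] = header_value
    # get_filename() reads the "filename" parameter and strips its quotes.
    return msg.get_filename()


print(filename_from_content_disposition('attachment; filename="somefile.zip"'))  # somefile.zip
print(filename_from_content_disposition('inline'))                               # None
```

A `None` result means the server did not name the file, so you fall back to the URL.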

You could use urlparse.urlsplit, but if you have any URLs like the second example, you'll end up having to pull the file name out yourself anyway:

>>> urlparse.urlsplit('http://example.com/somefile.zip')
('http', 'example.com', '/somefile.zip', '', '')
>>> urlparse.urlsplit('http://example.com/somedir/somefile.zip')
('http', 'example.com', '/somedir/somefile.zip', '', '')

So you might as well just do this:

>>> 'http://example.com/somefile.zip'.split('/')[-1]
'somefile.zip'
>>> 'http://example.com/somedir/somefile.zip'.split('/')[-1]
'somefile.zip'
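In Python 3 the urlparse module became urllib.parse, and urlsplit returns a named tuple, so the path can be read by attribute. A minimal sketch (the helper name `last_path_segment` is illustrative): because the query and fragment are separated out before the path is examined, a query such as `?next=/a/b` cannot confuse it. A path ending in `/` yields an empty string, which the caller has to handle.

```python
from urllib.parse import urlsplit


def last_path_segment(url):
    """Final segment of the URL's path; '' when the path ends with '/'."""
    path = urlsplit(url).path  # query and fragment are already separated out
    return path.rsplit('/', 1)[-1]


parts = urlsplit('http://example.com/somedir/somefile.zip?foo=bar#top')
print(parts.path)      # /somedir/somefile.zip
print(parts.query)     # foo=bar
print(parts.fragment)  # top
print(last_path_segment('http://example.com/get.zip?next=/a/b'))  # get.zip
print(repr(last_path_segment('http://example.com/somedir/')))     # ''
```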

Jay · Answer

If you only want the file name itself, and there are no query variables at the end like http://example.com/somedir/somefile.zip?foo=bar, you can use os.path.basename for this:

[user@host]$ python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.path.basename("http://example.com/somefile.zip")
'somefile.zip'
>>> os.path.basename("http://example.com/somedir/somefile.zip")
'somefile.zip'
>>> os.path.basename("http://example.com/somedir/somefile.zip?foo=bar")
'somefile.zip?foo=bar'

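Some other posters mentioned using urlparse, which will work, but you'd still need to strip the leading directory from the file name. If you use os.path.basename(), you don't have to worry about that, since it returns only the last part of the URL or file path.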
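The session above also shows the caveat: os.path.basename keeps `?foo=bar`. One way around it, sketched below in Python 3, is to apply basename only to the path component. The sketch calls posixpath directly rather than os.path, because on Windows os.path is ntpath, which additionally treats a backslash as a separator; posixpath gives the same answer on every platform. The helper name `url_basename` is illustrative.

```python
import posixpath
from urllib.parse import urlsplit


def url_basename(url):
    """basename of the URL's path component, using POSIX rules on every OS."""
    return posixpath.basename(urlsplit(url).path)


url = 'http://example.com/somedir/somefile.zip?foo=bar'
print(posixpath.basename(url))  # somefile.zip?foo=bar  (query string kept)
print(url_basename(url))        # somefile.zip
```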

Rafał Dowgird · Answer

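I don't think "file name" is a well-defined concept when it comes to HTTP transfers. The server might (but is not required to) provide one in a Content-Disposition header; try remotefile.headers['Content-Disposition']. If that fails, you'll probably have to parse the URI yourself.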

TMF Wolfman · Answer

This is what I normally do:

filename = url.split("?")[0].split("/")[-1]

Filipe Correia · Answer

Using urlsplit is the safest option:

import urlparse

url = 'http://example.com/somefile.zip'
urlparse.urlsplit(url).path.split('/')[-1]

Dan Lenski · Answer

Did you mean urllib2.urlopen? There is no function called openfile in the urllib2 module.

Anyway, use the urlparse module (reachable as urllib2.urlparse):

>>> from urllib2 import urlparse
>>> print urlparse.urlsplit('http://example.com/somefile.zip')
('http', 'example.com', '/somefile.zip', '', '')

Voilà.

Régis B. · Answer

The os.path.basename function works not only on file paths but also on URLs, so you don't have to parse the URL by hand. Also, note that you should use result.url rather than the original URL in order to follow redirects.

import os
import urllib2

result = urllib2.urlopen(url)
real_url = urllib2.urlparse.urlparse(result.url)
filename = os.path.basename(real_url.path)
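The path taken from the final URL is still percent-encoded, so a file served as `my report.pdf` comes back as `my%20report.pdf`. Below is a minimal Python 3 sketch that decodes it with urllib.parse.unquote; the helper name and sample URLs are illustrative. In Python 3, responses from `urllib.request.urlopen()` expose the post-redirect address as `resp.url` and `resp.geturl()`, which is what you would pass in.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_final_url(final_url):
    """File name from the URL the response actually came from, with %XX escapes decoded."""
    encoded_name = posixpath.basename(urlsplit(final_url).path)
    return unquote(encoded_name)


print(filename_from_final_url('http://example.com/files/my%20report.pdf'))       # my report.pdf
print(filename_from_final_url('http://example.com/files/r%C3%A9sum%C3%A9.pdf'))  # résumé.pdf
```

Decoding can turn `%2F` into `/` (or produce `..`), so sanitize the result before using it as a local file path.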

Yth · Answer

You could also combine the two top-rated answers: use urllib2.urlparse.urlsplit() to get the path part of the URL, and then os.path.basename to get the actual file name.

The full code would be:

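import cgi
import os
import urllib2

remotefile = urllib2.urlopen(url)
try:
    # The header value looks like 'attachment; filename="somefile.zip"',
    # so extract just the filename parameter.
    _, params = cgi.parse_header(remotefile.info()['Content-Disposition'])
    filename = params['filename']
except KeyError:
    # No Content-Disposition header, or no filename in it: use the URL path.
    filename = os.path.basename(urllib2.urlparse.urlsplit(url).path)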
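Two details matter in this combination: `remotefile.info()['Content-Disposition']` is the whole header value (for example `attachment; filename="somefile.zip"`), not just the name, and the URL fallback should use the address reached after redirects. A Python 3 sketch of the same header-then-URL idea follows; the function name `pick_filename` is hypothetical. It takes the headers object and the final URL as arguments, so with a live response you would call `pick_filename(resp.headers, resp.geturl())`. The demonstration builds headers by hand, so no request is made.

```python
import posixpath
from http.client import HTTPMessage
from urllib.parse import unquote, urlsplit


def pick_filename(headers, final_url):
    """Prefer the server-supplied name; otherwise use the URL's last path segment.

    headers: an email.message.Message, e.g. resp.headers from urllib.request.urlopen().
    final_url: the URL the response came from, e.g. resp.geturl().
    """
    # get_filename() checks Content-Disposition's filename, then Content-Type's name;
    # it returns None if neither is present.
    name = headers.get_filename()
    if name:
        return name
    return unquote(posixpath.basename(urlsplit(final_url).path))


with_header = HTTPMessage()
with_header['Content-Disposition'] = 'attachment; filename="data.csv"'
print(pick_filename(with_header, 'http://example.com/download?id=7'))                   # data.csv
print(pick_filename(HTTPMessage(), 'http://example.com/somedir/somefile.zip?foo=bar'))  # somefile.zip
```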

miracle2k · Answer

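It depends what you mean by parsing. There is no way to get the file name without parsing the URL; that is, the remote server doesn't give you a file name. However, you don't have to do much yourself, since there's the urlparse module: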

In [9]: urlparse.urlparse('http://example.com/somefile.zip')
Out[9]: ('http', 'example.com', '/somefile.zip', '', '', '')

Corey Goldberg · Answer

Not that I know of.

But you can parse it easily enough, like this:

url = 'http://example.com/somefile.zip'
print url.split('/')[-1]

Adam Nelson · Answer

Using PurePosixPath, which is not operating-system dependent and handles URLs gracefully, is the Pythonic solution:

>>> from pathlib import PurePosixPath
>>> path = PurePosixPath('http://example.com/somefile.zip')
>>> path.name
'somefile.zip'
>>> path = PurePosixPath('http://example.com/nested/somefile.zip')
>>> path.name
'somefile.zip'

Note that there is no network traffic here (i.e., those URLs don't go anywhere); it just uses the standard parsing rules.
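One caveat: given a full URL, PurePosixPath treats everything after the last `/` as the name, query string included. The Python 3 sketch below feeds it only the path from urllib.parse.urlsplit (the helper name `url_path_name` is illustrative), which also gives convenient access to `.suffix` and `.stem`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def url_path_name(url):
    """PurePosixPath view of the URL's path component (no scheme, host, query or fragment)."""
    return PurePosixPath(urlsplit(url).path)


url = 'http://example.com/nested/somefile.tar.gz?foo=bar'
print(PurePosixPath(url).name)  # somefile.tar.gz?foo=bar
p = url_path_name(url)
print(p.name)    # somefile.tar.gz
print(p.suffix)  # .gz
print(p.stem)    # somefile.tar
```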

user15453 · Answer

import os, urllib2
resp = urllib2.urlopen('http://www.example.com/index.html')
my_url = resp.geturl()
os.path.split(my_url)[1]
# 'index.html'

This isn't openfile, but maybe it still helps :)

Vovan Kuznetsov · Answer

You could probably use a simple regex here. Something like:

In [26]: import re

In [27]: pat = re.compile(r'.+[\/\?#=]([\w-]+\.[\w-]+(?:\.[\w-]+)?$)')

In [28]: test_set
Out[28]:
['http://www.google.com/a341.tar.gz',
 'http://www.google.com/a341.gz',
 'http://www.google.com/asdasd/aadssd.gz',
 'http://www.google.com/asdasd?aadssd.gz',
 'http://www.google.com/asdasd#blah.gz',
 'http://www.google.com/asdasd?filename=xxxbl.gz']

In [30]: for url in test_set:
   ....:     match = pat.match(url)
   ....:     if match and match.groups():
   ....:         print(match.groups()[0])
   ....:
a341.tar.gz
a341.gz
aadssd.gz
aadssd.gz
blah.gz
xxxbl.gz

DoomedRaven · Answer

I use requests, but you can do it just as easily with urllib(2):

import requests
from urllib import unquote
from urlparse import urlparse

filename = None
sample = requests.get(url)
if sample.status_code == 200:
    # requests' headers dict is case-insensitive, so test membership directly
    # (has_key doesn't work here, and keys() keeps the server's original casing).
    if 'content-disposition' in sample.headers:
        filename = sample.headers['content-disposition'].split('filename=')[-1].replace('"', '').replace(';', '')
    else:
        # Fall back to the last query value (e.g. ?file=name.zip)...
        filename = urlparse(sample.url).query.split('/')[-1].split('=')[-1].split('&')[-1]
        if not filename and url.split('/')[-1] != '':
            # ...or to the last segment of the final (post-redirect) URL.
            filename = sample.url.split('/')[-1].split('=')[-1].split('&')[-1]
    if filename:
        filename = unquote(filename)