urllib2 try and except on 404

Question

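I'm trying to go through a series of numbered data pages using urllib2. I want to use a try statement, but I know very little about it. From a bit of reading, it seems to be based on specific exception "names" (e.g. IOError). Part of the problem is that I don't know which error I should be looking for.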

I have written/pasted my urllib2 page-fetching routine from 'urllib2 the missing manual':

    def fetch_page(url,useragent):
        urlopen = urllib2.urlopen
        Request = urllib2.Request
        cj = cookielib.LWPCookieJar()

        txheaders = {'User-agent' : useragent}

        if os.path.isfile(COOKIEFILE):
            cj.load(COOKIEFILE)
            print "previous cookie loaded..."
        else:
            print "no ospath to cookfile"

        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        urllib2.install_opener(opener)
        try:
            req = urllib2.Request(url, useragent)
            # create a request object

            handle = urlopen(req)
            # and open it to return a handle on the url

        except IOError, e:
            print 'Failed to open "%s".' % url
            if hasattr(e, 'code'):
                print 'We failed with error code - %s.' % e.code
            elif hasattr(e, 'reason'):
                print "The error object has the following 'reason' attribute :"
                print e.reason
                print "This usually means the server doesn't exist,",
                print "is down, or we don't have an internet connection."
                return False

        else:
            print
            if cj is None:
                print "We don't have a cookie library available - sorry."
                print "I can't show you any cookies."
            else:
                print 'These are the cookies we have received so far :'
                for index, cookie in enumerate(cj):
                    print index, ' : ', cookie
                cj.save(COOKIEFILE)           # save the cookies again

            page = handle.read()
            return (page)

    def fetch_series():
        useragent="Firefox...etc."
        url="www.example.com/01.html"
        try:
            fetch_page(url,useragent)
        except [something]:
            print "failed to get page"
            sys.exit()

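The bottom function is just an example to show what I mean. Can anyone tell me what I should put there? I made the page-fetching function return False if it gets a 404; is that correct? And if so, why doesn't `except False:` work? Thanks for any help you can give.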

OK, following the advice here, I have tried:

    except urlib2.URLError, e:
    except URLError, e:
    except URLError:
    except urllib2.IOError, e:
    except IOError, e:
    except IOError:
    except urllib2.HTTPError, e:
    except urllib2.HTTPError:
    except HTTPError:

None of them work.

Acorn · Accepted Answer

I'd recommend checking out the excellent requests module.

With it, you can get the behaviour you are looking for like this:

    import requests
    from requests.exceptions import HTTPError

    try:
        r = requests.get('http://httpbin.org/status/200')
        r.raise_for_status()
    except HTTPError:
        print 'Could not download page'
    else:
        print r.url, 'downloaded successfully'

    try:
        r = requests.get('http://httpbin.org/status/404')
        r.raise_for_status()
    except HTTPError:
        print 'Could not download', r.url
    else:
        print r.url, 'downloaded successfully'

chown · Answer

If you want to detect a 404, you need to catch _urllib2.HTTPError_:

    try:
        req = urllib2.Request(url, useragent)
        # create a request object

        handle = urllib2.urlopen(req)
        # and open it to return a handle on the url
    except urllib2.HTTPError, e:
        print 'We failed with error code - %s.' % e.code

        if e.code == 404:
            # do stuff..
        else:
            # other stuff...

        return False
    else:
        # ...

To catch it in fetch_series():

    def fetch_page(url,useragent):
        urlopen = urllib2.urlopen
        Request = urllib2.Request
        cj = cookielib.LWPCookieJar()
        try:
            urlopen()
            #...
        except IOError, e:
            # ...
        else:
            #...

    def fetch_series():
        useragent="Firefox...etc."
        url="www.example.com/01.html"
        try:
            fetch_page(url,useragent)
        except urllib2.HTTPError, e:
            print "failed to get page"
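One thing worth checking with this layout: HTTPError is a subclass of URLError, which is itself an IOError, so an `except IOError` inside fetch_page would catch the 404 before fetch_series ever sees it. A minimal sketch of one way around that follows, written for Python 3 (where urllib2 was split into urllib.request and urllib.error). The `fake_urlopen` helper, the `opener` parameter, the `limit` argument and the "pages 01 to 03 exist" rule are all assumptions made up so the example runs without a network; this is not the asker's actual routine.

```python
import io
import urllib.error
from email.message import Message


def fake_urlopen(url):
    """Hypothetical stand-in for urlopen: pages 01-03 exist, the rest are 404."""
    page_number = int(url.rsplit("/", 1)[-1].split(".")[0])
    if page_number > 3:
        raise urllib.error.HTTPError(url, 404, "Not Found", Message(), io.BytesIO(b""))
    return io.BytesIO(("page %02d" % page_number).encode("ascii"))


def fetch_page(url, opener=fake_urlopen):
    """Fetch one page; report HTTP errors, then let them propagate."""
    try:
        handle = opener(url)
    except urllib.error.HTTPError as e:
        print("We failed with error code - %s." % e.code)
        raise  # without this re-raise the caller never sees the exception
    return handle.read()


def fetch_series(base="http://www.example.com/%02d.html", limit=10):
    """Collect numbered pages until the first 404."""
    pages = []
    for n in range(1, limit + 1):
        try:
            pages.append(fetch_page(base % n))
        except urllib.error.HTTPError as e:
            if e.code == 404:
                break  # ran past the last numbered page
            raise      # any other HTTP error is unexpected here
    return pages


pages = fetch_series()
print(len(pages))
```

The same idea also answers the `except False:` question: a returned False is an ordinary value, not a raised exception, so it can only be tested with `if`, never caught with `except`.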

From http://docs.python.org/library/urllib2.html:

_exception urllib2.HTTPError_
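Though being an exception (a subclass of URLError), an HTTPError can also function as a non-exceptional file-like return value (the same thing that urlopen() returns). This is useful when handling exotic HTTP errors, such as requests for authentication.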

code
An HTTP status code as defined in RFC 2616. This numeric value corresponds to a value found in the dictionary of codes found in _BaseHTTPServer.BaseHTTPRequestHandler.responses_.
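To make the quoted behaviour concrete, here is a small sketch, assuming Python 3, where the class lives in urllib.error and the responses dictionary in http.server rather than BaseHTTPServer. The error object is built by hand (the URL and the body bytes are invented for illustration) so that nothing touches the network; normally urlopen would raise it for you.

```python
import io
import urllib.error
from email.message import Message
from http.server import BaseHTTPRequestHandler

# Hand-built error object: an assumption for illustration only.
err = urllib.error.HTTPError(
    "http://www.example.com/01.html", 404, "Not Found",
    Message(), io.BytesIO(b"<h1>missing</h1>"),
)

try:
    raise err
except urllib.error.URLError as e:  # HTTPError is a subclass of URLError
    status = e.code                 # numeric HTTP status code
    # responses maps a code to a (short message, long message) pair
    phrase = BaseHTTPRequestHandler.responses[e.code][0]
    body = e.read()                 # file-like: the body of the error page

print(status, phrase, body)
```

Reading the body this way can be handy when the server's error page carries useful detail.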

kxr · Answer

Poking around interactively:

To find out about the nature and possible contents of such exceptions in Python, it is best to simply try the key calls interactively:

    >>> f = urllib2.urlopen('http://httpbin.org/status/404')
    Traceback (most recent call last):
      ...
      File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    HTTPError: HTTP Error 404: NOT FOUND

sys.last_value then holds the exception value that fell through to the interactive prompt, so you can play with it.
(Use the interactive shell's TAB and "." auto-completion, dir(), vars() ...)

    >>> ev = sys.last_value
    >>> ev.__class__
    <class 'urllib2.HTTPError'>
    >>> dir(ev)
    ['_HTTPError__super_init', '__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__getitem__', '__getslice__', '__hash__', '__init__', '__iter__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', 'args', 'close', 'code', 'errno', 'filename', 'fileno', 'fp', 'getcode', 'geturl', 'hdrs', 'headers', 'info', 'message', 'msg', 'next', 'read', 'readline', 'readlines', 'reason', 'strerror', 'url']
    >>> vars(ev)
    {'fp': <addinfourl at 140193880 whose fp = <socket._fileobject object at 0x01062370>>, 'fileno': <bound method _fileobject.fileno of <socket._fileobject object at 0x01062370>>, 'code': 404, 'hdrs': <httplib.HTTPMessage instance at 0x085ADF80>, 'read': <bound method _fileobject.read of <socket._fileobject object at 0x01062370>>, 'readlines': <bound method _fileobject.readlines of <socket._fileobject object at 0x01062370>>, 'next': <bound method _fileobject.next of <socket._fileobject object at 0x01062370>>, 'headers': <httplib.HTTPMessage instance at 0x085ADF80>, '__iter__': <bound method _fileobject.__iter__ of <socket._fileobject object at 0x01062370>>, 'url': 'http://httpbin.org/status/404', 'msg': 'NOT FOUND', 'readline': <bound method _fileobject.readline of <socket._fileobject object at 0x01062370>>}
    >>> sys.last_value.code
    404

Try handling it:

    >>> try: f = urllib2.urlopen('http://httpbin.org/status/404')
    ... except urllib2.HTTPError, ev:
    ...     print ev, "'s error code is", ev.code
    ...
    HTTP Error 404: NOT FOUND 's error code is 404

Build a simple opener that does not throw HTTP errors:

    >>> ho = urllib2.OpenerDirector()
    >>> ho.add_handler(urllib2.HTTPHandler())
    >>> f = ho.open('http://localhost:8080/cgi/somescript.py'); f
    <addinfourl at 138851272 whose fp = <socket._fileobject object at 0x01062370>>
    >>> f.code
    500
    >>> f.read()
    'Execution error: <pre style="background-color:#faa">\nNameError: name \'e\' is not defined\n<pre>\n'

The default handlers of urllib2.build_opener:

    default_classes = [ProxyHandler, UnknownHandler, HTTPHandler,
                       HTTPDefaultErrorHandler, HTTPRedirectHandler,
                       FTPHandler, FileHandler, HTTPErrorProcessor]
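The difference between the two openers can be seen without any network access by comparing which handlers each one carries. The sketch below assumes Python 3 (urllib.request in place of urllib2) and relies on `handlers`, a plain list attribute of OpenerDirector in CPython that I am treating as an implementation detail rather than documented API. The helper name `handler_names` is made up for this example.

```python
import urllib.request

# The opener build_opener() assembles from its default handler classes.
default_opener = urllib.request.build_opener()

# The hand-built opener from the session above: HTTPHandler only.
bare_opener = urllib.request.OpenerDirector()
bare_opener.add_handler(urllib.request.HTTPHandler())


def handler_names(opener):
    """Return the class names of the handlers installed on an opener."""
    return {type(h).__name__ for h in opener.handlers}


default_names = handler_names(default_opener)
bare_names = handler_names(bare_opener)

# HTTPErrorProcessor routes non-2xx responses to the error handlers, and
# HTTPDefaultErrorHandler is the one that finally raises HTTPError.
raising = {"HTTPErrorProcessor", "HTTPDefaultErrorHandler"}
print(sorted(raising & default_names))
print(sorted(raising & bare_names))
```

With neither of those two handlers installed, the bare opener simply hands back the response object, which is why `f.code` could be read as 500 above instead of an exception being raised.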