Timeout for Python requests.get: getting the entire response

Question

I'm gathering statistics on a list of websites, and I'm using requests for it for simplicity. Here is my code:

import requests

data = []
websites = ['http://google.com', 'http://bbc.co.uk']
for w in websites:
    r = requests.get(w, verify=False)
    data.append((
        r.url,
        len(r.content),
        r.elapsed.total_seconds(),
        str([(l.status_code, l.url) for l in r.history]),
        str(r.headers.items()),
        str(r.cookies.items()),
    ))

Now, I want requests.get to time out after 10 seconds so the loop doesn't get stuck.

This question has been of interest before too, but none of the answers are clean. I will be putting some bounty on this to get a nice answer.

I hear that maybe not using requests is a good idea, but then how should I get the nice things requests offers (the ones in the tuple)?

Alvaro · Accepted Answer

What about using eventlet? If you want to time out the request after 10 seconds, even if data is still being received, this snippet will work for you:

import requests
import eventlet
eventlet.monkey_patch()

with eventlet.Timeout(10):
    requests.get("http://ipv4.download.thinkbroadband.com/1GB.Zip", verify=False)

Lukasa · Answer

Set the timeout parameter:

r = requests.get(w, verify=False, timeout=10)

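As long as you don't set stream=True on that request, this will cause the call to requests.get() to time out if the connection takes more than ten seconds, or if the server doesn't send data for more than ten seconds.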

Hieu · Answer

Update: http://docs.python-requests.org/en/master/user/advanced/#timeouts

In newer versions of requests:

If you specify a single value for the timeout, like this:

r = requests.get('https://github.com', timeout=5)

The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:

r = requests.get('https://github.com', timeout=(3.05, 27))

If the remote server is very slow, you can tell Requests to wait forever for a response by passing None as the timeout value and then fetching a cup of coffee.

r = requests.get('https://github.com', timeout=None)
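To make the rule quoted above concrete, here is a small sketch of how a timeout argument breaks down into a connect part and a read part. The helper name split_timeout is hypothetical and is not part of requests; it only mirrors the documented behaviour described in this answer.

```python
def split_timeout(timeout):
    """Return (connect, read) for a requests-style timeout argument.

    Hypothetical helper for illustration only; it is not part of requests.
    """
    if isinstance(timeout, tuple):
        connect, read = timeout  # separate values were given
        return connect, read
    # A single number (or None, meaning "wait forever") applies to both phases.
    return timeout, timeout


print(split_timeout(5))
print(split_timeout((3.05, 27)))
print(split_timeout(None))
```

Neither part limits the total download time: the read value bounds each wait for data from the server, not the sum of those waits.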

My old (probably outdated) answer, which was posted a long time ago:

There are other ways to overcome this problem:

1. Use the TimeoutSauce internal class

From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896

import requests
from requests.adapters import TimeoutSauce

class MyTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        connect = kwargs.get('connect', 5)
        read = kwargs.get('read', connect)
        super(MyTimeout, self).__init__(connect=connect, read=read)

requests.adapters.TimeoutSauce = MyTimeout

This code should cause us to set the read timeout equal to the connect timeout, which is the timeout value you pass in your Session.get() call. (Note that I haven't actually tested this code, so it may need some quick debugging; I just wrote it straight into the GitHub window.)

2. Use a fork of requests from kevinburke: https://github.com/kevinburke/requests/tree/connect-timeout

From its documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst

If you specify a single value for the timeout, like this:

r = requests.get('https://github.com', timeout=5)

The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:

r = requests.get('https://github.com', timeout=(3.05, 27))

kevinburke has requested that it be merged into the main requests project, but it has not been accepted yet.

Pedro Lobito · Answer

`timeout = int(seconds)`

Since requests >= 2.4.0, you can use the timeout argument, i.e.:

requests.get(url, timeout=10)

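Note:

timeout is not a time limit on the entire response download. Rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.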
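The difference between a per-read timeout and a total time limit can be seen without requests at all. The sketch below uses only the standard library: a local server (the hypothetical drip_server) sends one byte every 0.2 seconds, and the client reads with a 1-second socket timeout. It illustrates the socket-level behaviour the note describes; it is not how requests is implemented.

```python
import socket
import threading
import time


def drip_server(listener, chunk_count, delay):
    """Accept one client and send one byte every `delay` seconds."""
    conn, _ = listener.accept()
    with conn:
        for _ in range(chunk_count):
            time.sleep(delay)
            conn.sendall(b"x")


def read_all(port, per_read_timeout):
    """Read until EOF. The timeout bounds each recv(), not the whole transfer."""
    start = time.monotonic()
    received = b""
    with socket.create_connection(("127.0.0.1", port), timeout=per_read_timeout) as sock:
        while True:
            # Raises socket.timeout only after per_read_timeout seconds of silence.
            data = sock.recv(1024)
            if not data:  # empty bytes means the server closed the connection
                break
            received += data
    return received, time.monotonic() - start


listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
server = threading.Thread(target=drip_server, args=(listener, 8, 0.2), daemon=True)
server.start()

body, elapsed = read_all(port, per_read_timeout=1.0)
server.join()
listener.close()
print(len(body), "bytes; total time exceeded the 1.0s timeout:", elapsed > 1.0)
```

The transfer takes about 1.6 seconds in total, yet the 1-second timeout never fires, because no single wait for data lasts that long.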

totokaka · Answer

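To create a timeout you can use signals.

The best way to solve this case is probably to

1. Set an exception as the handler for the alarm signal
2. Call the alarm signal with a ten-second delay
3. Call the function inside a try-except-finally block
4. Reach the except block if the function timed out
5. Abort the alarm in the finally block, so it is not signalled later

Here is some example code: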

import signal
from time import sleep

class TimeoutException(Exception):
    """ Simple Exception to be called on timeouts. """
    pass

def _timeout(signum, frame):
    """ Raise a TimeoutException.

    This is intended for use as a signal handler.
    The signum and frame arguments passed to this are ignored.
    """
    # Raise TimeoutException with system default timeout message
    raise TimeoutException()

# Set the handler for the SIGALRM signal:
signal.signal(signal.SIGALRM, _timeout)
# Send the SIGALRM signal in 10 seconds:
signal.alarm(10)

try:
    # Do our code:
    print('This will take 11 seconds...')
    sleep(11)
    print('done!')
except TimeoutException:
    print('It timed out!')
finally:
    # Abort the sending of the SIGALRM signal:
    signal.alarm(0)

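There are some caveats to this:

1. It is not threadsafe: signals are always delivered to the main thread, so you can't put this in any other thread.
2. There is a slight delay between the scheduling of the signal and the execution of the actual code. This means that the example would time out even if it only slept for ten seconds.

But it's all in the standard Python library! Except for the sleep function import, it's only one import. If you are going to use timeouts in many places, you can easily put the TimeoutException, _timeout and the signaling in a function and just call that. Or you can make a decorator and put it on functions; see the answer linked below.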
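A minimal sketch of the decorator idea, assuming a Unix system and that the decorated function runs in the main thread. The names timeout, slow and fast are illustrative and do not come from the answer. It uses signal.setitimer rather than signal.alarm so that fractional seconds work.

```python
import functools
import signal
import time


class TimeoutException(Exception):
    """Raised when the decorated function runs past its time limit."""


def timeout(seconds):
    """Decorator factory: abort the wrapped call after `seconds` (Unix, main thread only)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            def _handler(signum, frame):
                raise TimeoutException()

            previous = signal.signal(signal.SIGALRM, _handler)
            signal.setitimer(signal.ITIMER_REAL, seconds)  # accepts fractions, unlike alarm()
            try:
                return func(*args, **kwargs)
            finally:
                signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the pending alarm
                signal.signal(signal.SIGALRM, previous)  # restore the old handler
        return wrapper
    return decorator


@timeout(0.2)
def slow():
    time.sleep(2)
    return "finished"


@timeout(2)
def fast():
    return "finished"


try:
    slow()
    result = "no timeout"
except TimeoutException:
    result = "timed out"
print(result, fast())
```

The same caveats apply as for the plain signal version: it is not threadsafe, and nested uses would overwrite each other's alarm.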

You can also set this up as a context manager so you can use it with the with statement:

import signal

class Timeout:
    """ Timeout for use with the `with` statement. """

    class TimeoutException(Exception):
        """ Simple Exception to be called on timeouts. """
        pass

    @staticmethod
    def _timeout(signum, frame):
        """ Raise a TimeoutException.

        This is intended for use as a signal handler.
        The signum and frame arguments passed to this are ignored.
        """
        raise Timeout.TimeoutException()

    def __init__(self, timeout=10):
        self.timeout = timeout
        signal.signal(signal.SIGALRM, Timeout._timeout)

    def __enter__(self):
        signal.alarm(self.timeout)

    def __exit__(self, exc_type, exc_value, traceback):
        signal.alarm(0)
        # Returning True swallows the exception, but only our own timeout.
        return exc_type is Timeout.TimeoutException

# Demonstration:
from time import sleep

print('This is going to take maximum 10 seconds...')
with Timeout(10):
    sleep(15)
    print('No timeout?')
print('Done')

One possible downside of this context manager approach is that you can't know whether the code actually timed out or not.
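One way around that downside, sketched here as an assumption rather than as part of the original answer, is to have the context manager return itself and record whether the alarm fired. The timed_out attribute is a name chosen for this example; as before, this works only on Unix and in the main thread.

```python
import signal
import time


class Timeout:
    """Context manager that swallows its own timeout and records that it fired."""

    class TimeoutException(Exception):
        pass

    def __init__(self, seconds=10):
        self.seconds = seconds
        self.timed_out = False

    def _handler(self, signum, frame):
        raise Timeout.TimeoutException()

    def __enter__(self):
        signal.signal(signal.SIGALRM, self._handler)
        signal.setitimer(signal.ITIMER_REAL, self.seconds)
        return self  # lets callers write `with Timeout(...) as t`

    def __exit__(self, exc_type, exc_value, traceback):
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the pending alarm
        self.timed_out = exc_type is Timeout.TimeoutException
        return self.timed_out  # suppress only our own exception


with Timeout(0.2) as t:
    time.sleep(2)
print("timed out:", t.timed_out)
```

After the with block, t.timed_out tells the caller whether the body was cut short, so a partial result is not mistaken for a complete one.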

Sources and recommended reading:

Chris Johnson · Answer

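This may be overkill, but the Celery distributed task queue has good support for timeouts.

In particular, you can define a soft time limit that just raises an exception in your process (so you can clean up) and/or a hard time limit that terminates the task when the time limit has been exceeded.

Under the covers, this uses the same signals approach as referenced in your "before" post, but in a more usable and manageable way. And if the list of websites you are monitoring is long, you might benefit from its primary feature: all kinds of ways to manage the execution of a large number of tasks.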

Jorge Leitão · Answer

I believe you can use multiprocessing and not depend on a third-party package:

import multiprocessing
import requests

def _worker(return_dict, func, args, kwargs):
    # Defined at module level so it can be pickled under the "spawn" start method.
    return_dict['value'] = func(*args, **kwargs)

def call_with_timeout(func, args, kwargs, timeout):
    manager = multiprocessing.Manager()
    return_dict = manager.dict()

    p = multiprocessing.Process(target=_worker, args=(return_dict, func, args, kwargs))
    p.start()

    # Force a max. `timeout` or wait for the process to finish
    p.join(timeout)

    # If the process is still alive, it didn't finish: raise TimeoutError
    if p.is_alive():
        p.terminate()
        p.join()
        raise TimeoutError
    else:
        return return_dict['value']

if __name__ == '__main__':
    url = 'http://google.com'  # placeholder URL
    call_with_timeout(requests.get, args=(url,), kwargs={'timeout': 10}, timeout=60)

The timeout passed in kwargs is the timeout to get any response from the server; the timeout argument is the timeout to get the complete response.

comiventor · Answer

Pardon me, but I wonder why nobody suggested the following simpler solution? :-o

## request
requests.get('http://www.mypage.com', timeout=20)

DaWe · Answer

Try this request with a timeout and error handling:

import requests

try:
    url = "http://google.com"
    r = requests.get(url, timeout=10)
except requests.exceptions.Timeout as e:
    print(e)

ACEE · Answer

This code works for socket errors 11004 and 10060:

# -*- encoding:UTF-8 -*-
__author__ = 'ACE'
import requests
from PyQt4.QtCore import *
from PyQt4.QtGui import *

class TimeOutModel(QThread):
    Existed = pyqtSignal(bool)
    TimeOut = pyqtSignal()

    def __init__(self, fun, timeout=500, parent=None):
        """
        @param fun: function or lambda
        @param timeout: ms
        """
        super(TimeOutModel, self).__init__(parent)
        self.fun = fun

        self.timeer = QTimer(self)
        self.timeer.setInterval(timeout)
        self.timeer.timeout.connect(self.time_timeout)
        self.Existed.connect(self.timeer.stop)
        self.timeer.start()

        self.setTerminationEnabled(True)

    def time_timeout(self):
        self.timeer.stop()
        self.TimeOut.emit()
        self.quit()
        self.terminate()

    def run(self):
        self.fun()

bb = lambda: requests.get("http://ipv4.download.thinkbroadband.com/1GB.Zip")

a = QApplication([])

z = TimeOutModel(bb, 500)
print('timeout')

a.exec_()

John Smith · Answer

Although the question is about requests, I find this very easy to do with pycurl's CURLOPT_TIMEOUT or CURLOPT_TIMEOUT_MS.

No threading or signaling required:

import traceback
from io import BytesIO

import pycurl

url = 'http://www.example.com/example.Zip'
timeout_ms = 1000
raw = BytesIO()
c = pycurl.Curl()
c.setopt(pycurl.TIMEOUT_MS, timeout_ms)  # total timeout in milliseconds
c.setopt(pycurl.WRITEFUNCTION, raw.write)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.URL, url)
c.setopt(pycurl.HTTPGET, 1)
try:
    c.perform()
except pycurl.error:
    traceback.print_exc()  # error generated on timeout
    pass  # or just pass if you don't want to print the error

Fayzan qureshi · Answer

Use timeout=(connect timeout, read timeout), or give a single value (timeout=1):

import requests

try:
    req = requests.request('GET', 'https://www.google.com', timeout=(1, 1))
    print(req)
except requests.ReadTimeout:
    print("READ TIME OUT")

Dima Tisnek · Answer

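If it comes to that, create a watchdog thread that messes up requests' internal state after 10 seconds, e.g.:

1. closes the underlying socket, and ideally
2. triggers an exception if requests retries the operation

Note that, depending on the system libraries, you may be unable to set a deadline on DNS resolution.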
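The watchdog idea can be sketched on a plain socket, leaving requests' internals untouched. In this example a threading.Timer cuts the connection at the deadline. It calls shutdown() rather than close(), on the assumption that shutdown is what wakes a recv() blocked in another thread on common platforms. The function name read_with_watchdog and the silent local server are illustrative only.

```python
import socket
import threading
import time


def read_with_watchdog(sock, deadline_seconds):
    """Read until EOF, but let a watchdog thread cut the connection at the deadline."""
    fired = threading.Event()

    def cut():
        fired.set()
        try:
            sock.shutdown(socket.SHUT_RDWR)  # wakes a recv() blocked in the reader
        except OSError:
            pass  # the socket was already closed

    watchdog = threading.Timer(deadline_seconds, cut)
    watchdog.daemon = True
    watchdog.start()
    chunks = []
    try:
        while True:
            data = sock.recv(4096)
            if not data:  # EOF, either from the peer or from the watchdog
                break
            chunks.append(data)
    finally:
        watchdog.cancel()
    if fired.is_set():
        raise TimeoutError("deadline of %.1fs exceeded" % deadline_seconds)
    return b"".join(chunks)


# Demo against a local server that accepts the connection and then stays silent.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
client = socket.create_connection(listener.getsockname())
peer, _ = listener.accept()  # keep a reference so the connection stays open

start = time.monotonic()
try:
    read_with_watchdog(client, 0.3)
    outcome = "completed"
except TimeoutError:
    outcome = "timed out"
elapsed = time.monotonic() - start
for s in (client, peer, listener):
    s.close()
print(outcome)
```

This bounds the total read time regardless of how slowly data trickles in, but as noted above it does nothing for time spent in DNS resolution before the socket exists.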

ub_marco · Answer

If you're using the option stream=True you can do this:

import time
import requests

r = requests.get(
    'http://url_to_large_file',
    timeout=1,  # relevant only for underlying socket
    stream=True)

with open('/tmp/out_file.txt', 'wb') as f:
    start_time = time.time()
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
        if time.time() - start_time > 8:
            raise Exception('Request took longer than 8s')

The solution does not need signals or multiprocessing.
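The same streaming check can be factored into a small helper that works on any iterator of byte chunks, which also makes it testable without a network. This is a sketch: read_with_deadline and slow_chunks are names invented here, and slow_chunks merely stands in for r.iter_content().

```python
import time


def read_with_deadline(chunks, limit_seconds):
    """Join byte chunks from an iterator, failing once the total time limit passes."""
    deadline = time.monotonic() + limit_seconds
    body = bytearray()
    for chunk in chunks:
        # Checked only when a chunk arrives; a stalled read is bounded by the
        # socket timeout passed to requests.get(), not by this deadline.
        if time.monotonic() > deadline:
            raise TimeoutError("download exceeded %s seconds" % limit_seconds)
        body.extend(chunk)
    return bytes(body)


def slow_chunks(count, delay):
    """Stand-in for r.iter_content(): yields `count` chunks, `delay` seconds apart."""
    for _ in range(count):
        time.sleep(delay)
        yield b"data"


print(read_with_deadline(slow_chunks(3, 0.01), limit_seconds=5))
try:
    read_with_deadline(slow_chunks(10, 0.1), limit_seconds=0.25)
    outcome = "completed"
except TimeoutError:
    outcome = "timed out"
print(outcome)
```

With requests, the call would look like read_with_deadline(r.iter_content(chunk_size=1024), 8) on a response opened with stream=True. Because the deadline is only checked between chunks, the worst-case total is roughly the limit plus one read timeout.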

Polv · Answer

Set stream=True and use r.iter_content(1024). Yes, eventlet.Timeout just somehow doesn't work for me.

from time import time
from requests import get, exceptions

# `config` is defined elsewhere in the answerer's program.
try:
    start = time()
    timeout = 5
    with get(config['source']['online'], stream=True, timeout=timeout) as r:
        r.raise_for_status()
        content = bytes()
        content_gen = r.iter_content(1024)
        while True:
            if time() - start > timeout:
                raise TimeoutError('Time out! ({} seconds)'.format(timeout))
            try:
                content += next(content_gen)
            except StopIteration:
                break
        data = content.decode().split('\n')
        if len(data) in [0, 1]:
            raise ValueError('Bad requests data')
except (exceptions.RequestException, ValueError, IndexError, KeyboardInterrupt, TimeoutError) as e:
    print(e)
    with open(config['source']['local']) as f:
        data = [line.strip() for line in f.readlines()]

Discussion at https://redd.it/80kp1h

technico · Answer

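Well, I tried many solutions on this page and still faced instabilities, random hangs, and poor connection performance.

I'm now using Curl and I'm really happy with its "max time" functionality and with the overall performance, even with such a poor implementation:

import subprocess

content = subprocess.getoutput('curl -m6 -Ss "http://mywebsite.xyz"')

Here, I defined a max time of 6 seconds, covering both the connection and the transfer time.

I'm sure Curl has a nice Python binding, if you prefer to stick to Pythonic syntax :)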

Christian Long · Answer

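There is a package called timeout-decorator that you can use to time out any Python function.

import time
import timeout_decorator

@timeout_decorator.timeout(5)
def mytest():
    print("Start")
    for i in range(1, 10):
        time.sleep(1)
        print("{} seconds have passed".format(i))

It uses the signals approach that some answers here suggest. Alternatively, you can tell it to use multiprocessing instead of signals (e.g. if you are in a multi-threaded environment).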

Denis Kuzin · Answer

One more solution (got it from http://docs.python-requests.org/en/master/user/advanced/#streaming-uploads)

Before downloading the body, you can find out the content size:

import requests

TOO_LONG = 10*1024*1024  # 10 MB
big_url = "http://ipv4.download.thinkbroadband.com/1GB.Zip"
r = requests.get(big_url, stream=True)
print(r.headers['content-length'])  # 1073741824
if int(r.headers['content-length']) < TOO_LONG:
    # download the content:
    content = r.content

But be careful: the sender can set an incorrect value in the 'content-length' response field.
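Because the header cannot be trusted, one hedge is to also cap the number of bytes actually read while streaming. The helper below is a sketch with an invented name (read_capped); it takes any iterator of byte chunks, such as the one returned by r.iter_content() on a response opened with stream=True.

```python
def read_capped(chunks, max_bytes):
    """Join byte chunks, refusing to buffer more than `max_bytes` in total."""
    body = bytearray()
    for chunk in chunks:
        body.extend(chunk)
        if len(body) > max_bytes:
            raise ValueError("response body larger than %d bytes" % max_bytes)
    return bytes(body)


small = read_capped(iter([b"abc", b"def"]), max_bytes=10)
try:
    read_capped(iter([b"x" * 8, b"y" * 8]), max_bytes=10)
    outcome = "accepted"
except ValueError:
    outcome = "rejected"
print(small, outcome)
```

This also covers responses that omit 'content-length' entirely, where the header lookup in the snippet above would raise a KeyError.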