Python: example of asynchronous/threaded urllib2 requests using HTTPS

Question

I'm having a hard time getting asynchronous/threaded HTTPS requests to work using Python's urllib2.

Does anyone have a basic example that uses urllib2.Request, urllib2.build_opener, and a subclass of urllib2.HTTPSHandler?

Thanks!

nosklo · Answer

The code below performs 7 HTTP requests concurrently. It does not use threads; instead it relies on asynchronous networking with the twisted library.

    from twisted.web import client
    from twisted.internet import reactor, defer

    urls = [
        'http://www.python.org',
        'http://stackoverflow.com',
        'http://www.twistedmatrix.com',
        'http://www.google.com',
        'http://launchpad.net',
        'http://github.com',
        'http://bitbucket.org',
    ]

    def finish(results):
        # Called once, after every page has been downloaded.
        for result in results:
            print 'GOT PAGE', len(result), 'bytes'
        reactor.stop()

    # Start all downloads; each getPage() call returns a Deferred.
    waiting = [client.getPage(url) for url in urls]
    defer.gatherResults(waiting).addCallback(finish)

    reactor.run()

lkcl · Answer

There is a very simple way to do this that involves a handler for urllib2. It is described here: http://pythonquirks.blogspot.co.uk/2009/12/asynchronous-http-request.html

    #!/usr/bin/env python

    import urllib2
    import threading

    class MyHandler(urllib2.HTTPHandler):
        def http_response(self, req, response):
            print "url: %s" % (response.geturl(),)
            print "info: %s" % (response.info(),)
            for l in response:
                print l
            return response

    o = urllib2.build_opener(MyHandler())
    # Run the blocking open() call in a separate thread.
    t = threading.Thread(target=o.open, args=('http://www.google.com/',))
    t.start()
    print "I'm asynchronous!"

    t.join()

    print "I've ended!"
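The handler above subclasses HTTPHandler, while the question asked about HTTPSHandler. What follows is a minimal sketch of the same idea for HTTPS, not a definitive implementation. It assumes Python 3, where urllib2 became urllib.request, and the names LoggingHTTPSHandler, make_fetch, and fetch_in_threads are hypothetical, chosen only for illustration. The opener calls a handler method named https_response for https:// URLs, in the same way it calls http_response for http:// ones.

```python
import threading
import urllib.request


class LoggingHTTPSHandler(urllib.request.HTTPSHandler):
    """Hypothetical handler that records the URL of every HTTPS response."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = []
        self._lock = threading.Lock()

    def https_response(self, req, response):
        # Invoked by the opener for each https:// response.
        # A response processor must return the response object.
        with self._lock:
            self.seen.append(response.geturl())
        return response


def make_fetch(opener, timeout=10):
    """Wrap an opener in a function that takes a URL and returns the body."""
    def fetch(url):
        request = urllib.request.Request(url)
        with opener.open(request, timeout=timeout) as response:
            return response.read()
    return fetch


def fetch_in_threads(urls, fetch):
    """Call fetch(url) in one thread per URL.

    Returns a dict mapping each URL to its body, or to the exception raised.
    """
    results = {}

    def worker(url):
        try:
            results[url] = fetch(url)
        except Exception as exc:  # keep one failure from hiding the others
            results[url] = exc

    threads = [threading.Thread(target=worker, args=(url,)) for url in urls]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Passing an HTTPSHandler subclass replaces the default HTTPSHandler.
handler = LoggingHTTPSHandler()
opener = urllib.request.build_opener(handler)
```

To use it against real servers, call fetch_in_threads(urls, make_fetch(opener)) with a list of https:// URLs; afterwards handler.seen holds the URLs whose responses passed through the handler. Keeping the fetch function separate from the threading code means the thread logic can be exercised without network access.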

Corey Goldberg · Answer

Here is an example using urllib2 (with HTTPS) and threads. Each thread cycles through a list of URLs and retrieves each resource.

    import itertools
    import urllib2
    from threading import Thread

    THREADS = 2
    URLS = (
        'https://foo/bar',
        'https://foo/baz',
    )

    def main():
        for _ in range(THREADS):
            t = Agent(URLS)
            t.start()

    class Agent(Thread):
        def __init__(self, urls):
            Thread.__init__(self)
            self.urls = urls

        def run(self):
            # Loop over the URL list forever, fetching each one in turn.
            urls = itertools.cycle(self.urls)
            while True:
                data = urllib2.urlopen(urls.next()).read()

    if __name__ == '__main__':
        main()
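The threads above loop forever and discard what they download. If each URL should be fetched exactly once and the bodies kept, one option is to let a fixed pool of threads drain a shared queue. The sketch below assumes Python 3 (the queue module was called Queue in Python 2), and fetch_with_pool and its parameters are hypothetical names used only for illustration.

```python
import queue
import threading


def fetch_with_pool(urls, fetch, num_threads=2):
    """Fetch each URL once using a fixed number of worker threads.

    `fetch` is any callable that takes a URL and returns its body.
    Returns a dict mapping each URL to its body, or to the exception raised.
    """
    tasks = queue.Queue()
    for url in urls:
        tasks.put(url)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return  # no work left, so this thread exits
            try:
                outcome = fetch(url)
            except Exception as exc:  # record the failure and keep going
                outcome = exc
            with lock:
                results[url] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With the standard library, the fetch argument could be something like lambda url: urllib.request.urlopen(url, timeout=10).read(). The number of threads caps how many requests are in flight at once, regardless of how long the URL list is.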

bmpasini · Answer

You can use asynchronous IO to do this.

requests + gevent = grequests

GRequests allows you to use Requests with Gevent to make asynchronous HTTP requests easily.

    import grequests

    urls = [
        'http://www.heroku.com',
        'http://tablib.org',
        'http://httpbin.org',
        'http://python-requests.org',
        'http://kennethreitz.com'
    ]

    # Build the unsent requests lazily, then send them all concurrently.
    rs = (grequests.get(u) for u in urls)
    grequests.map(rs)

Xavier Combelle · Answer

Here is the code using eventlet:

    urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
            "https://wiki.secondlife.com/w/images/secondlife.jpg",
            "http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]

    import eventlet
    from eventlet.green import urllib2

    def fetch(url):
        return urllib2.urlopen(url).read()

    # Each fetch runs in its own green thread; imap yields bodies in order.
    pool = eventlet.GreenPool()
    for body in pool.imap(fetch, urls):
        print "got body", len(body)