How can I take a screenshot/image of a website using Python?

Question

What I want to achieve is to take a screenshot of an arbitrary website from Python.

Environment: Linux

ars · Accepted Answer

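On the Mac there is webkit2png, and on Linux + KDE you can use khtml2png. I have tried the former and it works quite well, and I have heard of the latter being put to use.

I recently came across QtWebKit, which claims to be cross-platform (Qt rolled WebKit into its library, I guess). I have never tried it, though, so I can't tell you much more.

The QtWebKit links show how to access it from Python. For the other tools, you should at least be able to use subprocess to achieve the same thing.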
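To illustrate the subprocess route mentioned above, here is a minimal sketch. The helper names are hypothetical, and it assumes only that the chosen command-line tool takes the target URL as its last argument; any options belong in extra_args and should be taken from that tool's own help output.

```python
import shutil
import subprocess


def build_capture_command(tool, url, extra_args=()):
    """Return the argv list for a command-line screenshot tool.

    Assumption: the tool accepts the target URL as its last argument.
    """
    return [tool, *extra_args, url]


def capture_with_cli(tool, url, extra_args=(), timeout=60):
    """Run the tool and return its exit code, or None if it is not installed."""
    if shutil.which(tool) is None:
        return None
    completed = subprocess.run(
        build_capture_command(tool, url, extra_args),
        timeout=timeout,
        check=False,  # let the caller inspect the exit code
    )
    return completed.returncode
```

Passing the command as a list avoids going through a shell, so the URL needs no quoting. A call such as capture_with_cli('webkit2png', 'http://example.com') would return None on a machine where the tool is not on the PATH.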

hoju · Answer

Here is a simple solution using webkit: http://webscraping.com/blog/Webpage-screenshots-with-webkit/

import sys
import time
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *


class Screenshot(QWebView):
    def __init__(self):
        # the QApplication must exist before any widget is created
        self.app = QApplication(sys.argv)
        QWebView.__init__(self)
        self._loaded = False
        self.loadFinished.connect(self._loadFinished)

    def capture(self, url, output_file):
        self.load(QUrl(url))
        self.wait_load()
        # set to webpage size
        frame = self.page().mainFrame()
        self.page().setViewportSize(frame.contentsSize())
        # render image
        image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
        painter = QPainter(image)
        frame.render(painter)
        painter.end()
        print('saving %s' % output_file)
        image.save(output_file)

    def wait_load(self, delay=0):
        # process app events until page loaded
        while not self._loaded:
            self.app.processEvents()
            time.sleep(delay)
        self._loaded = False

    def _loadFinished(self, result):
        self._loaded = True


s = Screenshot()
s.capture('http://webscraping.com', 'website.png')
s.capture('http://webscraping.com/blog', 'blog.png')
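The wait_load loop in this answer polls a flag until the loadFinished signal sets it, so it would spin forever if that signal never arrived. A minimal sketch of the same polling pattern with a timeout, written without Qt so it can be tried on its own (wait_until and its parameters are hypothetical names, not part of PyQt):

```python
import time


def wait_until(predicate, timeout=30.0, tick=None, poll_interval=0.01):
    """Poll predicate() until it is true or `timeout` seconds have passed.

    `tick`, if given, is called on every iteration, for example to pump
    a GUI event loop. Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            return False
        if tick is not None:
            tick()
        time.sleep(poll_interval)
    return True
```

Inside a class like Screenshot, the loop body could then presumably become wait_until(lambda: self._loaded, 30, self.app.processEvents), with the caller deciding what to do when False comes back.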

Aamir Adnan · Answer

With help from various sources, here is my solution. It captures a screenshot of the full web page, optionally crops it, and also generates a thumbnail from the cropped image. The requirements are as follows.

Requirements:

Install NodeJS
Install phantomjs using Node's package manager: npm -g install phantomjs
Install Selenium (in your virtualenv, if you are using one)
Install ImageMagick
Add phantomjs to the system path (on Windows)
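As a quick sanity check before running the script that follows, a sketch like this could confirm that the listed requirements are reachable. The function name is hypothetical, and it assumes the executables are called phantomjs and convert (the ImageMagick command the script invokes) and that Selenium is importable as selenium.

```python
import importlib.util
import shutil


def check_requirements(executables=("phantomjs", "convert"),
                       modules=("selenium",)):
    """Return a dict mapping each requirement name to True if it was found."""
    found = {}
    for name in executables:
        # shutil.which searches the PATH the same way a shell would
        found[name] = shutil.which(name) is not None
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        found[name] = importlib.util.find_spec(name) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print("%-10s %s" % (name, "found" if ok else "MISSING"))
```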

import os
from subprocess import Popen, PIPE
from selenium import webdriver

abspath = lambda *p: os.path.abspath(os.path.join(*p))
ROOT = abspath(os.path.dirname(__file__))


def execute_command(command):
    # any output on stdout is treated as an error message
    result = Popen(command, shell=True, stdout=PIPE).stdout.read()
    if len(result) > 0 and not result.isspace():
        raise Exception(result)


def do_screen_capturing(url, screen_path, width, height):
    print("Capturing screen..")
    driver = webdriver.PhantomJS()
    # it saves the service log file in the same directory
    # if you want to have the log file stored elsewhere
    # initialize the webdriver.PhantomJS() as
    # driver = webdriver.PhantomJS(service_log_path='/var/log/phantomjs/ghostdriver.log')
    driver.set_script_timeout(30)
    if width and height:
        driver.set_window_size(width, height)
    driver.get(url)
    driver.save_screenshot(screen_path)


def do_crop(params):
    print("Cropping captured image..")
    command = [
        'convert',
        params['screen_path'],
        '-crop', '%sx%s+0+0' % (params['width'], params['height']),
        params['crop_path']
    ]
    execute_command(' '.join(command))


def do_thumbnail(params):
    print("Generating thumbnail from cropped captured image..")
    command = [
        'convert',
        params['crop_path'],
        '-filter', 'Lanczos',
        '-thumbnail', '%sx%s' % (params['width'], params['height']),
        params['thumbnail_path']
    ]
    execute_command(' '.join(command))


def get_screen_shot(**kwargs):
    url = kwargs['url']
    width = int(kwargs.get('width', 1024))  # screen width to capture
    height = int(kwargs.get('height', 768))  # screen height to capture
    filename = kwargs.get('filename', 'screen.png')  # file name e.g. screen.png
    path = kwargs.get('path', ROOT)  # directory path to store screen

    crop = kwargs.get('crop', False)  # crop the captured screen
    crop_width = int(kwargs.get('crop_width', width))  # the width of crop screen
    crop_height = int(kwargs.get('crop_height', height))  # the height of crop screen
    crop_replace = kwargs.get('crop_replace', False)  # does crop image replace original screen capture?

    thumbnail = kwargs.get('thumbnail', False)  # generate thumbnail from screen, requires crop=True
    thumbnail_width = int(kwargs.get('thumbnail_width', width))  # the width of thumbnail
    thumbnail_height = int(kwargs.get('thumbnail_height', height))  # the height of thumbnail
    thumbnail_replace = kwargs.get('thumbnail_replace', False)  # does thumbnail image replace crop image?

    screen_path = abspath(path, filename)
    crop_path = thumbnail_path = screen_path

    if thumbnail and not crop:
        raise Exception('Thumbnail generation requires crop image, set crop=True')

    do_screen_capturing(url, screen_path, width, height)

    if crop:
        if not crop_replace:
            crop_path = abspath(path, 'crop_' + filename)
        params = {
            'width': crop_width, 'height': crop_height,
            'crop_path': crop_path, 'screen_path': screen_path}
        do_crop(params)

        if thumbnail:
            if not thumbnail_replace:
                thumbnail_path = abspath(path, 'thumbnail_' + filename)
            params = {
                'width': thumbnail_width, 'height': thumbnail_height,
                'thumbnail_path': thumbnail_path, 'crop_path': crop_path}
            do_thumbnail(params)
    return screen_path, crop_path, thumbnail_path


if __name__ == '__main__':
    '''
        Requirements:
        Install NodeJS
        Using Node's package manager install phantomjs: npm -g install phantomjs
        install Selenium (in your virtualenv, if you are using that)
        install imageMagick
        add phantomjs to system path (on windows)
    '''

    url = 'http://stackoverflow.com/questions/1197172/how-can-i-take-a-screenshot-image-of-a-website-using-python'
    screen_path, crop_path, thumbnail_path = get_screen_shot(
        url=url, filename='sof.png',
        crop=True, crop_replace=False,
        thumbnail=True, thumbnail_replace=False,
        thumbnail_width=200, thumbnail_height=150,
    )

These are the generated images:

aezell · Answer

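I can't comment on ars's answer, but I actually got Roland Tapken's code running with QtWebKit, and it works quite well.

I just wanted to confirm that what Roland posted on his blog works well on Ubuntu. Our production version ended up not using any of what he wrote, but we are using the PyQt/QtWebKit bindings with much success.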

Joolah · Answer

You can do this with Selenium:

from selenium import webdriver

DRIVER = 'chromedriver'
driver = webdriver.Chrome(DRIVER)
driver.get('https://www.spotify.com')
screenshot = driver.save_screenshot('my_screenshot.png')
driver.quit()

https://sites.google.com/a/chromium.org/chromedriver/getting-started
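On a Linux server without a display, the same Selenium approach is usually run headless. The following is a sketch under assumptions rather than part of the answer: it assumes a Selenium release that accepts an options object, a Chrome and chromedriver that Selenium can locate, and the helper names and default sizes are hypothetical.

```python
def chrome_arguments(width=1024, height=768):
    """Command-line switches for a headless Chrome with a fixed window size."""
    return ["--headless", "--window-size=%d,%d" % (width, height)]


def take_screenshot(url, output_path, width=1024, height=768):
    """Open `url` in headless Chrome and save a PNG of the visible window."""
    # Imported lazily so chrome_arguments() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in chrome_arguments(width, height):
        options.add_argument(argument)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # save_screenshot returns True if the file was written
        return driver.save_screenshot(output_path)
    finally:
        # always shut the browser down, even if the page failed to load
        driver.quit()
```

The try/finally block is a deliberate choice: without it, a failed page load would leave a stray Chrome process behind.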

Michael H. · Answer

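Using Rendertron is an option. Under the hood, it is a headless Chrome that exposes HTTP endpoints, among them /screenshot/:url.

You would install rendertron with npm, run rendertron in one terminal, access http://localhost:3000/screenshot/:url, and save the file. A demo is also available at render-tron.appspot.com, which makes it possible to run this Python 3 snippet locally without installing the npm package: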

import requests

BASE = 'https://render-tron.appspot.com/screenshot/'
url = 'https://google.com'
path = 'target.jpg'
response = requests.get(BASE + url, stream=True)
# save file, see https://stackoverflow.com/a/13137873/7665691
if response.status_code == 200:
    with open(path, 'wb') as file:
        for chunk in response:
            file.write(chunk)
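For the locally installed variant described above, a sketch using only the standard library might look like this. It assumes a Rendertron instance is listening on http://localhost:3000 and, like the snippet above, appends the target URL to the endpoint unchanged; the function names are hypothetical.

```python
import shutil
import urllib.request

LOCAL_BASE = "http://localhost:3000/screenshot/"  # assumed local instance


def screenshot_endpoint(target_url, base=LOCAL_BASE):
    """Build the request URL by appending the target URL to the endpoint."""
    return base + target_url


def save_screenshot(target_url, path, base=LOCAL_BASE, timeout=60):
    """Download the rendered screenshot to `path` and return `path`.

    urlopen raises urllib.error.HTTPError for error responses and
    urllib.error.URLError when the server cannot be reached.
    """
    request_url = screenshot_endpoint(target_url, base)
    with urllib.request.urlopen(request_url, timeout=timeout) as response:
        with open(path, "wb") as out:
            shutil.copyfileobj(response, out)  # stream the body to disk
    return path
```

With the server running, save_screenshot('https://google.com', 'target.jpg') would write the image next to the script.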

Daniel Naab · Answer

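You don't mention what environment you are running in, and that makes a big difference, because there is no pure-Python web browser capable of rendering HTML.

If you are on a Mac, though, I have used webkit2png with great success. If not, there are plenty of options, as others have pointed out.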