Selenium: How do I inject/execute JavaScript in a page before any of the page's other scripts load/execute?

Question

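I'm using the Selenium Python webdriver to browse some pages. I want to inject JavaScript code into a page before any other JavaScript code gets loaded and executed. In other words, I need my JS code to run as the very first JS code on that page. Is there a way to do that with Selenium?

I googled for a couple of hours but couldn't find a proper answer.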

init_js · Accepted Answer

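If you cannot modify the page content, you can use a proxy, or a content script in an extension installed in your browser. To do it within Selenium, you would write code that injects the script as a child of an existing element, but you won't be able to have it run before the page is loaded (that is, before your driver's get() call returns).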

String name = (String) ((JavascriptExecutor) driver).executeScript( "(function () { ... })();" ...

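The documentation does not specify the moment at which the code starts executing. You would want it to run before the DOM starts loading, so that guarantee can probably only be satisfied via the proxy or extension content script route.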
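Since the question uses the Python bindings, the in-Selenium approach described above might look roughly like the sketch below. `append_script` and `INJECT_SNIPPET` are names made up for illustration; the only Selenium call used is `execute_script`, and `driver` is assumed to be an already started WebDriver. This runs only after `get()` has returned, so the page's own scripts may already have executed.

```python
# JavaScript run in the page: wraps the given source in a <script> element
# and appends it to the document root. arguments[0] is the source string.
INJECT_SNIPPET = """
var script = document.createElement('script');
script.textContent = arguments[0];
document.documentElement.appendChild(script);
"""


def append_script(driver, js_source):
    """Inject js_source into the current page of an existing WebDriver session.

    Assumption: `driver` is a Selenium WebDriver (anything with execute_script).
    This runs after the page has been fetched, not before its own scripts.
    """
    driver.execute_script(INJECT_SNIPPET, js_source)
```

Typical use would be `driver.get(url)` followed by `append_script(driver, "window.injected = true;")`.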

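If you can instrument your page with a minimal harness, you can detect the presence of a special URL query parameter and load additional content, but you need to do so with an inline script. Pseudocode: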

    <html>
    <head>
    <script type="text/javascript">
    (function () {
      if (location && location.href && location.href.indexOf("Selenium_TEST") >= 0) {
        var injectScript = document.createElement("script");
        injectScript.setAttribute("type", "text/javascript");

        // another option is to perform a synchronous XHR and inject via innerText.
        injectScript.setAttribute("src", URL_OF_EXTRA_SCRIPT);
        document.documentElement.appendChild(injectScript);

        // optional. cleaner to remove. it has already been loaded at this point.
        document.documentElement.removeChild(injectScript);
      }
    })();
    </script>
    ...
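On the test side, the harness above only activates when the marker string appears in the URL, so the Selenium script has to request the page with it. A minimal sketch of adding the marker as a query parameter with the standard library follows; the helper name `with_test_marker` and the value `1` are assumptions, since the pseudocode only checks that the marker text is present somewhere in `location.href`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_test_marker(url, marker="Selenium_TEST"):
    """Return url with the marker present as a query parameter.

    Existing query parameters and the fragment are preserved. If the marker
    is already a parameter, the URL is rebuilt without adding it twice.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == marker for key, _ in query):
        query.append((marker, "1"))  # the value is arbitrary (assumption)
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The result would then be passed to `driver.get(...)` instead of the plain URL.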

Jonathan · Answer

If you want to inject something into the HTML of a page before it gets parsed and executed by the browser, I would suggest using a proxy such as Mitmproxy.
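Whatever proxy is used, the core step is rewriting the HTML so that your script comes first. Below is a minimal sketch of that rewriting step using only the standard library. `inject_first_script` is a hypothetical helper, not part of Mitmproxy, and wiring it into a proxy's response hook is left out. It assumes the body has already been decoded to text, that the page has a literal `<head>` tag, and that the injected source does not itself contain `</script>`.

```python
import re

# Matches the opening <head> tag, with or without attributes, in any case.
# It does not match <header>, because the tag name must end right after "head".
_HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def inject_first_script(page_html, js_source):
    """Return page_html with an inline <script> placed right after <head>.

    If no <head> tag is found, the page is returned unchanged.
    """
    match = _HEAD_OPEN.search(page_html)
    if match is None:
        return page_html
    tag = "<script>" + js_source + "</script>"
    # Splice the new tag in directly after the opening <head> tag so it
    # precedes every script the page declares in its head or body.
    return page_html[:match.end()] + tag + page_html[match.end():]
```

A regular expression is used here only to keep the sketch dependency-free; a real HTML parser would cope better with unusual markup, such as a `<head>` string inside a comment.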

Matt M. · Answer

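Since version 1.0.9, Selenium-wire has been able to modify the responses to requests. Below is an example that uses this functionality to inject a script into a page before the page reaches the web browser.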

    from gzip import compress, decompress

    from seleniumwire import webdriver
    from lxml import html
    from lxml.etree import ParserError
    from lxml.html import builder

    script_elem_to_inject = builder.SCRIPT('alert("injected")')

    # Keeps a record of every response that was modified.
    injected = []

    def inject(req, req_body, res, res_body):
        # various checks to make sure we're only injecting the script on appropriate responses
        # we check that the content type is HTML, that the status code is 200, and that the encoding is gzip
        if res.headers.get_content_subtype() != 'html' or res.status != 200 or res.getheader('Content-Encoding') != 'gzip':
            return None
        try:
            parsed_html = html.fromstring(decompress(res_body))
        except ParserError:
            return None
        try:
            # put the script first in <head> so it runs before the page's own scripts
            parsed_html.head.insert(0, script_elem_to_inject)
        except IndexError:  # no head element
            return None
        injected.append((req, req_body, res, res_body, parsed_html))
        return compress(html.tostring(parsed_html))

    drv = webdriver.Firefox(seleniumwire_options={'custom_response_handler': inject})
    drv.header_overrides = {'Accept-Encoding': 'gzip'}  # ensure we only get gzip encoded responses

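More generally, another way to control a browser remotely and inject a script before the page content loads is to use a library based on a different protocol entirely (e.g. the DevTools Protocol). A Python implementation is available here: https://github.com/pyppeteer/pyppeteer2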
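As one illustration of the DevTools Protocol route that stays inside Selenium: Chromium-based drivers in the Selenium 4 Python bindings provide `execute_cdp_cmd`, and the protocol defines a `Page.addScriptToEvaluateOnNewDocument` command that takes a `source` string. The sketch below combines the two. `run_before_page_scripts` is a hypothetical helper name, and the approach is assumed to apply to Chromium-based browsers only.

```python
def run_before_page_scripts(driver, js_source):
    """Register js_source to run at the start of every new document.

    Assumption: `driver` is a Chromium-based Selenium WebDriver (Chrome or
    Edge) that provides execute_cdp_cmd. Call this before driver.get().
    Returns whatever the browser sends back for the command.
    """
    return driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": js_source},
    )
```

Usage would be `run_before_page_scripts(driver, "window.injected = true;")` followed by `driver.get(url)`; the registration has to happen before navigation, since it only affects documents created afterwards.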

Jacob · Answer

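A few years have passed, but I've found a way to do this without modifying the web page's content and without using a proxy. I'm using the Node.js version, but presumably the API is consistent across the other languages. What you want to do is the following: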

    const {Builder, Capabilities} = require('selenium-webdriver');

    (async function main() {
      const capabilities = new Capabilities();
      capabilities.setPageLoadStrategy('eager'); // Options are 'eager', 'none', 'normal'

      // Build a Firefox session that uses the chosen page load strategy.
      const driver = await new Builder()
        .forBrowser('firefox')
        .setFirefoxOptions(capabilities)
        .build();

      await driver.get('http://example.com');
      // With 'eager', get() returns before the page has fully loaded.
      await driver.executeScript(`
        console.log('hello')
      `);
    })();

The 'eager' option works for me. You may need to use the 'none' option instead. Documentation: https://seleniumhq.github.io/Selenium/docs/api/javascript/module/Selenium-webdriver/lib/capabilities_exports_PageLoadStrategy.html

Edit: note that the 'eager' option has not been implemented in Chrome yet...
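For the Python bindings the question asks about, a rough equivalent might look like the sketch below. It assumes Selenium 4, whose options classes accept a `page_load_strategy` attribute. `use_page_load_strategy` and `demo` are hypothetical names, and `demo` is never called automatically because it needs a real Firefox installation.

```python
VALID_STRATEGIES = ("normal", "eager", "none")


def use_page_load_strategy(options, strategy="eager"):
    """Set the page load strategy on a Selenium options object and return it."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError("unknown page load strategy: %r" % (strategy,))
    options.page_load_strategy = strategy
    return options


def demo():
    # Assumption: Selenium 4 and Firefox are installed on this machine.
    from selenium import webdriver

    options = use_page_load_strategy(webdriver.FirefoxOptions(), "eager")
    driver = webdriver.Firefox(options=options)
    try:
        # With 'eager', get() returns before the page has fully loaded.
        driver.get("http://example.com")
        driver.execute_script("console.log('hello')")
    finally:
        driver.quit()
```

As with the Node.js version, this only makes `get()` return earlier; it does not guarantee that the injected code runs before the page's first script.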

phk · Answer

https://pypi.org/project/Selenium-wire/ has since been added, which makes accessing and modifying all requests relatively easy.