How can I parse links and img src references from body copy?

Question

I need to change the link and img references in the bodies of nodes and comments. MPD has already given me some great advice about using a script for this.

Is there an easy way, in Drupal or in PHP, to parse all the links in the body copy?

Clive · Accepted Answer

Parsing HTML is always somewhat fragile, because the result depends on how well formed the input is, so this kind of thing is never completely trivial.

I always load the HTML into a DOMDocument object, loop through the elements I need, change their attributes/values, and then get the new output from the DOMDocument.

This is a very rough example, but hopefully it gives you the idea:

    $html = '
    <p>Blah blah blah</p>
    <p><a href="/some-path" title="A Title">Link Text</a> </p>
    <p><img src="/some-path.jpg" alt="Some alt" /></p>';

    $doc = new DOMDocument;

    // This is a reasonable use of the @ operator as malformed HTML will produce
    // a lot of warnings. Please don't shoot me ;)
    @$doc->loadHTML($html);

    // Get the links.
    $links = $doc->getElementsByTagName('a');
    foreach ($links as $link) {
      // Change the value of an attribute based on the current value.
      if ($link->getAttribute('href') == '/some-path') {
        $link->setAttribute('href', '/some-other-path');
      }
    }

    // Get the images.
    $images = $doc->getElementsByTagName('img');
    foreach ($images as $image) {
      // Change the value of an attribute based on the current value.
      if ($image->getAttribute('src') == '/some-path.jpg') {
        $image->setAttribute('src', '/some-other-path.jpg');
      }
    }

    // Get the new HTML.
    $new_html = $doc->saveHTML();

    // Strip out the tags that loadHTML() introduces to get the clean HTML.
    $patterns = array("/^\<\!DOCTYPE.*?<html><body>/si", "!</body></html>$!si");
    $body_text = preg_replace($patterns, '', $new_html);

    // Update the node body.
    $node->body[$node->language][0]['value'] = $body_text;
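As a variation on the example above, the same load/loop/save approach can be wrapped in a reusable function. This is only a sketch under a few assumptions: the function name `example_rewrite_urls()` and the callback are hypothetical, PHP 5.4 or later is assumed (for the `callable` type hint and `saveHTML($node)`), and it swaps in two different techniques: `libxml_use_internal_errors()` instead of the `@` operator, and serializing the children of `<body>` instead of stripping the wrapper with a regular expression.

```php
<?php
/**
 * Rewrites every a[href] and img[src] value in an HTML fragment.
 *
 * Hypothetical helper, not part of Drupal or PHP.
 *
 * @param string $html
 *   The HTML fragment, e.g. a node body.
 * @param callable $rewrite
 *   Receives the current URL and returns the URL to store instead.
 *
 * @return string
 *   The fragment with rewritten URLs, without <html>/<body> wrappers.
 */
function example_rewrite_urls($html, callable $rewrite) {
  $doc = new DOMDocument();

  // Collect parser warnings internally instead of silencing them with @.
  $previous = libxml_use_internal_errors(TRUE);
  // Wrap the fragment so the parser is told the charset and so that a
  // <body> element is guaranteed to exist.
  $doc->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head><body>' . $html . '</body></html>');
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  // Select links that have an href and images that have a src.
  $xpath = new DOMXPath($doc);
  foreach ($xpath->query('//a[@href] | //img[@src]') as $element) {
    $attribute = $element->nodeName === 'a' ? 'href' : 'src';
    $element->setAttribute($attribute, $rewrite($element->getAttribute($attribute)));
  }

  // Serialize only the children of <body>, so no wrapper has to be stripped.
  $output = '';
  foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $output .= $doc->saveHTML($child);
  }
  return $output;
}

// Usage: map one path prefix to another (the paths are illustrative only).
$body_text = example_rewrite_urls(
  '<p><a href="/some-path">Link Text</a></p><p><img src="/some-path.jpg" alt="Some alt" /></p>',
  function ($url) {
    return str_replace('/some-path', '/some-other-path', $url);
  }
);
```

Passing the decision to a callback keeps the DOM handling in one place, so the same function can serve both node and comment bodies. Note that the serializer normalizes markup, so the output may differ cosmetically from the input (for example, self-closing slashes on `img` tags are not preserved).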

EDIT

I knew I had seen a good question about this on SO. For a good discussion of the general subject, see How to parse and process HTML with PHP?

Capi Etheriel · Answer

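As I answered in your other question, a filter is the appropriate solution, because it changes the output rather than the stored data. That means that if you later need to change things in a different way (or fix a bug in the conversion), the original data is still there.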

Check the documentation for hook_filter_info(), and read up on parsing HTML with PHP.
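To make the filter suggestion concrete, here is a minimal sketch of what such a filter might look like in Drupal 7. The module name `mymodule`, the filter name, and the path map are all hypothetical. The default values on the callback parameters are there only so the function can be called outside Drupal; Drupal itself passes all six arguments.

```php
<?php
/**
 * Implements hook_filter_info().
 *
 * "mymodule" and "mymodule_rewrite_paths" are hypothetical names.
 */
function mymodule_filter_info() {
  $filters['mymodule_rewrite_paths'] = array(
    'title' => t('Rewrite link and image paths'),
    'process callback' => '_mymodule_rewrite_paths_process',
  );
  return $filters;
}

/**
 * Filter process callback: rewrites a[href] and img[src] at render time.
 *
 * The stored body is never modified; only the filtered output changes.
 */
function _mymodule_rewrite_paths_process($text, $filter = NULL, $format = NULL, $langcode = NULL, $cache = NULL, $cache_id = NULL) {
  // Hypothetical old => new path map.
  $map = array(
    '/some-path' => '/some-other-path',
    '/some-path.jpg' => '/some-other-path.jpg',
  );

  $doc = new DOMDocument();
  // Collect parser warnings internally rather than printing them.
  $previous = libxml_use_internal_errors(TRUE);
  $doc->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head><body>' . $text . '</body></html>');
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  // Replace only the attribute values that appear in the map.
  foreach (array('a' => 'href', 'img' => 'src') as $tag => $attribute) {
    foreach ($doc->getElementsByTagName($tag) as $element) {
      $value = $element->getAttribute($attribute);
      if (isset($map[$value])) {
        $element->setAttribute($attribute, $map[$value]);
      }
    }
  }

  // Return the children of <body> only, without the wrapper markup.
  $output = '';
  foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $output .= $doc->saveHTML($child);
  }
  return $output;
}
```

The filter would still need to be enabled on the relevant text formats before it has any effect. Because it runs when the text is rendered, a mistake in the map can be corrected by changing the code and clearing the cache, without touching node or comment data.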