How do I get all the images from a website using the HTML Agility Pack?

Question

I downloaded HTMLAgilityPack, but its documentation doesn't include any examples.

I'm looking for a way to get all the images from a website. I don't need the image files themselves, just their address strings.

<img src="blabalbalbal.jpeg" />

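I need to pull the src attribute out of each img tag. Mostly I just want to get a feel for the library and what it can do; everyone told me it's the best tool for the job.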

Edit

public void GetAllImages()
{
    WebClient x = new WebClient();
    string source = x.DownloadString(@"http://www.google.com");
    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    document.Load(source);

    //I can't use the Descendants method. It doesn't appear.
    var ImageURLS = document.desc
        .Select(e => e.GetAttributeValue("src", null))
        .Where(s => !String.IsNullOrEmpty(s));
}

SLaks · Accepted Answer

You can do this with LINQ, like this:

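using System.Linq;
using HtmlAgilityPack;

// HtmlWeb downloads the page at url and parses it into an HtmlDocument.
var document = new HtmlWeb().Load(url);

// Descendants is a method of the root node (document.DocumentNode), not of HtmlDocument itself.
var urls = document.DocumentNode.Descendants("img")
    .Select(e => e.GetAttributeValue("src", null))
    .Where(s => !String.IsNullOrEmpty(s));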

EDIT: This code actually works now. I had forgotten to write document.DocumentNode.
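One thing the snippet leaves as-is: it returns each `src` exactly as written in the markup, so a relative value such as `blabalbalbal.jpeg` from the question comes back unresolved. The following is a minimal sketch of one way to get absolute address strings, assuming that resolving against the page URL is good enough. `ImageUrlExtractor`, `ExtractImageUrls` and `FetchImageUrls` are hypothetical names chosen for illustration; `HtmlWeb.Load`, `DocumentNode.Descendants`, `GetAttributeValue` and `HtmlEntity.DeEntitize` are HtmlAgilityPack calls, and `Uri.TryCreate` comes from .NET. It does not account for redirects or for a `<base href>` element in the page.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class; the name is illustrative, not part of HtmlAgilityPack.
public static class ImageUrlExtractor
{
    // Returns the src of every <img> in the document, resolved against baseUri
    // so that relative values such as "blabalbalbal.jpeg" become absolute URLs.
    // Duplicates are dropped; each URL keeps the position of its first occurrence.
    public static List<string> ExtractImageUrls(HtmlDocument document, Uri baseUri)
    {
        var urls = new List<string>();
        var seen = new HashSet<string>();

        foreach (HtmlNode img in document.DocumentNode.Descendants("img"))
        {
            string raw = img.GetAttributeValue("src", string.Empty);
            if (string.IsNullOrWhiteSpace(raw))
            {
                continue; // <img> without a usable src attribute
            }

            // Decode entities such as "&amp;" that may appear in the raw attribute text.
            string src = HtmlEntity.DeEntitize(raw).Trim();

            // Skip values that cannot be turned into a URI instead of throwing.
            if (Uri.TryCreate(baseUri, src, out Uri absolute) && seen.Add(absolute.AbsoluteUri))
            {
                urls.Add(absolute.AbsoluteUri);
            }
        }

        return urls;
    }

    // Downloads and parses the page with HtmlWeb, then extracts the image URLs.
    public static List<string> FetchImageUrls(string url)
    {
        HtmlDocument document = new HtmlWeb().Load(url);
        return ExtractImageUrls(document, new Uri(url));
    }
}
```

For example, `ImageUrlExtractor.FetchImageUrls("http://www.google.com")` downloads the page and returns its image addresses. The `HashSet` makes an image that is used several times on a page show up once, while the `List` keeps the order in which the images appear.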

Anthony · Answer

This is based on one of the library's examples, with the XPath changed:

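using System.Collections.Generic;
using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
List<string> image_links = new List<string>();
doc.Load("file.htm"); // Load(string) reads an HTML file from disk
// SelectNodes returns null rather than an empty collection when nothing matches.
HtmlNodeCollection imgNodes = doc.DocumentNode.SelectNodes("//img");
if (imgNodes != null)
{
    foreach (HtmlNode link in imgNodes)
    {
        image_links.Add(link.GetAttributeValue("src", ""));
    }
}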

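I'm not familiar with this extension, so I don't know how to write the array out somewhere else, but this will at least get you the data. (Also, I'm sure I haven't declared the array correctly. Sorry.)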
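On getting the data out: a `List<string>` like the one in the code above is already a reasonable container, and `image_links.ToArray()` produces a plain array if one is needed. If "writing it out" means saving the addresses to disk, `File.WriteAllLines` accepts any `IEnumerable<string>`. A minimal sketch, assuming a text file with one address per line is what you want; `ImageLinkWriter` and `SaveImageLinks` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; the class and method names are illustrative only.
public static class ImageLinkWriter
{
    // Writes one image address per line, creating the file or overwriting it.
    public static void SaveImageLinks(IEnumerable<string> imageLinks, string path)
    {
        File.WriteAllLines(path, imageLinks);
    }
}
```

For example, `ImageLinkWriter.SaveImageLinks(image_links, "image_links.txt");` writes the list to `image_links.txt` (a hypothetical path) in the current working directory, replacing any existing file.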

Edit

Using your example:

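using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack;

public List<string> GetAllImages()
{
    WebClient x = new WebClient();
    string source = x.DownloadString(@"http://www.google.com");
    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    List<string> image_links = new List<string>();
    // LoadHtml parses an HTML string; Load(string) would treat source as a file path.
    document.LoadHtml(source);
    // SelectNodes returns null rather than an empty collection when nothing matches.
    HtmlNodeCollection imgNodes = document.DocumentNode.SelectNodes("//img");
    if (imgNodes != null)
    {
        foreach (HtmlNode link in imgNodes)
        {
            image_links.Add(link.GetAttributeValue("src", ""));
        }
    }
    return image_links;
}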