How to get a web page's content and store it in a string variable

Question

How can I get the content of a web page using ASP.NET? I need to write a program that fetches a page's HTML and stores it in a string variable.

dhinesh · Accepted Answer

You can use WebClient:

    // Requires: using System.Net;
    WebClient client = new WebClient();
    string downloadString = client.DownloadString("http://www.google.com");

Scott · Answer

I have run into problems with WebClient.DownloadString before. If you do too, you can try this:

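    // Requires: using System.IO; using System.Net;
    WebRequest request = WebRequest.Create("http://www.google.com");
    string html = String.Empty;
    // Dispose the response and its stream once the body has been read.
    using (WebResponse response = request.GetResponse())
    using (Stream data = response.GetResponseStream())
    using (StreamReader sr = new StreamReader(data))
    {
        html = sr.ReadToEnd();
    }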

user2246674 · Answer

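I recommend not using WebClient.DownloadString. At least in .NET 3.5, DownloadString is not smart enough to use or strip a BOM when one is present. As a result, when UTF-8 data is returned (at least without a charset), the BOM (ï»¿, which is how the bytes EF BB BF look when read with a single-byte encoding) can incorrectly appear as part of the string.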

Instead, this slight variation works correctly with BOMs:

    // Requires: using System.IO; using System.Net; using System.Text;
    string ReadTextFromUrl(string url)
    {
        // WebClient is still convenient.
        // Assume UTF-8, but detect a BOM; could also honor the response charset, I suppose.
        using (var client = new WebClient())
        using (var stream = client.OpenRead(url))
        using (var textReader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return textReader.ReadToEnd();
        }
    }
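
To see the difference without touching the network, here is a minimal sketch that feeds the same BOM-prefixed bytes through a plain decode and through a StreamReader. The class name BomDemo, its method names, and the sample body are made up for illustration; the plain decode only stands in for a download path that does not strip the BOM and is not what DownloadString does internally.

```csharp
using System;
using System.IO;
using System.Text;

static class BomDemo
{
    // Hypothetical response body: UTF-8 BOM (EF BB BF) followed by "<p>hi</p>".
    public static byte[] SampleBody()
    {
        byte[] bom = { 0xEF, 0xBB, 0xBF };
        byte[] text = Encoding.ASCII.GetBytes("<p>hi</p>");
        byte[] body = new byte[bom.Length + text.Length];
        bom.CopyTo(body, 0);
        text.CopyTo(body, bom.Length);
        return body;
    }

    // Decodes the bytes directly; the BOM survives as a leading U+FEFF character.
    public static string DecodeNaively(byte[] body)
    {
        return Encoding.UTF8.GetString(body);
    }

    // Decodes through a StreamReader, which consumes a leading BOM.
    public static string DecodeWithReader(byte[] body)
    {
        using (var stream = new MemoryStream(body))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        byte[] body = SampleBody();
        Console.WriteLine(DecodeNaively(body).Length);    // 10: BOM character plus 9 characters of markup
        Console.WriteLine(DecodeWithReader(body).Length); // 9: BOM removed
    }
}
```

If the page were decoded with a single-byte encoding such as ISO-8859-1 instead, the same three BOM bytes would show up as the three characters ï»¿ at the start of the string.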

Steven Spielberg · Answer

    // Requires: using System.Net;
    WebClient client = new WebClient();
    string content = client.DownloadString(url);

Pass the URL of the page you want to fetch as url. You can then parse the result with Html Agility Pack (HtmlAgilityPack).
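
As a rough sketch of that parsing step, the following pulls the title out of an HTML string. It assumes the third-party HtmlAgilityPack NuGet package is installed; the class name TitleExtractor, the method GetTitle, and the sample HTML are made up for illustration, and in practice the string would come from one of the download snippets in this thread.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, not part of .NET itself

static class TitleExtractor
{
    // Returns the trimmed text of the first <title> element, or null if there is none.
    public static string GetTitle(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        return title == null ? null : title.InnerText.Trim();
    }

    static void Main()
    {
        // Hypothetical input standing in for a downloaded page.
        string html = "<html><head><title>Example</title></head><body></body></html>";
        Console.WriteLine(GetTitle(html));
    }
}
```

SelectSingleNode returns null when nothing matches the XPath expression, so the null check is needed for pages without a title.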