C# regex pattern to extract URLs from a given string - not only full HTML URLs but bare links as well

Question

I need a regex which will do the following:

Extract all strings which start with http://
Extract all strings which start with www.

So I need to extract these two.

For example, given the string below:

house home go www.monstermmorpg.com Nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue

So from the string above I should get:

 www.monstermmorpg.com http://www.monstermmorpg.com http://www.monstermmorpg.commerged

I am looking for a regex or another way. Thank you.

C# 4.0

Jason Larke · Accepted Answer

You can write a fairly simple regular expression to handle this, or go the more traditional route of string splitting plus LINQ.

Regex

    // Requires: using System.Text.RegularExpressions; (MessageBox comes from System.Windows.Forms)
    var linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
    var rawString = "house home go www.monstermmorpg.com Nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
    foreach (Match m in linkParser.Matches(rawString))
        MessageBox.Show(m.Value);

Pattern explanation:

    \b         - matches a word boundary (the position between a word character and a non-word character such as a space or a period)
    (?:        - defines the beginning of a group; the ?: specifies not to capture the data within this group
    https?://  - matches http:// or https:// (the '?' after the "s" makes it optional)
    |          - OR
    www\.      - literal string, matches www. (the \. means a literal ".")
    )          - end of group
    \S+        - matches a series of non-whitespace characters
    \b         - matches the closing word boundary

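Essentially, the pattern looks for strings that start with http://, https://, or www. (the (?:https?://|www\.) group) and then matches all the characters up to the next whitespace. Because the pattern ends with \b, any trailing non-word characters, such as a sentence-ending period, are left out of the match.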
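To check the behaviour described above without a UI, the same pattern can be wrapped in a small helper that returns the matches as a list. This is only a sketch: the class name `LinkExtractor` and the method `ExtractLinks` are hypothetical names introduced for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (name chosen for illustration only).
public static class LinkExtractor
{
    // Same pattern and options as in the answer above.
    private static readonly Regex LinkParser = new Regex(
        @"\b(?:https?://|www\.)\S+\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Returns every match, in the order it appears in the input.
    public static List<string> ExtractLinks(string text)
    {
        var links = new List<string>();
        foreach (Match m in LinkParser.Matches(text))
        {
            links.Add(m.Value);
        }
        return links;
    }
}
```

Calling `LinkExtractor.ExtractLinks(rawString)` on the sample string from the question should return the three links listed there. Note that the closing `\b` also drops a trailing slash, so an input such as `http://example.com/` is matched as `http://example.com`; if that matters, the pattern would need adjusting.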

Traditional string option

    // Requires: using System; using System.Linq;
    var rawString = "house home go www.monstermmorpg.com Nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
    // Split on tab, newline, and space, then keep only the tokens that look like links.
    var links = rawString.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
        .Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
    foreach (string s in links)
        MessageBox.Show(s);
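The split-based approach differs from the regex in two ways worth knowing about: the prefix check is case-sensitive, and each token is returned whole, including any trailing punctuation. The following is a minimal sketch of a variant that at least makes the prefix check case-insensitive. The class name `SplitLinkExtractor`, its `ExtractLinks` method, and the choice to also split on `\r` are assumptions made for illustration and are not from the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (name chosen for illustration only).
public static class SplitLinkExtractor
{
    private static readonly string[] Prefixes = { "http://", "https://", "www." };

    // Assumption: '\r' is added so Windows line endings do not stick to tokens.
    private static readonly char[] Separators = { '\t', '\n', '\r', ' ' };

    // Splits on whitespace and keeps tokens that start with a known prefix,
    // ignoring case. Tokens are returned unchanged, punctuation included.
    public static List<string> ExtractLinks(string text)
    {
        return text
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => Prefixes.Any(
                p => token.StartsWith(p, StringComparison.OrdinalIgnoreCase)))
            .ToList();
    }
}
```

Unlike the regex version, a token such as `(www.example.com)` is skipped entirely because it does not start with one of the prefixes, and `http://example.com.` keeps its final period. Which behaviour is preferable depends on the input text.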