PHP validation/regex for URL

Question

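I've been looking for a simple regex for URLs. Does anybody have one handy that works well? I didn't find one in the Zend Framework validation classes, and I have seen several implementations.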

Owen · Accepted Answer

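I've used this on a few projects. I don't believe I've run into issues, but I'm sure it's not exhaustive:

$text = preg_replace(
    '#((https?|ftp)://(\S*?\.\S*?))([\s)\[\]{},;"\':<]|\.\s|$)#i',
    "<a href=\"$1\" target=\"_blank\">$3</a>$4",
    $text
);

Most of the random junk at the end is there to deal with situations like http://domain.com. in a sentence (to avoid matching the trailing period). I'm sure it could be cleaned up, but since it worked, I've more or less just copied it over from project to project.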

Stanislav · Answer

Use the filter_var() function to validate whether a string is a URL or not:

var_dump(filter_var('example.com', FILTER_VALIDATE_URL));

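It is bad practice to use regular expressions when they are not necessary.

EDIT: Be careful, this solution is neither Unicode-safe nor XSS-safe. If you need complex validation, it may be better to look elsewhere.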
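A minimal sketch of how filter_var() might be combined with an explicit scheme allow-list, since FILTER_VALIDATE_URL on its own accepts any syntactically valid scheme. The function name isHttpUrl and the choice of allowed schemes are assumptions made for illustration; they are not part of the answer above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: accepts only syntactically valid http/https URLs.
 */
function isHttpUrl(string $candidate): bool
{
    // Step 1: generic syntax check performed by the filter extension.
    if (filter_var($candidate, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Step 2: restrict the scheme, because the filter alone also accepts
    // schemes such as ftp:// or javascript://.
    $scheme = parse_url($candidate, PHP_URL_SCHEME);

    return in_array(strtolower((string) $scheme), ['http', 'https'], true);
}
```

Note that a bare 'example.com' is rejected, because it carries no scheme at all. The result still needs escaping (for example with htmlspecialchars()) before it is written into HTML.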

catchdave · Answer

As per the PHP manual, parse_url should not be used to validate a URL.

Unfortunately, it seems that filter_var('example.com', FILTER_VALIDATE_URL) does not perform any better.

Both parse_url() and filter_var() will pass malformed URLs such as http://...

Therefore, in this case, regex is the better method.

Roger · Answer

In case you want to know whether the URL really exists:

function url_exist($url){ // if the request succeeds, the URL exists
    $c=curl_init();
    curl_setopt($c,CURLOPT_URL,$url);
    curl_setopt($c,CURLOPT_HEADER,1);//get the header
    curl_setopt($c,CURLOPT_NOBODY,1);//and *only* get the header
    curl_setopt($c,CURLOPT_RETURNTRANSFER,1);//get the response as a string from curl_exec(), rather than echoing it
    curl_setopt($c,CURLOPT_FRESH_CONNECT,1);//don't use a cached version of the url
    if(!curl_exec($c)){
        //echo $url.' inexists';
        return false;
    }else{
        //echo $url.' exists';
        return true;
    }
    //$httpcode=curl_getinfo($c,CURLINFO_HTTP_CODE);
    //return ($httpcode<400);
}

abhiomkar · Answer

John Gruber (Daring Fireball):

Regex:

(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

Use in preg_match() (the forward slashes are escaped because "/" is the delimiter):

preg_match("/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/", $url)

Here is the extended regex pattern (with comments):

(?xi)
\b
(                           # Capture 1: entire matched URL
  (?:
    https?://               # http or https protocol
    |                       #   or
    www\d{0,3}[.]           # "www.", "www1.", "www2." … "www999."
    |                       #   or
    [a-z0-9.\-]+[.][a-z]{2,4}/  # looks like domain name followed by a slash
  )
  (?:                       # One or more:
    [^\s()<>]+              # Run of non-space, non-()<>
    |                       #   or
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
  )+
  (?:                       # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
    |                       #   or
    [^\s`!()\[\]{};:'".,<>?«»“”‘’]      # not a space or one of these punct chars
  )
)

For more details, see: http://daringfireball.net/2010/07/improved_regex_for_matching_urls
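As a rough sketch of how the extended form might be used from PHP to pull URLs out of running text: a nowdoc avoids any quote or backslash escaping, "~" serves as the delimiter, and the x, i and u modifiers replace the inline (?xi). The helper name extractUrls and the sample sentence are assumptions for illustration only.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns every URL-like substring found in $text.
 */
function extractUrls(string $text): array
{
    // Nowdoc: nothing inside is interpolated or unescaped by PHP.
    // x = extended syntax, i = case-insensitive, u = treat input as UTF-8.
    $pattern = <<<'REGEX'
~
\b
(
  (?: https?:// | www\d{0,3}[.] | [a-z0-9.\-]+[.][a-z]{2,4}/ )
  (?:
      [^\s()<>]+
    | \(([^\s()<>]+|(\([^\s()<>]+\)))*\)
  )+
  (?:
      \(([^\s()<>]+|(\([^\s()<>]+\)))*\)
    | [^\s`!()\[\]{};:'".,<>?«»“”‘’]
  )
)
~xiu
REGEX;

    preg_match_all($pattern, $text, $matches);

    // Index 0 holds the full matches, one per URL found.
    return $matches[0];
}

$found = extractUrls('See http://example.com/path_(with_parens), or www.example.org.');
// $found is ['http://example.com/path_(with_parens)', 'www.example.org']
```

The trailing comma and the final period are left out of the matches, which is the behaviour the "End with" group of the pattern is designed for.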

promaty · Answer

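I don't think that using regular expressions is a smart thing to do in this case. It is impossible to match all of the possibilities, and even if you did, there is still a chance that the URL simply doesn't exist.

Here is a very simple way to test whether the URL actually exists and is readable: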

if (preg_match("#^https?://.+#", $link) and @fopen($link,"r")) echo "OK";

(Without the preg_match, this would also validate all filenames on your server.)

Vikash Kumar · Answer

function validateURL($URL) {
    $pattern_1 = "/^(http|https|ftp):\/\/(([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)+.(com|org|net|dk|at|us|tv|info|uk|co.uk|biz|se)$)(:(\d+))?\/?/i";
    $pattern_2 = "/^(www)((\.[A-Z0-9][A-Z0-9_-]*)+.(com|org|net|dk|at|us|tv|info|uk|co.uk|biz|se)$)(:(\d+))?\/?/i";
    if(preg_match($pattern_1, $URL) || preg_match($pattern_2, $URL)){
        return true;
    } else{
        return false;
    }
}

Peter Bailey · Answer

I've used this one with good success. I don't remember where I got it from:

$pattern = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";

George Milonas · Answer

And there is your answer =) Try to break it, you can't!!!

function link_validate_url($text) {
    $LINK_DOMAINS = 'aero|arpa|asia|biz|com|cat|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|mobi|local';

    $LINK_ICHARS_DOMAIN = (string) html_entity_decode(implode("", array( // @TODO completing letters ...
        "&#x00E6;", // æ
        "&#x00C6;", // Æ
        "&#x00C0;", // À
        "&#x00E0;", // à
        "&#x00C1;", // Á
        "&#x00E1;", // á
        "&#x00C2;", // Â
        "&#x00E2;", // â
        "&#x00E5;", // å
        "&#x00C5;", // Å
        "&#x00E4;", // ä
        "&#x00C4;", // Ä
        "&#x00C7;", // Ç
        "&#x00E7;", // ç
        "&#x00D0;", // Ð
        "&#x00F0;", // ð
        "&#x00C8;", // È
        "&#x00E8;", // è
        "&#x00C9;", // É
        "&#x00E9;", // é
        "&#x00CA;", // Ê
        "&#x00EA;", // ê
        "&#x00CB;", // Ë
        "&#x00EB;", // ë
        "&#x00CE;", // Î
        "&#x00EE;", // î
        "&#x00CF;", // Ï
        "&#x00EF;", // ï
        "&#x00F8;", // ø
        "&#x00D8;", // Ø
        "&#x00F6;", // ö
        "&#x00D6;", // Ö
        "&#x00D4;", // Ô
        "&#x00F4;", // ô
        "&#x00D5;", // Õ
        "&#x00F5;", // õ
        "&#x0152;", // Œ
        "&#x0153;", // œ
        "&#x00FC;", // ü
        "&#x00DC;", // Ü
        "&#x00D9;", // Ù
        "&#x00F9;", // ù
        "&#x00DB;", // Û
        "&#x00FB;", // û
        "&#x0178;", // Ÿ
        "&#x00FF;", // ÿ
        "&#x00D1;", // Ñ
        "&#x00F1;", // ñ
        "&#x00FE;", // þ
        "&#x00DE;", // Þ
        "&#x00FD;", // ý
        "&#x00DD;", // Ý
        "&#x00BF;", // ¿
    )), ENT_QUOTES, 'UTF-8');

    $LINK_ICHARS = $LINK_ICHARS_DOMAIN . (string) html_entity_decode(implode("", array(
        "&#x00DF;", // ß
    )), ENT_QUOTES, 'UTF-8');
    $allowed_protocols = array('http', 'https', 'ftp', 'news', 'nntp', 'telnet', 'mailto', 'irc', 'ssh', 'sftp', 'webcal');

    // Starting a parenthesis group with (?: means that it is grouped, but is not captured
    $protocol = '((?:'. implode("|", $allowed_protocols) .')://)';
    $authentication = "(?:(?:(?:[\w\.\-\+!$&'*\+,;=" . $LINK_ICHARS . "]|%[0-9a-f]{2})+(?::(?:[\w". $LINK_ICHARS ."\.\-\+%!$&'*\+,;=]|%[0-9a-f]{2})*)?)?@)";
    $domain = '(?:(?:[a-z0-9' . $LINK_ICHARS_DOMAIN . ']([a-z0-9'. $LINK_ICHARS_DOMAIN . '\-_])*)(\.(([a-z0-9' . $LINK_ICHARS_DOMAIN . '\-_])+\.)*('. $LINK_DOMAINS .'|[a-z]{2}))?)';
    $ipv4 = '(?:[0-9]{1,3}(\.[0-9]{1,3}){3})';
    $ipv6 = '(?:[0-9a-fA-F]{1,4}(\:[0-9a-fA-F]{1,4}){7})';
    $port = '(?::([0-9]{1,5}))';

    // Pattern specific to external links.
    $external_pattern = '/^'. $protocol .'?'. $authentication .'?('. $domain .'|'. $ipv4 .'|'. $ipv6 .' |localhost)'. $port .'?';

    // Pattern specific to internal links.
    $internal_pattern = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+]+)";
    $internal_pattern_file = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\.]+)$/i";

    $directories = "(?:/[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'#!():;*@]*)*";
    // Yes, four backslashes == a single backslash.
    $query = "(?:/?\?([?a-z0-9". $LINK_ICHARS ."+_|\-\.~/\\\\%=&,$'():;*@{} ]*))";
    $anchor = "(?:#[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'():;*@/\?]*)";

    // The rest of the path for a standard URL.
    $end = $directories .'?'. $query .'?'. $anchor .'?'.'$/i';

    $message_id = '[^@].*@'. $domain;
    $newsgroup_name = '(?:[0-9a-z+-]*\.)*[0-9a-z+-]*';
    $news_pattern = '/^news:('. $newsgroup_name .'|'. $message_id .')$/i';

    $user = '[a-zA-Z0-9'. $LINK_ICHARS .'_\-\.\+\^!#\$%&*+/\=\?\`\|\{\}~\']+';
    $email_pattern = '/^mailto:'. $user .'@'.'(?:'. $domain .'|'. $ipv4 .'|'. $ipv6 .'|localhost)'. $query .'?$/';

    if (strpos($text, '<front>') === 0) {
        return false;
    }
    if (in_array('mailto', $allowed_protocols) && preg_match($email_pattern, $text)) {
        return false;
    }
    if (in_array('news', $allowed_protocols) && preg_match($news_pattern, $text)) {
        return false;
    }
    if (preg_match($internal_pattern . $end, $text)) {
        return false;
    }
    if (preg_match($external_pattern . $end, $text)) {
        return false;
    }
    if (preg_match($internal_pattern_file, $text)) {
        return false;
    }

    return true;
}

Frankie · Answer

Edit:
As incidence pointed out, this code has been deprecated with the release of PHP 5.3.0 (2009-06-30) and should be used accordingly.

Just my two cents, but I've developed this function and have been using it for a while with success. It's well documented and separated, so you can easily change it.

// Checks if string is a URL
// @param string $url
// @return bool
function isURL($url = NULL) {
    if($url==NULL) return false;

    $protocol = '(http://|https://)';
    $allowed = '([a-z0-9]([-a-z0-9]*[a-z0-9]+)?)';

    $regex = "^". $protocol . // must include the protocol
             '(' . $allowed . '{1,63}\.)+'. // 1 or several sub domains with a max of 63 chars
             '[a-z]' . '{2,6}'; // followed by a TLD
    if(eregi($regex, $url)==true) return true;
    else return false;
}
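Since eregi() was removed in PHP 7.0, here is a minimal sketch of what a PCRE port of the same check might look like. The function name isUrlPcre is hypothetical, and the pattern is carried over unchanged apart from the delimiters, non-capturing groups and the i modifier, so it inherits the original's limits (http/https only, no anchor at the end of the string).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical preg_match() port of the eregi()-based isURL() check.
 */
function isUrlPcre(?string $url): bool
{
    if ($url === null || $url === '') {
        return false;
    }

    $protocol = '(?:https?://)';                           // must include the protocol
    $label    = '(?:[a-z0-9](?:[-a-z0-9]*[a-z0-9]+)?)';    // starts and ends alphanumeric

    // "#" is the delimiter so the slashes need no escaping; "i" replaces
    // the case-insensitivity that eregi() provided.
    $regex = '#^' . $protocol . '(?:' . $label . '{1,63}\.)+' . '[a-z]{2,6}#i';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($regex, $url) === 1;
}
```

Comparing against 1 rather than casting to bool keeps a PCRE error (which returns false) from being mistaken for a successful match.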

jini · Answer

function is_valid_url ($url="") {
    if ($url=="") {
        $url=$this->url;
    }

    $url = @parse_url($url);

    if ( ! $url) {
        return false;
    }

    $url = array_map('trim', $url);
    $url['port'] = (!isset($url['port'])) ? 80 : (int)$url['port'];
    $path = (isset($url['path'])) ? $url['path'] : '';

    if ($path == '') {
        $path = '/';
    }

    $path .= ( isset ( $url['query'] ) ) ? "?$url[query]" : '';

    // parse_url() returns the host under the lowercase key 'host'
    if ( isset ( $url['host'] ) AND $url['host'] != gethostbyname ( $url['host'] ) ) {
        if ( PHP_VERSION >= 5 ) {
            $headers = get_headers("$url[scheme]://$url[host]:$url[port]$path");
        } else {
            $fp = fsockopen($url['host'], $url['port'], $errno, $errstr, 30);

            if ( ! $fp ) {
                return false;
            }
            fputs($fp, "HEAD $path HTTP/1.1\r\nHost: $url[host]\r\n\r\n");
            $headers = fread ( $fp, 128 );
            fclose ( $fp );
        }
        $headers = ( is_array ( $headers ) ) ? implode ( "\n", $headers ) : $headers;
        // alternation group instead of a character class, so only 200, 301 and 302 match
        return ( bool ) preg_match ( '#^HTTP/.*\s+(?:200|301|302)\s#i', $headers );
    }
    return false;
}
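The status check at the end of that function can be isolated and tested without any network access. A rough sketch follows; the function name statusLineIsAcceptable is hypothetical, and the accepted codes (200, 301, 302) are taken from the answer. One detail worth spelling out: a character class such as [(200|301|302)]+ matches any run of the characters "(", ")", "|", "0", "1", "2", "3", so a line with status 203 would also pass, whereas an alternation group only accepts the listed codes.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true when an HTTP status line carries 200, 301 or 302.
 */
function statusLineIsAcceptable(string $statusLine): bool
{
    // (?:200|301|302) is an alternation: exactly one of the three codes.
    // (?:\s|$) allows status lines without a reason phrase, e.g. "HTTP/2 200".
    return preg_match('#^HTTP/\d(?:\.\d)?\s+(?:200|301|302)(?:\s|$)#i', $statusLine) === 1;
}
```

With get_headers(), the first array element is the status line, so a call such as statusLineIsAcceptable($headers[0]) would be the natural way to use it.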

Xavi Montero · Answer

Inspired by this .NET StackOverflow question and by this referenced article from that question, here is a URI validator (URI means it validates both URLs and URNs).

// Backslashes are doubled because the pattern sits in a double-quoted string,
// where a sequence such as \3 or \10 would otherwise be read as an octal escape.
if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
{
    throw new \RuntimeException( "URI has not a valid format." );
}

I have successfully unit-tested this function inside a ValueObject I made named Uri, tested by UriTest.

UriTest.php (contains both valid and invalid cases for both URLs and URNs)

<?php

declare( strict_types = 1 );

namespace XaviMontero\ThrasherPortage\Tests\Tour;

use XaviMontero\ThrasherPortage\Tour\Uri;

class UriTest extends \PHPUnit_Framework_TestCase
{
    private $sut;

    public function testCreationIsOfProperClassWhenUriIsValid()
    {
        $sut = new Uri( 'http://example.com' );
        $this->assertInstanceOf( 'XaviMontero\ThrasherPortage\Tour\Uri', $sut );
    }

    /**
     * @dataProvider urlIsValidProvider
     * @dataProvider urnIsValidProvider
     */
    public function testGetUriAsStringWhenUriIsValid( string $uri )
    {
        $sut = new Uri( $uri );
        $actual = $sut->getUriAsString();

        $this->assertInternalType( 'string', $actual );
        $this->assertEquals( $uri, $actual );
    }

    public function urlIsValidProvider()
    {
        return
            [
                [ 'http://example-server' ],
                [ 'http://example.com' ],
                [ 'http://example.com/' ],
                [ 'http://subdomain.example.com/path/?parameter1=value1&parameter2=value2' ],
                [ 'random-protocol://example.com' ],
                [ 'http://example.com:80' ],
                [ 'http://example.com?no-path-separator' ],
                [ 'http://example.com/pa%20th/' ],
                [ 'ftp://example.org/resource.txt' ],
                [ 'file://../../../relative/path/needs/protocol/resource.txt' ],
                [ 'http://example.com/#one-fragment' ],
                [ 'http://example.edu:8080#one-fragment' ],
            ];
    }

    public function urnIsValidProvider()
    {
        return
            [
                [ 'urn:isbn:0-486-27557-4' ],
                [ 'urn:example:mammal:monotreme:echidna' ],
                [ 'urn:mpeg:mpeg7:schema:2001' ],
                [ 'urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
                [ 'rare-urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
                [ 'urn:FOO:a123,456' ]
            ];
    }

    /**
     * @dataProvider urlIsNotValidProvider
     * @dataProvider urnIsNotValidProvider
     */
    public function testCreationThrowsExceptionWhenUriIsNotValid( string $uri )
    {
        $this->expectException( 'RuntimeException' );
        $this->sut = new Uri( $uri );
    }

    public function urlIsNotValidProvider()
    {
        return
            [
                [ 'only-text' ],
                [ 'http//missing.colon.example.com/path/?parameter1=value1&parameter2=value2' ],
                [ 'missing.protocol.example.com/path/' ],
                [ 'http://example.com\bad-separator' ],
                [ 'http://example.com|bad-separator' ],
                [ 'ht tp://example.com' ],
                [ 'http://exampl e.com' ],
                [ 'http://example.com/pa th/' ],
                [ '../../../relative/path/needs/protocol/resource.txt' ],
                [ 'http://example.com/#two-fragments#not-allowed' ],
                [ 'http://example.edu:portMustBeANumber#one-fragment' ],
            ];
    }

    public function urnIsNotValidProvider()
    {
        return
            [
                [ 'urn:mpeg:mpeg7:sch ema:2001' ],
                [ 'urn|mpeg:mpeg7:schema:2001' ],
                [ 'urn?mpeg:mpeg7:schema:2001' ],
                [ 'urn%mpeg:mpeg7:schema:2001' ],
                [ 'urn#mpeg:mpeg7:schema:2001' ],
            ];
    }
}

Uri.php (value object)

<?php

declare( strict_types = 1 );

namespace XaviMontero\ThrasherPortage\Tour;

class Uri
{
    /** @var string */
    private $uri;

    public function __construct( string $uri )
    {
        $this->assertUriIsCorrect( $uri );
        $this->uri = $uri;
    }

    public function getUriAsString()
    {
        return $this->uri;
    }

    private function assertUriIsCorrect( string $uri )
    {
        // https://stackoverflow.com/questions/30847/regex-to-validate-uris
        // http://snipplr.com/view/6889/regular-expressions-for-uri-validationparsing/

        // Backslashes are doubled because the pattern sits in a double-quoted string,
        // where a sequence such as \3 or \10 would otherwise be read as an octal escape.
        if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
        {
            throw new \RuntimeException( "URI has not a valid format." );
        }
    }
}

Running the unit tests

There are 65 assertions in 46 tests. Caution: there are two data providers for valid expressions and two more for invalid ones. One is for URLs and the other for URNs. If you are using PHPUnit v5.6* or earlier, you need to join the two data providers into a single one.

xavi@bromo:~/custom_www/hello-trip/mutant-migrant$ vendor/bin/phpunit
PHPUnit 5.7.3 by Sebastian Bergmann and contributors.

..............................................                    46 / 46 (100%)

Time: 82 ms, Memory: 4.00MB

OK (46 tests, 65 assertions)

Code coverage

This sample URI checker has 100% code coverage.

Tim Groeneveld · Answer

OK, so this is a little more complex than a simple regex, but it allows for different types of URLs.

Examples:

All of which should be marked as valid.

function is_valid_url($url) {
    // First check: is the url just a domain name? (allow a slash at the end)
    $_domain_regex = "|^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})/?$|";
    if (preg_match($_domain_regex, $url)) {
        return true;
    }

    // Second: Check if it's a url with a scheme and all
    $_regex = '#^([a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))$#';
    if (preg_match($_regex, $url, $matches)) {
        // pull out the domain name, and make sure that the domain is valid.
        $_parts = parse_url($url);
        if (!in_array($_parts['scheme'], array( 'http', 'https' )))
            return false;

        // Check the domain using the regex, stops domains like "-example.com" passing through
        // (parse_url() returns the host under the lowercase key 'host')
        if (!preg_match($_domain_regex, $_parts['host']))
            return false;

        // This domain looks pretty valid. Only way to check it now is to download it!
        return true;
    }

    return false;
}

Note that there is an in_array check for the protocols that you want to allow (currently only http and https are in that list).

var_dump(is_valid_url('google.com'));         // true
var_dump(is_valid_url('google.com/'));        // true
var_dump(is_valid_url('http://google.com'));  // true
var_dump(is_valid_url('http://google.com/')); // true
var_dump(is_valid_url('https://google.com')); // true

joedevon · Answer

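Peter's regex doesn't look right to me for many reasons. It allows all kinds of special characters in the domain name and doesn't test for much.

Frankie's function looks good to me, and if you don't want a function you can build a good regex from its components, like so:

^(http://|https://)(([a-z0-9]([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6}

Untested, but I think that should work.

Also, Owen's answer doesn't look 100% either. I took the domain part of the regex and tested it in a regex tester tool: http://erik.eae.net/playground/regexp/regexp.html

I put the following line:

(\S*?\.\S*?)

in the "regexp" section and the following line:

-hello.com

under the "sample text" section.

The result allowed the minus character through, because \S means any non-space character.

Note that Frankie's regex handles the minus, because it has this part for the first character:

[a-z0-9]

which won't allow the minus or any other special character.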
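The same comparison can be reproduced in PHP instead of an online tester. This is only a sketch: the variable names are made up for illustration, and an http:// prefix is added to the sample host for the second pattern because that pattern requires a protocol.

```php
<?php
declare(strict_types=1);

// Domain part taken from Owen's pattern: \S accepts any non-space character.
$owenDomainPart = '#(\S*?\.\S*?)#';

// The regex assembled from Frankie's components: the first character of
// every label must be a letter or a digit.
$frankieRegex = '#^(http://|https://)(([a-z0-9]([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6}#i';

// 1: the leading minus is accepted.
$owenResult = preg_match($owenDomainPart, '-hello.com');

// 0: a label starting with a minus is rejected.
$frankieBadResult = preg_match($frankieRegex, 'http://-hello.com');

// 1: an ordinary host still passes.
$frankieGoodResult = preg_match($frankieRegex, 'http://hello.com');

var_dump($owenResult, $frankieBadResult, $frankieGoodResult);
```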

Kitson88 · Answer

Here is a simple class for URL validation that uses a regex and then cross-references the domain against popular RBL (Real-time Blackhole List) servers:

Install:

require 'URLValidation.php';

Usage:

require 'URLValidation.php';
$urlVal = new UrlValidation(); //Create Object Instance

Add a URL as the parameter of the domain() method and check the return value.

$urlArray = ['http://www.bokranzr.com/test.php?test=foo&test=dfdf', 'https://en-gb.facebook.com', 'https://www.google.com'];

foreach ($urlArray as $k=>$v) {
    echo var_dump($urlVal->domain($v)) . ' URL: ' . $v . '<br>';
}

Output:

bool(false) URL: http://www.bokranzr.com/test.php?test=foo&test=dfdf
bool(true) URL: https://en-gb.facebook.com
bool(true) URL: https://www.google.com

As you can see above, www.bokranzr.com is listed as a malicious website via an RBL, so the domain was returned as false.

thespacecamel · Answer

If you are developing in WordPress, then

esc_url_raw($url) === $url

will validate a URL (here's WordPress' documentation of esc_url_raw). It handles URLs much better than filter_var($url, FILTER_VALIDATE_URL) because it is Unicode-safe and XSS-safe. (Here is a good article mentioning all the problems with filter_var.)

Thomas Venturini · Answer

Here is the way I did it. I want to mention that I am not so sure about the regex, but it should work :)

$pattern = "#((http|https)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|”|\"|'|:|\<|$|\.\s)#i";
$text = preg_replace_callback($pattern,function($m){
    return "<a href=\"$m[1]\" target=\"_blank\">$m[1]</a>$m[4]";
}, $text);

This way you won't need the eval marker (the e modifier) on your pattern.

Hope it helps :)

Fred Vanelli · Answer

The best URL regex that worked for me:

function valid_URL($url){
    return preg_match('%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu', $url);
}

Examples:

valid_URL('https://Twitter.com'); // true
valid_URL('http://Twitter.com'); // true
valid_URL('http://Twitter.co'); // true
valid_URL('http://t.co'); // true
valid_URL('http://Twitter.c'); // false
valid_URL('htt://Twitter.com'); // false

valid_URL('http://example.com/?a=1&b=2&c=3'); // true
valid_URL('http://127.0.0.1'); // true
valid_URL(''); // false
valid_URL(1); // false

Source: http://urlregex.com/