How do I use preg_split() in PHP?

Question

Can anybody explain to me how to use the preg_split() function? I didn't understand the pattern parameter, such as "/[\s,]+/".

For example:

I have this subject: is is. and I want the result to be:

array ( 0 => 'is', 1 => 'is', )

So it should ignore the space and the full stop. How can I do that?

Majenko · Accepted Answer

preg means Pcre REGexp, which is somewhat redundant, since "PCRE" already stands for "Perl Compatible Regular Expressions".

Regular expressions are a nightmare for beginners. I still don't fully understand them, and I've been working with them for years.

Basically, the example you have there breaks down as follows:

"/[\s,]+/"

/ = start or end of the pattern string (the delimiter)
[ ... ] = a character class: matches any one of the characters listed inside
+ = one or more of the preceding character or group
\s = any whitespace character (space, tab, newline)
, = the literal comma character

So you have a search pattern that means "split on any part of the string that is a run of one or more whitespace characters and/or commas".
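As a minimal sketch of that pattern in action (the sample input string is made up for illustration and is not from the question):

```php
<?php
// Hypothetical sample input, chosen to mix commas and runs of spaces.
$input = "apple, banana  cherry,date";

// Split wherever one or more whitespace characters and/or commas occur.
$parts = preg_split('/[\s,]+/', $input);

print_r($parts); // apple, banana, cherry, date (4 elements)
```

Because of the +, a comma followed by a space counts as a single separator, so no empty elements appear between the words.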

Other common characters are:

. = any single character (except a newline, by default)
* = zero or more of the preceding character or group
^ (at start of pattern) = the start of the string
$ (at end of pattern) = the end of the string
^ (at the start of [...]) = "NOT": matches any character except those listed
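To see these characters at work, here is a small sketch using preg_match(), which returns 1 when the pattern matches and 0 when it does not. The test strings are invented for illustration.

```php
<?php
// . matches any single character: "c", any character, "t".
$dot = preg_match('/c.t/', 'cat');          // 1

// * allows zero or more of the preceding character, so "ac" still matches.
$star = preg_match('/ab*c/', 'ac');         // 1

// ^ anchors at the start: "this" does not start with "is".
$start = preg_match('/^is/', 'this');       // 0

// $ anchors at the end: "this" does end with "is".
$end = preg_match('/is$/', 'this');         // 1

// ^ inside [...] negates the class: "12345" has no non-digit character.
$negated = preg_match('/[^0-9]/', '12345'); // 0

var_dump($dot, $star, $start, $end, $negated);
```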

For PHP, there is good information in the official documentation.

JakeGould · Answer

This should work:

    // Split after each word, consuming trailing whitespace and !?. punctuation.
    $words = preg_split("/(?<=\w)\b\s*[!?.]*/", 'is is.', -1, PREG_SPLIT_NO_EMPTY);
    echo '<pre>';
    print_r($words);
    echo '</pre>';

The output would be this:

    Array
    (
        [0] => is
        [1] => is
    )

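Before I explain the regex, a word on PREG_SPLIT_NO_EMPTY. It means that preg_split returns only the non-empty pieces of the split. This ensures that the data returned in the $words array really contains data, and not just the empty values that can occur when dealing with regex patterns and mixed data sources.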
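A short sketch of the difference the flag makes, using the same pattern and subject as above (the variable names are only illustrative):

```php
<?php
$subject = 'is is.';
$pattern = '/(?<=\w)\b\s*[!?.]*/';

// Default flags: the empty piece after the final "." is kept.
$withEmpty = preg_split($pattern, $subject);

// PREG_SPLIT_NO_EMPTY: empty pieces are dropped from the result.
$withoutEmpty = preg_split($pattern, $subject, -1, PREG_SPLIT_NO_EMPTY);

var_export($withEmpty);    // three elements: 'is', 'is', ''
var_export($withoutEmpty); // two elements: 'is', 'is'
```

The third argument, -1, means "no limit" on the number of pieces; it has to be supplied so that the flag can be passed as the fourth argument.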

And the explanation of that regex can be broken down like this using this tool:

NODE        EXPLANATION
--------------------------------------------------------------------------------
(?<=        look behind to see if there is:
  \w          word characters (a-z, A-Z, 0-9, _)
)           end of look-behind
\b          the boundary between a word char (\w) and something that is not a word char
\s*         whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible))
[!?.]*      any character of: '!', '?', '.' (0 or more times (matching the most amount possible))

A nicer explanation can be found by entering the full regex pattern /(?<=\w)\b\s*[!?.]*/ into this other tool:

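(?<=\w) Positive lookbehind: asserts that the regex below can be matched
  \w matches any word character [a-zA-Z0-9_]
\b asserts position at a word boundary (^\w|\w$|\W\w|\w\W)
\s* matches any whitespace character [\r\n\t\f ]
  Quantifier: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
[!?.]* matches a single character from the list !?. literally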

That last regex explanation can be boiled down by a human (also known as me) as follows:

Match, and split at, any word character that comes before a word boundary, which may be followed by multiple spaces and the punctuation marks !?.
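One consequence of that ordering is worth sketching on a longer sentence. The input below is invented, and the second pattern is my own variation rather than something from this answer: because \s* comes before [!?.]* in the pattern, whitespace that follows punctuation is not consumed.

```php
<?php
// Hypothetical longer input with punctuation in the middle.
$subject = 'Hello world! How are you?';

// Pattern from the answer: whitespace first, then punctuation.
$original = preg_split('/(?<=\w)\b\s*[!?.]*/', $subject, -1, PREG_SPLIT_NO_EMPTY);

// Assumed variation: whitespace and punctuation in one class, in any order.
$variation = preg_split('/(?<=\w)\b[\s!?.]*/', $subject, -1, PREG_SPLIT_NO_EMPTY);

var_export($original);  // 'Hello', 'world', ' How', 'are', 'you' (note the leading space)
var_export($variation); // 'Hello', 'world', 'How', 'are', 'you'
```

For the subject in the question, is is., both patterns give the same result, since the full stop is the last character.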

Federico Piazza · Answer

The documentation says:

The preg_split() function operates exactly like split(), except that regular expressions are accepted as input parameters for the pattern.

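So, the following code...

    <?php
    // Split the string on runs of whitespace and/or commas.
    $ip = "123 ,456 ,789 ,000";
    $iparr = preg_split("/[\s,]+/", $ip);

    // Print each piece followed by an HTML line break.
    print "$iparr[0] <br />";
    print "$iparr[1] <br />";
    print "$iparr[2] <br />";
    print "$iparr[3] <br />";
    ?>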

This will produce the following result:

123 456 789 000

So, if your subject is is is and you want: array ( 0 => 'is', 1 => 'is', )

you need to change your regex to "/[\s]+/"

Unless you have is ,is, in which case you need the regex you already have: "/[\s,]+/"
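A quick sketch of both cases, plus the subject exactly as written in the question (with its trailing full stop), which neither of these two patterns strips:

```php
<?php
// Whitespace-only separator.
$spaceOnly = preg_split('/[\s]+/', 'is is');   // 'is', 'is'

// Whitespace and/or comma separator.
$withComma = preg_split('/[\s,]+/', 'is ,is'); // 'is', 'is'

// The question's subject ends in a full stop, which stays on the last word.
$withDot = preg_split('/[\s]+/', 'is is.');    // 'is', 'is.'

var_export($spaceOnly);
var_export($withComma);
var_export($withDot);
```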

ceejayoz · Answer

PHP's str_word_count may be a better choice here.

str_word_count($string, 2) will output an array of all the words in the string, including duplicates, keyed by each word's position in the string.
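A minimal sketch with the subject from the question, comparing format 2 with format 1 (which returns a plain 0-indexed list and so may be closer to the array asked for):

```php
<?php
$string = 'is is.';

// Format 1: a plain list of the words found.
$list = str_word_count($string, 1);       // [0 => 'is', 1 => 'is']

// Format 2: keys are the position of each word inside the string.
$byPosition = str_word_count($string, 2); // [0 => 'is', 3 => 'is']

var_export($list);
var_export($byPosition);
```

The full stop is dropped in both cases because str_word_count() only treats letters, apostrophes and hyphens as part of a word by default.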