R: using grepl to check that multiple strings are all present

Question

grepl("instance|percentage", labelTest$Text)

This returns TRUE if either instance or percentage is present.

How can I get TRUE only when both terms are present?

AkselA · Accepted Answer

Text <- c("instance", "percentage", "n", "instance percentage", "percentage instance")
grepl("instance|percentage", Text)
# TRUE TRUE FALSE TRUE TRUE
grepl("instance.*percentage|percentage.*instance", Text)
# FALSE FALSE FALSE TRUE TRUE

The latter works by looking for:

('instance')(any character sequence)('percentage') OR ('percentage')(any character sequence)('instance')

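Of course, if you need to find combinations of more than two words, this gets quite complicated. In that case, the solution mentioned in the comments is easier to implement and to read.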

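Another alternative, which may be relevant when matching many words, is to use positive look-ahead (which can be thought of as a "non-consuming" match). For this you have to enable Perl regular expressions with perl=TRUE.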

# create a vector of word combinations
set.seed(1)
words <- c("instance", "percentage", "element", "character", "n", "o", "p")
Text2 <- replicate(10, paste(sample(words, 5), collapse=" "))
# grepl with multiple positive look-aheads
longperl <- grepl("(?=.*instance)(?=.*percentage)(?=.*element)(?=.*character)", Text2, perl=TRUE)
# this is equivalent to the solution proposed in the comments
longstrd <- grepl("instance", Text2) & grepl("percentage", Text2) & grepl("element", Text2) & grepl("character", Text2)
# they produce identical results
identical(longperl, longstrd)
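To illustrate what "non-consuming" means here, the following minimal sketch uses regexpr on a single illustrative string (chosen for this example, not taken from the data above). Both look-aheads succeed at the first character, yet the match itself has zero length, which is why several look-aheads can be chained regardless of word order.

```r
# Both look-aheads are checked from the same position and consume no text
m <- regexpr("(?=.*instance)(?=.*percentage)", "percentage instance", perl = TRUE)

match_start  <- as.integer(m)            # position where the match begins
match_length <- attr(m, "match.length")  # number of characters consumed

match_start   # 1
match_length  # 0
```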

Furthermore, if the patterns are stored in a vector, the expressions can be condensed considerably:

pat <- c("instance", "percentage", "element", "character")
longperl <- grepl(paste0("(?=.*", pat, ")", collapse=""), Text2, perl=TRUE)
longstrd <- rowSums(sapply(pat, grepl, Text2) - 1L) == 0L
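If this check is needed in several places, the vectorised form could be wrapped in a small helper. The sketch below is one possible way to do that; the name grepl_all is hypothetical and not part of base R. It combines one grepl result per pattern with `&` and passes extra arguments such as fixed or ignore.case through to grepl.

```r
# Hypothetical helper: TRUE where every pattern matches the element of x.
# Assumes `patterns` has at least one element.
grepl_all <- function(patterns, x, ...) {
  # one logical vector per pattern, combined element-wise with &
  Reduce(`&`, lapply(patterns, grepl, x = x, ...))
}

Text <- c("instance", "percentage", "n", "instance percentage", "percentage instance")
both <- grepl_all(c("instance", "percentage"), Text)
both
# FALSE FALSE FALSE TRUE TRUE

# fixed = TRUE treats the patterns as literal strings rather than regexes
literal <- grepl_all(c("a.b", "c"), c("a.b c", "axb c"), fixed = TRUE)
literal
# TRUE FALSE
```

Compared with the rowSums(sapply(...)) form, this avoids building a matrix, which may matter little for short vectors but keeps the intent easy to read.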

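As requested in the comments, if you want to match whole words only, i.e. not substrings, you can use \b to mark word boundaries (written as "\\b" inside an R string). For example: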

tx <- c("cent element", "percentage element", "element cent", "element centimetre")
grepl("(?=.*\\bcent\\b)(?=.*element)", tx, perl=TRUE)
# TRUE FALSE TRUE FALSE
grepl("element", tx) & grepl("\\bcent\\b", tx)
# TRUE FALSE TRUE FALSE
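The word-boundary variant can be combined with the paste0 approach when the words are stored in a vector. This is a minimal sketch that assumes the words contain no regex metacharacters; otherwise they would need to be escaped first. The variable names are illustrative.

```r
whole_words <- c("cent", "element")  # illustrative words, matched as whole words
tx <- c("cent element", "percentage element", "element cent", "element centimetre")

# One look-ahead per word, each wrapped in \b ... \b
rx <- paste0("(?=.*\\b", whole_words, "\\b)", collapse = "")
rx
# "(?=.*\\bcent\\b)(?=.*\\belement\\b)"

whole_match <- grepl(rx, tx, perl = TRUE)
whole_match
# TRUE FALSE TRUE FALSE
```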

Sebastian Geschonke · Answer

This is how to get TRUE only when both terms occur in an item of the vector labelTest$Text. I think this is an exact answer to the question and much shorter than the other solutions.

grepl("instance",labelTest$Text) & grepl("percentage",labelTest$Text)

niels kristian schmidt · Answer

Use intersect and feed it one grep call per word.

library(data.table) # used for subsetting text vector below
vector_of_text[
  intersect(
    grep(vector_of_text, pattern = "pattern1"),
    grep(vector_of_text, pattern = "pattern2")
  )
]
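For more than two words, the same idea might be generalised with Reduce, as in the sketch below. The sample vector and the variable names are made up for illustration. It uses base R only and also checks that the intersected indices agree with the logical grepl approach from the other answers.

```r
# Illustrative data, not from the question
vector_of_text <- c("pattern1 only", "pattern1 and pattern2",
                    "pattern2 then pattern1", "neither")
patterns <- c("pattern1", "pattern2")

# grep returns matching indices; keep those common to all patterns
idx <- Reduce(intersect, lapply(patterns, grep, x = vector_of_text))
matched <- vector_of_text[idx]
matched
# "pattern1 and pattern2" "pattern2 then pattern1"

# Same positions as combining logical vectors with &
same <- identical(
  idx,
  which(grepl("pattern1", vector_of_text) & grepl("pattern2", vector_of_text))
)
```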