Nested if else statements

Question

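I'm still learning how to translate SAS code into R and I get warnings. I need to understand where I'm making mistakes. What I want to do is create a variable that summarizes and differentiates three statuses of a population: mainland, overseas, foreigner. I have a database with two variables: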

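id nationality: idnat (french, foreigner),

If idnat is french, then:

id birthplace: idbp (mainland, colony, overseas)

I want to summarize the information from idnat and idbp into a new variable called idnat2:

status: k (mainland, overseas, foreigner)

All these variables use "character type".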

Results expected in column idnat2:

    idnat     idbp   idnat2
1  french mainland mainland
2  french   colony overseas
3  french overseas overseas
4 foreign  foreign  foreign

Here is the SAS code I want to translate into R:

if idnat = "french" then do;
   if idbp in ("overseas","colony") then idnat2 = "overseas";
   else idnat2 = "mainland";
end;
else idnat2 = "foreigner";
run;

Here is my attempt in R:

if(idnat=="french"){
    idnat2 <- "mainland"
} else if(idbp=="overseas"|idbp=="colony"){
    idnat2 <- "overseas"
} else {
    idnat2 <- "foreigner"
}

I get this warning:

Warning message:
In if (idnat=="french") { :
  the condition has length > 1 and only the first element will be used

I was advised to use a "nested ifelse" instead for its simplicity, but I get more warnings:

idnat2 <- ifelse (idnat=="french", "mainland",
        ifelse (idbp=="overseas"|idbp=="colony", "overseas")
      )
      else (idnat2 <- "foreigner")

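According to the warning message, the length is greater than 1, so only what is between the first brackets is taken into account. Sorry, but I don't understand what this length has to do with it here. Does anybody know where I'm going wrong?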

Tomas Greif · Answer

If you are using any spreadsheet application, there is a basic function if() with the syntax:

if(<condition>, <yes>, <no>)

The syntax of ifelse() in R is exactly the same:

ifelse(<condition>, <yes>, <no>)

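The only difference from if() in a spreadsheet application is that R's ifelse() is vectorized (it takes vectors as input and returns a vector as output). Consider the following comparison of formulas in a spreadsheet application and in R, for an example where we want to test whether a > b and return 1 if yes and 0 if not.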

In a spreadsheet:

   A  B  C
1  3  1  =if(A1 > B1, 1, 0)
2  2  2  =if(A2 > B2, 1, 0)
3  1  3  =if(A3 > B3, 1, 0)

In R:

> a <- 3:1; b <- 1:3
> ifelse(a > b, 1, 0)
[1] 1 0 0

ifelse() can be nested in many ways:

ifelse(<condition>, <yes>, ifelse(<condition>, <yes>, <no>))

ifelse(<condition>, ifelse(<condition>, <yes>, <no>), <no>)

ifelse(<condition>,
       ifelse(<condition>, <yes>, <no>),
       ifelse(<condition>, <yes>, <no>)
      )

ifelse(<condition>, <yes>,
       ifelse(<condition>, <yes>,
              ifelse(<condition>, <yes>, <no>)
             )
      )
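To make the first pattern concrete, here is a small sketch (the vector x and its labels are made up for illustration) that classifies numbers with two nested calls; the inner ifelse() supplies the values for the elements where the outer condition is FALSE.

```r
# Made-up input vector for illustration
x <- c(-2, 0, 5)

# Pattern: ifelse(<condition>, <yes>, ifelse(<condition>, <yes>, <no>))
sign_label <- ifelse(x < 0, "negative",
                     ifelse(x == 0, "zero", "positive"))

# sign_label is c("negative", "zero", "positive")
print(sign_label)
```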

To calculate column idnat2, you can do:

df <- read.table(header=TRUE, text="
idnat idbp idnat2
french mainland mainland
french colony overseas
french overseas overseas
foreign foreign foreign"
)

with(df,
     ifelse(idnat=="french",
       ifelse(idbp %in% c("overseas","colony"),"overseas","mainland"),"foreign")
)
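The expression above labels non-French rows "foreign", matching the expected column, whereas the SAS code in the question assigns "foreigner". A sketch that mirrors the SAS block label for label could look like this (the column name idnat2_sas is made up for illustration):

```r
# Same example data as above, including the expected idnat2 column
df <- read.table(header = TRUE, text = "
idnat   idbp     idnat2
french  mainland mainland
french  colony   overseas
french  overseas overseas
foreign foreign  foreign
")

# Mirror of the SAS logic: french -> ("overseas"/"colony" -> "overseas", else "mainland"),
# anything else -> "foreigner"
df$idnat2_sas <- with(df,
  ifelse(idnat == "french",
         ifelse(idbp %in% c("overseas", "colony"), "overseas", "mainland"),
         "foreigner"))

print(df)
```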

R documentation

What does "the condition has length > 1 and only the first element will be used" mean? Let's see:

> # What is the first condition really testing?
> with(df, idnat=="french")
[1]  TRUE  TRUE  TRUE FALSE
> # This is the result of a vectorized function: equality of every element of idnat and
> # the string "french" is tested.
> # A vector of logical values is returned (with the same length as idnat)
> df$idnat2 <- with(df,
+   if(idnat=="french"){
+   idnat2 <- "xxx"
+   }
+   )
Warning message:
In if (idnat == "french") { :
  the condition has length > 1 and only the first element will be used
> # Note that the first element of the comparison is TRUE and that's why we get:
> df
    idnat     idbp idnat2
1  french mainland    xxx
2  french   colony    xxx
3  french overseas    xxx
4 foreign  foreign    xxx
> # There really is logic in it, you just have to get used to it

Can I still use if()? Yes, you can, but the syntax is not so cool :)

test <- function(x) {
  if(x=="french") {
    "french"
  } else{
    "not really french"
  }
}

apply(array(df[["idnat"]]),MARGIN=1, FUN=test)
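A somewhat more common way to apply a scalar, if()-based function element by element is vapply(), which avoids the array() wrapper and checks that every call returns exactly one string. This is a sketch of that alternative using the same test() idea; the small df is rebuilt here so the snippet runs on its own.

```r
df <- data.frame(idnat = c("french", "french", "french", "foreign"),
                 stringsAsFactors = FALSE)

# Scalar function: if() is fine here because x is a single value
test <- function(x) {
  if (x == "french") "french" else "not really french"
}

# vapply() calls test() once per element; FUN.VALUE = character(1) enforces one string each
res <- vapply(df$idnat, test, FUN.VALUE = character(1), USE.NAMES = FALSE)
print(res)
```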

If you are familiar with SQL, you can also use a CASE statement with the sqldf package.

Thomas · Answer

Try something like:

# some sample data
idnat <- sample(c("french","foreigner"),100,TRUE)
idbp <- rep(NA,100)
idbp[idnat=="french"] <- sample(c("mainland","overseas","colony"),sum(idnat=="french"),TRUE)

# recoding
out <- ifelse(idnat=="french" & !idbp %in% c("overseas","colony"), "mainland",
              ifelse(idbp %in% c("overseas","colony"),"overseas",
                     "foreigner"))

cbind(idnat,idbp,out) # check result

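Your confusion comes from how SAS and R handle if-else constructions. In R, if and else are not vectorized: they check whether a single condition is true (i.e., if("french"=="french") works) and cannot handle multiple logical values (i.e., if(c("french","foreigner")=="french") doesn't work), which is why R gives you the warning you received. (Since R 4.2.0, a condition of length greater than one is an error rather than a warning.)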

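By contrast, ifelse is vectorized, so it can take your vectors (aka input variables) and test the logical condition on each of their elements, as you're used to in SAS. An alternative way around this would be to build a loop using if and else statements (as you've started to do here), but the vectorized ifelse approach is more efficient and generally involves less code.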
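A minimal sketch of both approaches on the four example rows from the question, with labels following the SAS code. The loop hands if() one element at a time, so each condition has length one; ifelse() tests all elements in a single call.

```r
idnat <- c("french", "french", "french", "foreign")
idbp  <- c("mainland", "colony", "overseas", "foreign")

# Loop version: each if() sees a length-one condition
idnat2_loop <- character(length(idnat))
for (i in seq_along(idnat)) {
  if (idnat[i] == "french") {
    if (idbp[i] %in% c("overseas", "colony")) {
      idnat2_loop[i] <- "overseas"
    } else {
      idnat2_loop[i] <- "mainland"
    }
  } else {
    idnat2_loop[i] <- "foreigner"
  }
}

# Vectorized version: one nested ifelse() call over whole vectors
idnat2_vec <- ifelse(idnat == "french",
                     ifelse(idbp %in% c("overseas", "colony"), "overseas", "mainland"),
                     "foreigner")

identical(idnat2_loop, idnat2_vec)
```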

Uwe · Answer

If the data set contains many rows, it may be more efficient to join with a lookup table using data.table instead of using nested ifelse().

Given the lookup table below

lookup

     idnat     idbp   idnat2
1:  french mainland mainland
2:  french   colony overseas
3:  french overseas overseas
4: foreign  foreign  foreign

and a sample data set

library(data.table)
n_row <- 10L
set.seed(1L)
DT <- data.table(idnat = "french",
                 idbp = sample(c("mainland", "colony", "overseas", "foreign"), n_row, replace = TRUE))
DT[idbp == "foreign", idnat := "foreign"][]

      idnat     idbp
 1:  french   colony
 2:  french   colony
 3:  french overseas
 4: foreign  foreign
 5:  french mainland
 6: foreign  foreign
 7: foreign  foreign
 8:  french overseas
 9:  french overseas
10:  french mainland

we can then do an update while joining:

DT[lookup, on = .(idnat, idbp), idnat2 := i.idnat2][]

      idnat     idbp   idnat2
 1:  french   colony overseas
 2:  french   colony overseas
 3:  french overseas overseas
 4: foreign  foreign  foreign
 5:  french mainland mainland
 6: foreign  foreign  foreign
 7: foreign  foreign  foreign
 8:  french overseas overseas
 9:  french overseas overseas
10:  french mainland mainland
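Without data.table, the same lookup-table idea can be sketched in base R with match() on a combined key. This is a swapped-in technique, not part of the answer above; the separator "|" and the names dat, key_lookup and key_dat are arbitrary choices for illustration, and combinations missing from the table come back as NA.

```r
# Lookup table: one row per known (idnat, idbp) combination
lookup <- data.frame(idnat  = c("french", "french", "french", "foreign"),
                     idbp   = c("mainland", "colony", "overseas", "foreign"),
                     idnat2 = c("mainland", "overseas", "overseas", "foreign"),
                     stringsAsFactors = FALSE)

# Made-up data to recode; row order is preserved (unlike merge())
dat <- data.frame(idnat = c("french", "foreign", "french"),
                  idbp  = c("colony", "foreign", "mainland"),
                  stringsAsFactors = FALSE)

# Combine both columns into a single key, then look each data key up in the table
key_lookup <- paste(lookup$idnat, lookup$idbp, sep = "|")
key_dat    <- paste(dat$idnat, dat$idbp, sep = "|")
dat$idnat2 <- lookup$idnat2[match(key_dat, key_lookup)]

print(dat)
```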

Sven Hohenstein · Answer

You can create the vector idnat2 without if and ifelse.

The function replace can be used to replace every "colony" with "overseas":

idnat2 <- replace(idbp, idbp == "colony", "overseas")
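This works because, in the example data, idbp already holds "foreign" for every non-French row, so only "colony" needs recoding. A quick check on the question's four rows:

```r
idbp <- c("mainland", "colony", "overseas", "foreign")

# "colony" becomes "overseas"; every other value is kept as it is
idnat2 <- replace(idbp, idbp == "colony", "overseas")
print(idnat2)
```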

mpalanco · Answer

Using an SQL CASE statement with the sqldf package, or its dplyr counterpart case_when():

Data

df <- structure(list(idnat = structure(c(2L, 2L, 2L, 1L), .Label = c("foreign", "french"), class = "factor"),
                     idbp = structure(c(3L, 1L, 4L, 2L), .Label = c("colony", "foreign", "mainland", "overseas"), class = "factor")),
                .Names = c("idnat", "idbp"), class = "data.frame", row.names = c(NA, -4L))

sqldf

library(sqldf)
sqldf("SELECT idnat, idbp,
        CASE
          WHEN idbp IN ('colony', 'overseas') THEN 'overseas'
          ELSE idbp
        END AS idnat2
       FROM df")

dplyr

library(dplyr)
df %>%
  mutate(idnat2 = case_when(idbp == "mainland" ~ "mainland",
                            idbp %in% c("colony", "overseas") ~ "overseas",
                            TRUE ~ "foreign"))

Output

    idnat     idbp   idnat2
1  french mainland mainland
2  french   colony overseas
3  french overseas overseas
4 foreign  foreign  foreign

Sun Bee · Answer

Here is a solution using data.table:

DT[, idnat2 := ifelse(idbp %in% "foreign", "foreign",
        ifelse(idbp %in% c("colony", "overseas"), "overseas", "mainland"))]

ifelse is vectorized; if-else is not. Here, DT is:

    idnat     idbp
1  french mainland
2  french   colony
3  french overseas
4 foreign  foreign

This gives:

     idnat     idbp   idnat2
1:  french mainland mainland
2:  french   colony overseas
3:  french overseas overseas
4: foreign  foreign  foreign
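This answer tests with %in% rather than ==. One practical difference, shown in this small sketch with made-up values, is the handling of missing values (relevant, for example, to the sample data in Thomas's answer, where idbp is NA for non-French rows): == propagates NA into the ifelse() result, while %in% never returns NA.

```r
idbp <- c("mainland", NA, "foreign")

idbp == "foreign"    # c(FALSE, NA, TRUE): the NA propagates
idbp %in% "foreign"  # c(FALSE, FALSE, TRUE): never NA

# With ==, ifelse() returns NA for the missing element;
# with %in%, the missing element falls through to the "no" branch
res_eq <- ifelse(idbp == "foreign", "foreign", "other")
res_in <- ifelse(idbp %in% "foreign", "foreign", "other")
print(res_eq)
print(res_in)
```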

Azul · Answer

# Read in the data.
idnat=c("french","french","french","foreign")
idbp=c("mainland","colony","overseas","foreign")

# Initialize the new variable.
idnat2=as.character(vector())

# Logically evaluate "idnat" and "idbp" for each case, assigning the appropriate level to "idnat2".
for(i in 1:length(idnat)) {
  if(idnat[i] == "french" & idbp[i] == "mainland") {
    idnat2[i] = "mainland"
  } else if (idnat[i] == "french" & (idbp[i] == "colony" | idbp[i] == "overseas")) {
    idnat2[i] = "overseas"
  } else {
    idnat2[i] = "foreign"
  }
}

# Create a data frame with the two old variables and the new variable.
data.frame(idnat,idbp,idnat2)
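The chained else if in this loop can also be written with switch(), whose empty arguments fall through to the next value. This is a sketch of that alternative, not part of the answer above; classify() is a hypothetical helper name, and mapply() applies it pairwise to idnat and idbp.

```r
idnat <- c("french", "french", "french", "foreign")
idbp  <- c("mainland", "colony", "overseas", "foreign")

# Hypothetical helper: recode one (idnat, idbp) pair
classify <- function(nat, bp) {
  if (nat != "french") return("foreign")
  switch(bp,
         mainland = "mainland",
         colony   = ,               # empty argument: falls through to "overseas"
         overseas = "overseas",
         NA_character_)             # unnamed default for any other birthplace code
}

# Apply classify() to each pair of elements; USE.NAMES = FALSE keeps the result unnamed
idnat2 <- mapply(classify, idnat, idbp, USE.NAMES = FALSE)
data.frame(idnat, idbp, idnat2)
```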