Split data frame string column into multiple columns

Question

I'd like to take data of the form

    before = data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2'))
      attr          type
    1    1   foo_and_bar
    2   30 foo_and_bar_2
    3    4   foo_and_bar
    4    6 foo_and_bar_2

and use split() on the column "type" from above to get something like this:

      attr type_1 type_2
    1    1    foo    bar
    2   30    foo  bar_2
    3    4    foo    bar
    4    6    foo  bar_2

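I came up with something unbelievably complex involving some form of apply that worked, but I've since misplaced it. It seemed far too complicated to be the best way. I can use strsplit as below, but it is unclear how to get that back into two columns in the data frame.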

    > strsplit(as.character(before$type),'_and_')
    [[1]]
    [1] "foo" "bar"

    [[2]]
    [1] "foo"   "bar_2"

    [[3]]
    [1] "foo" "bar"

    [[4]]
    [1] "foo"   "bar_2"

Thanks for any pointers. I haven't quite grokked R lists yet.

hadley · Accepted Answer

Use stringr::str_split_fixed:

    library(stringr)
    str_split_fixed(before$type, "_and_", 2)
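str_split_fixed() returns a character matrix rather than a data frame, so one more step is needed to get the two new columns next to attr. The following is a minimal sketch of one way to do that; the column names type_1 and type_2 and the variable names parts and after are illustrative choices, not part of the answer above.

```r
library(stringr)

before <- data.frame(
  attr = c(1, 30, 4, 6),
  type = c("foo_and_bar", "foo_and_bar_2"),
  stringsAsFactors = FALSE
)

# str_split_fixed() returns a character matrix with one column per piece
parts <- str_split_fixed(before$type, "_and_", 2)
colnames(parts) <- c("type_1", "type_2")

# Bind the pieces next to the untouched attr column
after <- cbind(before["attr"], as.data.frame(parts, stringsAsFactors = FALSE))
after
```

Naming the matrix columns before the conversion avoids the default V1/V2 names that as.data.frame() would otherwise assign.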

hadley · Answer

Another option is to use the new tidyr package.

    library(dplyr)
    library(tidyr)

    before <- data.frame(
      attr = c(1, 30 ,4 ,6 ),
      type = c('foo_and_bar', 'foo_and_bar_2')
    )

    before %>%
      separate(type, c("foo", "bar"), "_and_")

    ##   attr foo   bar
    ## 1    1 foo   bar
    ## 2   30 foo bar_2
    ## 3    4 foo   bar
    ## 4    6 foo bar_2

David Arenburg · Answer

Five years later, adding the obligatory data.table solution:

    library(data.table) ## v 1.9.6+
    setDT(before)[, paste0("type", 1:2) := tstrsplit(type, "_and_")]
    before
    #    attr          type type1 type2
    # 1:    1   foo_and_bar   foo   bar
    # 2:   30 foo_and_bar_2   foo bar_2
    # 3:    4   foo_and_bar   foo   bar
    # 4:    6 foo_and_bar_2   foo bar_2

We could also make sure that the resulting columns have the correct types, and improve performance, by adding the type.convert and fixed arguments (since "_and_" isn't really a regex):

    setDT(before)[, paste0("type", 1:2) := tstrsplit(type, "_and_", type.convert = TRUE, fixed = TRUE)]
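The fixed argument matters for more than speed when the delimiter contains regex metacharacters. The sketch below uses base strsplit(), to which tstrsplit() passes its extra arguments, and a made-up input vector x with "." as the delimiter; neither x nor that delimiter comes from the question.

```r
x <- c("a.b", "c.d")

# Default: the pattern is a regular expression, so "." matches any character
# and every character of the input is treated as a delimiter
regex_split <- strsplit(x, ".")

# fixed = TRUE: the pattern is matched literally, so only the dot splits
fixed_split <- strsplit(x, ".", fixed = TRUE)

regex_split[[1]]
fixed_split[[1]]
```

With a delimiter such as "_and_", which has no metacharacters, both calls give the same pieces and fixed = TRUE only skips the regex engine.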

Aniko · Answer

Yet another approach: use rbind on out:

    before <- data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2'))
    out <- strsplit(as.character(before$type),'_and_')
    do.call(rbind, out)

         [,1]  [,2]
    [1,] "foo" "bar"
    [2,] "foo" "bar_2"
    [3,] "foo" "bar"
    [4,] "foo" "bar_2"

And to combine:

    data.frame(before$attr, do.call(rbind, out))

42- · Answer

Notice that sapply with "[" can be used to extract either the first or second item in those lists, so:

    before$type_1 <- sapply(strsplit(as.character(before$type),'_and_'), "[", 1)
    before$type_2 <- sapply(strsplit(as.character(before$type),'_and_'), "[", 2)
    before$type <- NULL

And here's a gsub method:

    before$type_1 <- gsub("_and_.+$", "", before$type)
    before$type_2 <- gsub("^.+_and_", "", before$type)
    before$type <- NULL

Ramnath · Answer

Here is a one-liner along the same lines as Aniko's solution, but using hadley's stringr package:

    do.call(rbind, str_split(before$type, '_and_'))

A5C1D2H2I1M1N2O1R2T1 · Answer

To add to the options, you could also use my splitstackshape::cSplit function like this:

    library(splitstackshape)
    cSplit(before, "type", "_and_")
    #    attr type_1 type_2
    # 1:    1    foo    bar
    # 2:   30    foo  bar_2
    # 3:    4    foo    bar
    # 4:    6    foo  bar_2

Gavin Simpson · Answer

An easy way is to use sapply() and the [ function:

    before <- data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2'))
    out <- strsplit(as.character(before$type),'_and_')

For example:

    > data.frame(t(sapply(out, `[`)))
       X1    X2
    1 foo   bar
    2 foo bar_2
    3 foo   bar
    4 foo bar_2

sapply()'s result is a matrix, which needs transposing and casting back to a data frame. A few simple manipulations then yield the result you wanted:

    after <- with(before, data.frame(attr = attr))
    after <- cbind(after, data.frame(t(sapply(out, `[`))))
    names(after)[2:3] <- paste("type", 1:2, sep = "_")

At this point, after is what you wanted:

    > after
      attr type_1 type_2
    1    1    foo    bar
    2   30    foo  bar_2
    3    4    foo    bar
    4    6    foo  bar_2

lmo · Answer

This overlaps with some of the previous solutions, but it is a base R one-liner that returns a data.frame with the proper names.

    out <- setNames(data.frame(before$attr,
                      do.call(rbind, strsplit(as.character(before$type),
                                              split="_and_"))),
                      c("attr", paste0("type_", 1:2)))
    out
      attr type_1 type_2
    1    1    foo    bar
    2   30    foo  bar_2
    3    4    foo    bar
    4    6    foo  bar_2

It uses strsplit to break up the variable, and data.frame with do.call/rbind to put the data back into a data.frame. The additional incremental improvement is the use of setNames to add variable names to the data.frame.
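For readers who find the nested call hard to follow, the same three steps can be sketched one at a time. The intermediate names pieces and mat are introduced here only for illustration, and stringsAsFactors = FALSE is an added assumption so that the result holds character columns on any R version.

```r
before <- data.frame(
  attr = c(1, 30, 4, 6),
  type = c("foo_and_bar", "foo_and_bar_2"),
  stringsAsFactors = FALSE
)

# Step 1: strsplit() gives a list with one character vector per row
pieces <- strsplit(before$type, split = "_and_")

# Step 2: do.call(rbind, ...) stacks those vectors into a 4 x 2 character matrix
mat <- do.call(rbind, pieces)

# Step 3: data.frame() joins attr and the matrix; setNames() labels the columns
out <- setNames(
  data.frame(before$attr, mat, stringsAsFactors = FALSE),
  c("attr", paste0("type_", 1:2))
)
out
```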

Yannis P. · Answer

The subject is almost exhausted, but I'd like to offer a solution to a slightly more general version, where you don't know the number of output columns a priori. So, for example, you have

    before = data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2', 'foo_and_bar_2_and_bar_3', 'foo_and_bar'))
      attr                    type
    1    1             foo_and_bar
    2   30           foo_and_bar_2
    3    4 foo_and_bar_2_and_bar_3
    4    6             foo_and_bar

We can't use tidyr's separate() because we don't know the number of result columns before the split, so I created a function that uses stringr to split a column, given the pattern and a name prefix for the generated columns. I hope the coding patterns used are correct.

    # requires stringr (str_split_fixed) and tibble (as_tibble)
    split_into_multiple <- function(column, pattern = ", ", into_prefix){
      cols <- str_split_fixed(column, pattern, n = Inf)
      # Sub out the ""'s returned by filling the matrix to the right, with NAs which are useful
      cols[which(cols == "")] <- NA
      # name the columns as 'into_prefix_1', 'into_prefix_2', ..., 'into_prefix_m'
      # where m = # columns of 'cols'
      m <- dim(cols)[2]
      colnames(cols) <- paste(into_prefix, 1:m, sep = "_")
      # as_tibble() replaces the deprecated as.tibble()
      return(as_tibble(cols))
    }

We can then use split_into_multiple in a dplyr pipe as follows:

    after <- before %>%
      bind_cols(split_into_multiple(.$type, "_and_", "type")) %>%
      # selecting those that start with 'type_' will remove the original 'type' column
      select(attr, starts_with("type_"))

    > after
      attr type_1 type_2 type_3
    1    1    foo    bar   <NA>
    2   30    foo  bar_2   <NA>
    3    4    foo  bar_2  bar_3
    4    6    foo    bar   <NA>

And then we can use gather to tidy up:

    after %>%
      gather(key, val, -attr, na.rm = TRUE)

       attr    key   val
    1     1 type_1   foo
    2    30 type_1   foo
    3     4 type_1   foo
    4     6 type_1   foo
    5     1 type_2   bar
    6    30 type_2 bar_2
    7     4 type_2 bar_2
    8     6 type_2   bar
    11    4 type_3 bar_3
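If installing extra packages is not an option, the same idea of padding short rows with NA can be sketched in base R. This is an alternative written for illustration, not the function from this answer; the names pieces, padded, mat and after are made up for the example.

```r
before <- data.frame(
  attr = c(1, 30, 4, 6),
  type = c("foo_and_bar", "foo_and_bar_2",
           "foo_and_bar_2_and_bar_3", "foo_and_bar"),
  stringsAsFactors = FALSE
)

pieces <- strsplit(before$type, "_and_", fixed = TRUE)

# The widest row decides how many columns are needed
m <- max(lengths(pieces))

# `length<-` extends a shorter vector to length m, filling with NA
padded <- lapply(pieces, `length<-`, m)

mat <- do.call(rbind, padded)
colnames(mat) <- paste("type", seq_len(m), sep = "_")

after <- cbind(before["attr"], as.data.frame(mat, stringsAsFactors = FALSE))
after
```

Padding before rbind matters: without it, rbind would recycle the shorter vectors to fill the row instead of leaving the missing pieces as NA.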

Swifty McSwifterton · Answer

This question is pretty old, but I'll add the solution I found to be the simplest at present.

    library(reshape2)
    before = data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2'))
    newColNames <- c("type1", "type2")
    newCols <- colsplit(before$type, "_and_", newColNames)
    after <- cbind(before, newCols)
    after$type <- NULL
    after

ashaw · Answer

Another approach, if you want to stick with strsplit(), is to use the unlist() command. Here's a solution along those lines.

    tmp <- matrix(unlist(strsplit(as.character(before$type), '_and_')), ncol=2,
       byrow=TRUE)
    after <- cbind(before$attr, as.data.frame(tmp))
    names(after) <- c("attr", "type_1", "type_2")

Rich Scriven · Answer

As of R version 3.4.0 you can use strcapture() from the utils package (included with base R installs), binding the output onto the other column(s).

    out <- strcapture(
        "(.*)_and_(.*)",
        as.character(before$type),
        data.frame(type_1 = character(), type_2 = character())
    )

    cbind(before["attr"], out)
    #   attr type_1 type_2
    # 1    1    foo    bar
    # 2   30    foo  bar_2
    # 3    4    foo    bar
    # 4    6    foo  bar_2

Joe · Answer

Base R, but probably slow:

    n <- 1
    for(i in strsplit(as.character(before$type),'_and_')){
         before[n, 'type_1'] <- i[[1]]
         before[n, 'type_2'] <- i[[2]]
         n <- n + 1
    }

    ##   attr          type type_1 type_2
    ## 1    1   foo_and_bar    foo    bar
    ## 2   30 foo_and_bar_2    foo  bar_2
    ## 3    4   foo_and_bar    foo    bar
    ## 4    6 foo_and_bar_2    foo  bar_2