How to spread columns with duplicate identifiers?

Question

I have the following tibble:

```r
structure(list(age = c("21", "17", "32", "29", "15"),
               gender = structure(c(2L, 1L, 1L, 2L, 2L),
                                  .Label = c("Female", "Male"), class = "factor")),
          row.names = c(NA, -5L),
          class = c("tbl_df", "tbl", "data.frame"),
          .Names = c("age", "gender"))

    age gender
  <chr> <fctr>
1    21   Male
2    17 Female
3    32 Female
4    29   Male
5    15   Male
```

I am trying to use `tidyr::spread` to get this result:

```
  Female Male
1     NA   21
2     17   NA
3     32   NA
4     NA   29
5     NA   15
```

I thought `spread(gender, age)` would work, but I get the following error message:

```
Error: Duplicate identifiers for rows (2, 3), (1, 4, 5)
```

alistaire · Accepted Answer

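Right now there are two `age` values for Female and three for Male, and no other variable keeps them from being collapsed into a single row, which is what `spread` tries to do with values that share an index (or have none):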

```r
library(tidyverse)

df <- data_frame(x = c('a', 'b'), y = 1:2)

df    # 2 rows...
#> # A tibble: 2 x 2
#>       x     y
#>   <chr> <int>
#> 1     a     1
#> 2     b     2

df %>% spread(x, y)    # ...become one if there's only one value for each.
#> # A tibble: 1 x 2
#>       a     b
#> * <int> <int>
#> 1     1     2
```

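`spread` does not apply a function to combine multiple values (à la `dcast`), so the rows must be indexed such that each location in the result holds either one value or none, e.g.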

```r
df <- data_frame(i = c(1, 1, 2, 2, 3, 3),
                 x = c('a', 'b', 'a', 'b', 'a', 'b'),
                 y = 1:6)

df    # the two rows with each `i` value here...
#> # A tibble: 6 x 3
#>       i     x     y
#>   <dbl> <chr> <int>
#> 1     1     a     1
#> 2     1     b     2
#> 3     2     a     3
#> 4     2     b     4
#> 5     3     a     5
#> 6     3     b     6

df %>% spread(x, y)    # ...become one row here.
#> # A tibble: 3 x 3
#>       i     a     b
#> * <dbl> <int> <int>
#> 1     1     1     2
#> 2     2     3     4
#> 3     3     5     6
```
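Because `spread` does not aggregate, one possible workaround when a single combined value per cell is acceptable is to collapse the duplicates yourself before spreading. The following is only a minimal sketch under assumptions: the toy data and the choice of `sum()` as the combining function are illustrative and do not come from the question.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical): the pair (i = 1, x = "a") occurs twice,
# so spread(x, y) on its own would fail with a duplicate-identifier error.
df <- tibble(
  i = c(1, 1, 1, 2),
  x = c("a", "a", "b", "a"),
  y = 1:4
)

# Collapse to one value per (i, x) pair first; sum() is an arbitrary choice.
collapsed <- df %>%
  group_by(i, x) %>%
  summarise(y = sum(y)) %>%
  ungroup()

# Each (i, x) pair is now unique, so spread() has at most one value per cell.
wide <- collapsed %>% spread(x, y)
wide
```

Pairs that never occur in the data, such as `i = 2` with `x = "b"` here, are filled with `NA`.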

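If your values are not naturally indexed by the other columns, you can add a unique index column (e.g. by adding the row numbers as a column), which stops `spread` from trying to collapse the rows: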

```r
df <- structure(list(age = c("21", "17", "32", "29", "15"),
                     gender = structure(c(2L, 1L, 1L, 2L, 2L),
                                        .Label = c("Female", "Male"), class = "factor")),
                row.names = c(NA, -5L),
                class = c("tbl_df", "tbl", "data.frame"),
                .Names = c("age", "gender"))

df %>% mutate(i = row_number()) %>% spread(gender, age)
#> # A tibble: 5 x 3
#>       i Female  Male
#> * <int>  <chr> <chr>
#> 1     1   <NA>    21
#> 2     2     17  <NA>
#> 3     3     32  <NA>
#> 4     4   <NA>    29
#> 5     5   <NA>    15
```

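If you want to remove the index column afterwards, add `select(-i)`. This does not produce a terribly useful data.frame in this case, but it can be very useful in the midst of more complicated reshaping.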
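Put together, the full pipeline with the helper column dropped at the end might look like the sketch below. It rebuilds the question's data with `tibble()` and `factor()` rather than `structure()`, which is an assumption made for readability; the name `wide` is also just illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt with tibble() for readability.
df <- tibble(
  age = c("21", "17", "32", "29", "15"),
  gender = factor(c("Male", "Female", "Female", "Male", "Male"))
)

wide <- df %>%
  mutate(i = row_number()) %>%  # unique index so each cell gets at most one value
  spread(gender, age) %>%       # one column per gender level
  select(-i)                    # drop the helper index after reshaping

wide
```

The result keeps one row per original observation, with `NA` in whichever gender column does not apply to that row.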