web-dev-qa-db-ja.com

単一の関数からの複数の戻り値を持つdplyr summarise()

複数の値を返すsummarisedplyr 0.1.2)を持つ関数(たとえば、describeパッケージのpsych関数)を使用する方法があるかどうか疑問に思っています。

そうでない場合、それはまだ実装されていないという理由だけですか、それとも良いアイデアではない理由がありますか?

例:

require(psych)
require(ggplot2)
require(dplyr)

dgrp <- group_by(diamonds, cut)
describe(dgrp$price)
summarise(dgrp, describe(price))

生成するもの:Error: expecting a single value

33
jzadra

dplyr> = 0.2では、これにdo関数を使用できます。

library(ggplot2)
library(psych)
library(dplyr)
diamonds %>%
    group_by(cut) %>%
    do(describe(.$price)) %>%
    select(-vars)
#> Source: local data frame [5 x 13]
#> Groups: cut [5]
#> 
#>         cut     n     mean       sd median  trimmed      mad   min   max range     skew kurtosis       se
#>      (fctr) (dbl)    (dbl)    (dbl)  (dbl)    (dbl)    (dbl) (dbl) (dbl) (dbl)    (dbl)    (dbl)    (dbl)
#> 1      Fair  1610 4358.758 3560.387 3282.0 3695.648 2183.128   337 18574 18237 1.780213 3.067175 88.73281
#> 2      Good  4906 3928.864 3681.590 3050.5 3251.506 2853.264   327 18788 18461 1.721943 3.042550 52.56197
#> 3 Very Good 12082 3981.760 3935.862 2648.0 3243.217 2855.488   336 18818 18482 1.595341 2.235873 35.80721
#> 4   Premium 13791 4584.258 4349.205 3185.0 3822.231 3371.432   326 18823 18497 1.333358 1.072295 37.03497
#> 5     Ideal 21551 3457.542 3808.401 1810.0 2656.136 1630.860   326 18806 18480 1.835587 2.977425 25.94233

purrrパッケージに基づくソリューション:

library(ggplot2)
library(psych)
library(purrr)
diamonds %>% 
    slice_rows("cut") %>% 
    by_slice(~ describe(.x$price), .collate = "rows")
#> Source: local data frame [5 x 14]
#> 
#>         cut  vars     n     mean       sd median  trimmed      mad   min   max range     skew kurtosis       se
#>      (fctr) (dbl) (dbl)    (dbl)    (dbl)  (dbl)    (dbl)    (dbl) (dbl) (dbl) (dbl)    (dbl)    (dbl)    (dbl)
#> 1      Fair     1  1610 4358.758 3560.387 3282.0 3695.648 2183.128   337 18574 18237 1.780213 3.067175 88.73281
#> 2      Good     1  4906 3928.864 3681.590 3050.5 3251.506 2853.264   327 18788 18461 1.721943 3.042550 52.56197
#> 3 Very Good     1 12082 3981.760 3935.862 2648.0 3243.217 2855.488   336 18818 18482 1.595341 2.235873 35.80721
#> 4   Premium     1 13791 4584.258 4349.205 3185.0 3822.231 3371.432   326 18823 18497 1.333358 1.072295 37.03497
#> 5     Ideal     1 21551 3457.542 3808.401 1810.0 2656.136 1630.860   326 18806 18480 1.835587 2.977425 25.94233

しかし、data.table

as.data.table(diamonds)[, describe(price), by = cut]
#>          cut vars     n     mean       sd median  trimmed      mad min   max range     skew kurtosis       se
#> 1:     Ideal    1 21551 3457.542 3808.401 1810.0 2656.136 1630.860 326 18806 18480 1.835587 2.977425 25.94233
#> 2:   Premium    1 13791 4584.258 4349.205 3185.0 3822.231 3371.432 326 18823 18497 1.333358 1.072295 37.03497
#> 3:      Good    1  4906 3928.864 3681.590 3050.5 3251.506 2853.264 327 18788 18461 1.721943 3.042550 52.56197
#> 4: Very Good    1 12082 3981.760 3935.862 2648.0 3243.217 2855.488 336 18818 18482 1.595341 2.235873 35.80721
#> 5:      Fair    1  1610 4358.758 3560.387 3282.0 3695.648 2183.128 337 18574 18237 1.780213 3.067175 88.73281

リストを返す独自のサマリー関数を作成できます。

fun <- function(x) {
    list(n = length(x),
         min = min(x),
         median = as.numeric(median(x)),
         mean = mean(x),
         sd = sd(x),
         max = max(x))
}
as.data.table(diamonds)[, fun(price), by = cut]
#>          cut     n min median     mean       sd   max
#> 1:     Ideal 21551 326 1810.0 3457.542 3808.401 18806
#> 2:   Premium 13791 326 3185.0 4584.258 4349.205 18823
#> 3:      Good  4906 327 3050.5 3928.864 3681.590 18788
#> 4: Very Good 12082 336 2648.0 3981.760 3935.862 18818
#> 5:      Fair  1610 337 3282.0 4358.758 3560.387 18574
39
Artem Klevtsov