Python pandas: groupby aggregation over multiple columns, then pivot

Question

I have a pandas DataFrame in Python that looks like this:

    Item  | shop1 | shop2 | shop3 | Category
    ------------------------------------------
    Shoes | 45    | 50    | 53    | Clothes
    TV    | 200   | 300   | 250   | Technology
    Book  | 20    | 17    | 21    | Books
    phone | 300   | 350   | 400   | Technology

Here shop1, shop2 and shop3 are the cost of each item in different shops. After some data cleaning, I need to return a DataFrame like this one:

    Category (index) | size | sum | mean | std
    ------------------------------------------

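where size is the number of items in each Category, and sum, mean and std are the corresponding functions applied across the 3 shops. How can I do these operations with the split-apply-combine pattern (groupby, aggregate, apply, ...)?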

Can someone help me out? I'm going crazy with this one... Thank you!

piRSquared · Accepted Answer

Edited for Pandas 0.22+, considering the deprecation of using dictionaries to rename columns in a groupby aggregation.

We set up a very similar dictionary, where the keys of the dictionary specify the functions and the dictionary itself is used to rename the columns.

    rnm_cols = dict(size='Size', sum='Sum', mean='Mean', std='Std')
    df.set_index(['Category', 'Item']).stack().groupby('Category') \
      .agg(rnm_cols.keys()).rename(columns=rnm_cols)

                Size   Sum        Mean        Std
    Category
    Books          3    58   19.333333   2.081666
    Clothes        3   148   49.333333   4.041452
    Technology     6  1800  300.000000  70.710678
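For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the table in the question; the variable names `prices` and `result` are only illustrative, and `list(rnm_cols)` is used in place of `rnm_cols.keys()` simply to hand `agg` a plain list.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Item": ["Shoes", "TV", "Book", "phone"],
    "shop1": [45, 200, 20, 300],
    "shop2": [50, 300, 17, 350],
    "shop3": [53, 250, 21, 400],
    "Category": ["Clothes", "Technology", "Books", "Technology"],
})

# Keys are the aggregation function names, values are the output labels.
rnm_cols = dict(size="Size", sum="Sum", mean="Mean", std="Std")

# Move the non-numeric columns into the index, then stack the three shop
# columns into one long Series of prices.
prices = df.set_index(["Category", "Item"]).stack()

result = (
    prices.groupby("Category")   # group on the index level named "Category"
    .agg(list(rnm_cols))         # apply size, sum, mean and std
    .rename(columns=rnm_cols)    # relabel the resulting columns
)
print(result)
```

Stacking first is what makes "Size" count prices (3 per item) rather than items, which matches the output shown above.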

Option 1
Use agg. This is the dictionary form that the edit above describes as deprecated.

    agg_funcs = dict(Size='size', Sum='sum', Mean='mean', Std='std')
    df.set_index(['Category', 'Item']).stack().groupby(level=0).agg(agg_funcs)

                      Std   Sum        Mean  Size
    Category
    Books        2.081666    58   19.333333     3
    Clothes      4.041452   148   49.333333     3
    Technology  70.710678  1800  300.000000     6

Option 2
More for less: use describe.

    df.set_index(['Category', 'Item']).stack().groupby(level=0).describe().unstack()

                count        mean        std    min    25%    50%    75%    max
    Category
    Books         3.0   19.333333   2.081666   17.0   18.5   20.0   20.5   21.0
    Clothes       3.0   49.333333   4.041452   45.0   47.5   50.0   51.5   53.0
    Technology    6.0  300.000000  70.710678  200.0  262.5  300.0  337.5  400.0
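A small sketch of the describe route, under the assumption that you are on a recent pandas release: there, as far as I can tell, `describe()` on a grouped Series already returns one row per group with the statistics as columns, so the trailing `.unstack()` may not be needed. The names `prices`, `stats` and `subset` are illustrative only.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Item": ["Shoes", "TV", "Book", "phone"],
    "shop1": [45, 200, 20, 300],
    "shop2": [50, 300, 17, 350],
    "shop3": [53, 250, 21, 400],
    "Category": ["Clothes", "Technology", "Books", "Technology"],
})

# One long Series of prices indexed by (Category, Item, shop).
prices = df.set_index(["Category", "Item"]).stack()

# Assumption: recent pandas returns a DataFrame here, one row per Category.
stats = prices.groupby(level=0).describe()

# Keep only the statistics the question asks for.
subset = stats[["count", "mean", "std"]]
print(subset)
```

Note that describe reports `count` rather than `size`; the two agree here because the sample data has no missing prices.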

Scott Boston · Answer

    df.groupby('Category').agg({'Item': 'size',
                                'shop1': ['sum', 'mean', 'std'],
                                'shop2': ['sum', 'mean', 'std'],
                                'shop3': ['sum', 'mean', 'std']})

Or, if you want it across all shops:

    df1 = df.set_index(['Item', 'Category']).stack().reset_index() \
            .rename(columns={'level_2': 'Shops', 0: 'costs'})
    df1.groupby('Category').agg({'Item': 'size', 'costs': ['sum', 'mean', 'std']})

foglerit · Answer

If I understand correctly, you want to calculate aggregate metrics across all shops, not for each shop individually. To do that, you can first stack your DataFrame and then group by Category:

    stacked = df.set_index(['Item', 'Category']).stack().reset_index()
    stacked.columns = ['Item', 'Category', 'Shop', 'Price']
    stacked.groupby('Category').agg({'Price': ['count', 'sum', 'mean', 'std']})

Which results in:

               Price
               count   sum        mean        std
    Category
    Books          3    58   19.333333   2.081666
    Clothes        3   148   49.333333   4.041452
    Technology     6  1800  300.000000  70.710678
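As a variation on the stack-then-group idea above, here is a sketch that swaps in `melt` for the reshape and keyword ("named") aggregation for the column labels. This is not what the answers above use; it assumes pandas 0.25 or later, where named aggregation is available, and the names `long_df` and `summary` are made up for the example.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Item": ["Shoes", "TV", "Book", "phone"],
    "shop1": [45, 200, 20, 300],
    "shop2": [50, 300, 17, 350],
    "shop3": [53, 250, 21, 400],
    "Category": ["Clothes", "Technology", "Books", "Technology"],
})

# Reshape to long form: one row per (Item, Category, Shop) with its Price.
long_df = df.melt(id_vars=["Item", "Category"],
                  var_name="Shop", value_name="Price")

# Named aggregation: keyword = output column label, value = function name.
summary = long_df.groupby("Category")["Price"].agg(
    Size="size", Sum="sum", Mean="mean", Std="std"
)
print(summary)
```

This yields flat column labels directly, which avoids both the separate rename step and the two-level column header shown in the result above.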