Multi-index sorting in Pandas

Question

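I have a multi-index DataFrame created via a groupby operation. I'm trying to do a compound sort using several levels of the index, but I can't seem to find a sort function that does what I need.

The initial dataset looks something like this (daily sales counts of various products):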

             Date Manufacturer Product Name Product Launch Date  Sales
    0  2013-01-01        Apple         iPod          2001-10-23     12
    1  2013-01-01        Apple         iPad          2010-04-03     13
    2  2013-01-01      Samsung       Galaxy          2009-04-27     14
    3  2013-01-01      Samsung   Galaxy Tab          2010-09-02     15
    4  2013-01-02        Apple         iPod          2001-10-23     22
    5  2013-01-02        Apple         iPad          2010-04-03     17
    6  2013-01-02      Samsung       Galaxy          2009-04-27     10
    7  2013-01-02      Samsung   Galaxy Tab          2010-09-02      7

I use groupby to get a sum over the date range:

    > grouped = df.groupby(['Manufacturer', 'Product Name', 'Product Launch Date']).sum()
                                                   Sales
    Manufacturer Product Name Product Launch Date
    Apple        iPad         2010-04-03              30
                 iPod         2001-10-23              34
    Samsung      Galaxy       2009-04-27              24
                 Galaxy Tab   2010-09-02              22

So far so good!

Now the last thing I want to do is sort each manufacturer's products by launch date, while keeping them hierarchically grouped under their manufacturer. Here's all I'm trying to do:

                                                   Sales
    Manufacturer Product Name Product Launch Date
    Apple        iPod         2001-10-23              34
                 iPad         2010-04-03              30
    Samsung      Galaxy       2009-04-27              24
                 Galaxy Tab   2010-09-02              22

When I try sortlevel(), I lose the nice per-company hierarchy I had before:

    > grouped.sortlevel('Product Launch Date')
                                                   Sales
    Manufacturer Product Name Product Launch Date
    Apple        iPod         2001-10-23              34
    Samsung      Galaxy       2009-04-27              24
    Apple        iPad         2010-04-03              30
    Samsung      Galaxy Tab   2010-09-02              22

sort() and sort_index() just fail:

    grouped.sort(['Manufacturer','Product Launch Date'])
    KeyError: u'no item named Manufacturer'

    grouped.sort_index(by=['Manufacturer','Product Launch Date'])
    KeyError: u'no item named Manufacturer'
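The KeyError above is consistent with the groupby keys having become index levels rather than columns. The sketch below illustrates that, and is not part of the original thread. It assumes pandas 0.23 or later, where `sort_values` accepts index level names in `by`, and it leaves out the Date column for brevity.

```python
import pandas as pd

# Same sales data as in the question (Date column omitted for brevity).
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"] * 2,
    "Product Name": ["iPod", "iPad", "Galaxy", "Galaxy Tab"] * 2,
    "Product Launch Date": ["2001-10-23", "2010-04-03",
                            "2009-04-27", "2010-09-02"] * 2,
    "Sales": [12, 13, 14, 15, 22, 17, 10, 7],
})
keys = ["Manufacturer", "Product Name", "Product Launch Date"]
grouped = df.groupby(keys).sum()

# After groupby, the keys live in the index, not in the columns.
print(list(grouped.columns))      # only the Sales column is left
print(list(grouped.index.names))  # the three groupby keys

# Assumption: pandas >= 0.23, where `by` may refer to index levels.
by_level = grouped.sort_values(["Manufacturer", "Product Launch Date"])
print(by_level)
```

The MultiIndex is kept; only the row order changes.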

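This seems like a simple operation, but I can't quite figure it out.

I'm not tied to using a MultiIndex for this, but since that's what groupby() returns, that's what I've been working with.

By the way, here's the code to produce the initial DataFrame: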

    from pandas import DataFrame

    data = {
        'Date': ['2013-01-01', '2013-01-01', '2013-01-01', '2013-01-01',
                 '2013-01-02', '2013-01-02', '2013-01-02', '2013-01-02'],
        'Manufacturer': ['Apple', 'Apple', 'Samsung', 'Samsung',
                         'Apple', 'Apple', 'Samsung', 'Samsung'],
        'Product Name': ['iPod', 'iPad', 'Galaxy', 'Galaxy Tab',
                         'iPod', 'iPad', 'Galaxy', 'Galaxy Tab'],
        'Product Launch Date': ['2001-10-23', '2010-04-03', '2009-04-27', '2010-09-02',
                                '2001-10-23', '2010-04-03', '2009-04-27', '2010-09-02'],
        'Sales': [12, 13, 14, 15, 22, 17, 10, 7]
    }
    df = DataFrame(data, columns=['Date', 'Manufacturer', 'Product Name',
                                  'Product Launch Date', 'Sales'])

Andy Hayden · Accepted Answer

A hack would be to change the order of the levels:

    In [11]: g
    Out[11]:
                                                   Sales
    Manufacturer Product Name Product Launch Date
    Apple        iPad         2010-04-03              30
                 iPod         2001-10-23              34
    Samsung      Galaxy       2009-04-27              24
                 Galaxy Tab   2010-09-02              22

    In [12]: g.index = g.index.swaplevel(1, 2)

Then sortlevel, which (as you've found) sorts the MultiIndex levels in order:

    In [13]: g = g.sortlevel()

And swap back:

    In [14]: g.index = g.index.swaplevel(1, 2)

    In [15]: g
    Out[15]:
                                                   Sales
    Manufacturer Product Name Product Launch Date
    Apple        iPod         2001-10-23              34
                 iPad         2010-04-03              30
    Samsung      Galaxy       2009-04-27              24
                 Galaxy Tab   2010-09-02              22
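For readers on a newer pandas, the swap, sort, swap-back sequence can be sketched as one chain. This is an illustration, not the answerer's code. It assumes a pandas version where `sortlevel` is no longer available on DataFrames and `sort_index` is used in its place; the Date column is left out for brevity.

```python
import pandas as pd

# Same sales data as in the question (Date column omitted for brevity).
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"] * 2,
    "Product Name": ["iPod", "iPad", "Galaxy", "Galaxy Tab"] * 2,
    "Product Launch Date": ["2001-10-23", "2010-04-03",
                            "2009-04-27", "2010-09-02"] * 2,
    "Sales": [12, 13, 14, 15, 22, 17, 10, 7],
})
grouped = df.groupby(
    ["Manufacturer", "Product Name", "Product Launch Date"]
).sum()

g = (
    grouped
    .swaplevel(1, 2)   # launch date becomes the second level
    .sort_index()      # sort by manufacturer, then launch date, then name
    .swaplevel(1, 2)   # restore the original level order
)
print(g)
```

Note that the launch dates are strings here; they sort correctly only because the ISO `YYYY-MM-DD` format orders the same way lexicographically and chronologically.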

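I'm of the opinion that sortlevel should not sort the remaining labels in order, so I will create a github issue. :) Although it's worth mentioning the doc note about "the need for sortedness".

Note: you can avoid the first swaplevel by reordering the keys of the initial groupby: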

    g = df.groupby(['Manufacturer', 'Product Launch Date', 'Product Name']).sum()
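A short sketch of that variant, under the assumption that groupby's default `sort=True` is left in place so the group keys come back already ordered. The variable names are illustrative and the Date column is omitted for brevity.

```python
import pandas as pd

# Same sales data as in the question (Date column omitted for brevity).
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"] * 2,
    "Product Name": ["iPod", "iPad", "Galaxy", "Galaxy Tab"] * 2,
    "Product Launch Date": ["2001-10-23", "2010-04-03",
                            "2009-04-27", "2010-09-02"] * 2,
    "Sales": [12, 13, 14, 15, 22, 17, 10, 7],
})

# Group with the launch date ahead of the product name; groupby sorts
# the keys by default, so the rows are already in the wanted order.
g = df.groupby(["Manufacturer", "Product Launch Date", "Product Name"]).sum()

# One swap puts the levels back in the order used in the question.
g = g.swaplevel(1, 2)
print(g)
```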

Jim · Answer

This one-liner works for me:

    In [1]: grouped.sortlevel(["Manufacturer","Product Launch Date"], sort_remaining=False)
                                                   Sales
    Manufacturer Product Name Product Launch Date
    Apple        iPod         2001-10-23              34
                 iPad         2010-04-03              30
    Samsung      Galaxy       2009-04-27              24
                 Galaxy Tab   2010-09-02              22

Note that this works too:

    grouped.sortlevel([0,2], sort_remaining=False)

This wouldn't have worked when you originally posted over two years ago, because sortlevel by default sorted on ALL indices, which mucked up your company hierarchy. sort_remaining, which was added last year, disables that behavior. Here's the commit link for reference: https://github.com/pydata/pandas/commit/3ad64b11e8e4bef47e3767f1d31cc26e39593277
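On pandas versions where `DataFrame.sortlevel` is no longer available, the same call can be sketched with `sort_index`, which takes `level` and `sort_remaining` arguments. This is an assumed modern equivalent rather than the answerer's code; it shows that level names and level positions give the same result.

```python
import pandas as pd

# Same sales data as in the question (Date column omitted for brevity).
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"] * 2,
    "Product Name": ["iPod", "iPad", "Galaxy", "Galaxy Tab"] * 2,
    "Product Launch Date": ["2001-10-23", "2010-04-03",
                            "2009-04-27", "2010-09-02"] * 2,
    "Sales": [12, 13, 14, 15, 22, 17, 10, 7],
})
grouped = df.groupby(
    ["Manufacturer", "Product Name", "Product Launch Date"]
).sum()

# Sort on the named levels only; the Product Name level is not used as a key.
by_name = grouped.sort_index(
    level=["Manufacturer", "Product Launch Date"], sort_remaining=False
)

# The same levels addressed by position (0 = Manufacturer, 2 = launch date).
by_position = grouped.sort_index(level=[0, 2], sort_remaining=False)

print(by_name)
```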

fpersyn · Answer

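To sort a MultiIndex by its "index columns" (aka levels), you need to use the .sort_index() method and set its level argument. If you want to sort by multiple levels, set the argument to a list of level names in the order they should be applied.

This should give you the DataFrame you need:

    df.groupby(['Manufacturer', 'Product Name', 'Product Launch Date']).sum().sort_index(level=['Manufacturer','Product Launch Date'])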
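To illustrate why the order of the names in `level` matters, here is a small sketch comparing two orderings on the question's data. The variable names are made up for the example and the Date column is omitted for brevity.

```python
import pandas as pd

# Same sales data as in the question (Date column omitted for brevity).
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"] * 2,
    "Product Name": ["iPod", "iPad", "Galaxy", "Galaxy Tab"] * 2,
    "Product Launch Date": ["2001-10-23", "2010-04-03",
                            "2009-04-27", "2010-09-02"] * 2,
    "Sales": [12, 13, 14, 15, 22, 17, 10, 7],
})
grouped = df.groupby(
    ["Manufacturer", "Product Name", "Product Launch Date"]
).sum()

# Manufacturer first: products stay grouped under their manufacturer.
maker_first = grouped.sort_index(level=["Manufacturer", "Product Launch Date"])

# Launch date first: the manufacturers end up interleaved.
date_first = grouped.sort_index(level=["Product Launch Date", "Manufacturer"])

print(maker_first)
print(date_first)
```

In both cases the level order of the index itself is unchanged; only the sort priority differs.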

David Hollett · Answer

If you are not concerned about preserving the index (I often prefer an arbitrary integer index), you can just use the following one-liner:

grouped.reset_index().sort(["Manufacturer","Product Launch Date"])
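`DataFrame.sort` was removed in later pandas releases in favor of `sort_values`, so on a recent version the same idea might look like the sketch below. The final `reset_index(drop=True)` is an addition for the example, to renumber the rows after sorting; the Date column is omitted for brevity.

```python
import pandas as pd

# Same sales data as in the question (Date column omitted for brevity).
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"] * 2,
    "Product Name": ["iPod", "iPad", "Galaxy", "Galaxy Tab"] * 2,
    "Product Launch Date": ["2001-10-23", "2010-04-03",
                            "2009-04-27", "2010-09-02"] * 2,
    "Sales": [12, 13, 14, 15, 22, 17, 10, 7],
})
grouped = df.groupby(
    ["Manufacturer", "Product Name", "Product Launch Date"]
).sum()

flat = (
    grouped
    .reset_index()                                         # levels become columns
    .sort_values(["Manufacturer", "Product Launch Date"])  # plain column sort
    .reset_index(drop=True)                                # renumber rows 0..n-1
)
print(flat)
```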

Xavi · Answer

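If you want to avoid multiple swaps within a very deep MultiIndex, you could also try this:

1. Slice by level X (using a list comprehension + .loc + IndexSlice)
2. Sort the desired level (sortlevel(2))
3. Concatenate every group of level X indexes

Here is the code: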

    import pandas as pd
    idx = pd.IndexSlice
    g = pd.concat([grouped.loc[idx[i,:,:],:].sortlevel(2) for i in grouped.index.levels[0]])
    g
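The slice, sort and concatenate steps above can be sketched for a recent pandas as follows. The only intended change from the answer's code is `sort_index(level=2)` in place of `sortlevel(2)`, on the assumption that `sortlevel` is unavailable; the Date column is omitted for brevity.

```python
import pandas as pd

# Same sales data as in the question (Date column omitted for brevity).
df = pd.DataFrame({
    "Manufacturer": ["Apple", "Apple", "Samsung", "Samsung"] * 2,
    "Product Name": ["iPod", "iPad", "Galaxy", "Galaxy Tab"] * 2,
    "Product Launch Date": ["2001-10-23", "2010-04-03",
                            "2009-04-27", "2010-09-02"] * 2,
    "Sales": [12, 13, 14, 15, 22, 17, 10, 7],
})
grouped = df.groupby(
    ["Manufacturer", "Product Name", "Product Launch Date"]
).sum()

idx = pd.IndexSlice
pieces = [
    # Slice out one manufacturer, then sort that slice by launch date.
    grouped.loc[idx[maker, :, :], :].sort_index(level=2)
    for maker in grouped.index.levels[0]
]
# Stitch the per-manufacturer pieces back together in level order.
g = pd.concat(pieces)
print(g)
```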