`.loc` and `.iloc` with a MultiIndex'd DataFrame

Question

When indexing a MultiIndexed DataFrame, `.iloc` seems to assume you are referencing the "inner level" of the index, while `.loc` looks at the outer level.

For example:

    import numpy as np
    import pandas as pd

    np.random.seed(123)
    iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
    idx = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
    df = pd.DataFrame(np.random.randn(8, 4), index=idx)

    # .loc looks at the outer index:
    print(df.loc['qux'])
    # df.loc['two'] would throw KeyError
                  0        1        2        3
    second
    one    -1.25388 -0.63775  0.90711 -1.42868
    two    -0.14007 -0.86175 -0.25562 -2.79859

    # while .iloc looks at the inner index:
    print(df.iloc[-1])
    0   -0.14007
    1   -0.86175
    2   -0.25562
    3   -2.79859
    Name: (qux, two), dtype: float64

Two questions:

First, why is this? Is it a deliberate design decision?

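Second, can I use `.iloc` to reference the outer level of the index and get the result below? I know I could first find the last member of the index with `get_level_values` and then index with `.loc`, but I am wondering whether it can be done more directly, either with some funky `.iloc` syntax or with an existing function designed specifically for this case.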

    # df.iloc[-1]
    qux   one     0.89071  1.75489  1.49564  1.06939
          two    -0.77271  0.79486  0.31427 -1.32627

Brad Solomon · Accepted Answer

Yes, this is a deliberate design decision:

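> `.iloc` is a strict positional indexer, it does not regard the structure at all, only the first actual behavior. ... `.loc` does take into account the level behavior. [emphasis added]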
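A minimal sketch of what the quote means in practice, reusing the sample frame from the question (the variable names `last_row`, `qux_block`, and `same_row` are only illustrative): `.iloc` counts rows and ignores the labels, while `.loc` resolves labels against the index levels.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
idx = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# .iloc counts rows only: position -1 is the very last row,
# whatever labels that row happens to carry.
last_row = df.iloc[-1]
print(last_row.name)  # ('qux', 'two')

# .loc is label-based and level-aware: a single label is matched
# against the outer level, a full tuple addresses exactly one row.
qux_block = df.loc['qux']
same_row = df.loc[('qux', 'two')]

print(qux_block.shape)            # (2, 4)
print(last_row.equals(same_row))  # True
```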

So the desired result given in the question is not possible in a flexible manner with `.iloc`. The closest workaround, used in several similar questions, is

    print(df.loc[[df.index.get_level_values(0)[-1]]])
                        0        1        2        3
    first second
    qux   one    -1.25388 -0.63775  0.90711 -1.42868
          two    -0.14007 -0.86175 -0.25562 -2.79859

Using double brackets retains the first index level.
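To make the bracket difference concrete, here is a small sketch on the same sample data (the names `last_label`, `dropped`, and `kept` are illustrative only). A scalar label drops the matched level from the result, whereas a one-element list keeps both levels.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
idx = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
    names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# Last label of the outer level, in row order.
last_label = df.index.get_level_values(0)[-1]

dropped = df.loc[last_label]   # scalar label: the 'first' level is dropped
kept = df.loc[[last_label]]    # list of labels: both levels are retained

print(list(dropped.index.names))  # ['second']
print(list(kept.index.names))     # ['first', 'second']
```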

FabienP · Answer

You can use:

    df.iloc[[6, 7], :]
    Out[1]:
                         0         1         2         3
    first second
    qux   one    -1.253881 -0.637752  0.907105 -1.428681
          two    -0.140069 -0.861755 -0.255619 -2.798589

Here `[6, 7]` corresponds to the actual row positions of these rows, as you can see below:

    df.reset_index()
    Out[]:
      first second         0         1         2         3
    0   bar    one -1.085631  0.997345  0.282978 -1.506295
    1   bar    two -0.578600  1.651437 -2.426679 -0.428913
    2   baz    one  1.265936 -0.866740 -0.678886 -0.094709
    3   baz    two  1.491390 -0.638902 -0.443982 -0.434351
    4   foo    one  2.205930  2.186786  1.004054  0.386186
    5   foo    two  0.737369  1.490732 -0.935834  1.175829
    6   qux    one -1.253881 -0.637752  0.907105 -1.428681
    7   qux    two -0.140069 -0.861755 -0.255619 -2.798589

This also works with `df.iloc[[-2, -1], :]` or `df.iloc[range(-2, 0), :]`.
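Rather than hard-coding the positions, they can be derived from the index itself. The following is one possible sketch, not part of the original answer: it compares the outer-level values against the last outer label and turns the resulting boolean mask into integer positions with `np.flatnonzero`. The names `outer`, `positions`, and `result` are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
idx = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
    names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# Outer-level label of every row, in row order.
outer = df.index.get_level_values('first')

# Integer positions of all rows that share the last outer label.
positions = np.flatnonzero(outer == outer[-1])
print(positions)  # [6 7]

# Purely positional selection, same rows as df.iloc[[6, 7], :].
result = df.iloc[positions]
```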

EDIT: Turning it into a more generic solution

It is then possible to write a generic function:

    def multiindex_iloc(df, index):
        # Look up the outer-level label at the given position,
        # then select every row that carries that label.
        label = df.index.levels[0][index]
        return df.iloc[df.index.get_loc(label)]

    multiindex_iloc(df, -1)
    Out[]:
                         0         1         2         3
    first second
    qux   one    -1.253881 -0.637752  0.907105 -1.428681
          two    -0.140069 -0.861755 -0.255619 -2.798589

    multiindex_iloc(df, 2)
    Out[]:
                         0         1         2         3
    first second
    foo   one     2.205930  2.186786  1.004054  0.386186
          two     0.737369  1.490732 -0.935834  1.175829
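One caveat with the function above: `df.index.levels[0]` stores the distinct outer labels in its own order (sorted, for an index built with `from_product`) and can still contain labels that no longer occur after filtering, so it does not necessarily follow row order. A hedged variant that picks the n-th outer label by order of appearance might look like the sketch below; `outer_iloc` and its `level` parameter are hypothetical names, not a pandas API.

```python
import numpy as np
import pandas as pd


def outer_iloc(df, position, level=0):
    """Return all rows whose `level` label is the `position`-th distinct
    label in order of appearance (hypothetical helper, not a pandas API)."""
    outer = df.index.get_level_values(level)
    # Index.unique() keeps labels in order of first appearance.
    label = outer.unique()[position]
    # Convert the label match into integer positions for .iloc.
    return df.iloc[np.flatnonzero(outer == label)]


np.random.seed(123)
idx = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
    names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

last_block = outer_iloc(df, -1)   # the 'qux' rows

# Move the 'qux' rows to the top: they are now the first outer group,
# even though levels[0][0] is still 'bar'.
reordered = df.iloc[[6, 7, 0, 1, 2, 3, 4, 5]]
first_block = outer_iloc(reordered, 0)
```

On the reordered frame, `outer_iloc(reordered, 0)` returns the `qux` rows, whereas a lookup through `levels[0][0]` would select `bar`.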

Håken Lid · Answer

You can use the `swaplevel` method to reorder the index before using `loc`.

    df.swaplevel(0, -1).loc['two']

With the sample data from the question, it looks like this:

    >>> df
                         0         1         2         3
    first second
    bar   one    -1.085631  0.997345  0.282978 -1.506295
          two    -0.578600  1.651437 -2.426679 -0.428913
    baz   one     1.265936 -0.866740 -0.678886 -0.094709
          two     1.491390 -0.638902 -0.443982 -0.434351
    foo   one     2.205930  2.186786  1.004054  0.386186
          two     0.737369  1.490732 -0.935834  1.175829
    qux   one    -1.253881 -0.637752  0.907105 -1.428681
          two    -0.140069 -0.861755 -0.255619 -2.798589

    >>> df.loc['bar']
                   0         1         2         3
    second
    one    -1.085631  0.997345  0.282978 -1.506295
    two    -0.578600  1.651437 -2.426679 -0.428913

    >>> df.swaplevel().loc['two']
                  0         1         2         3
    first
    bar   -0.578600  1.651437 -2.426679 -0.428913
    baz    1.491390 -0.638902 -0.443982 -0.434351
    foo    0.737369  1.490732 -0.935834  1.175829
    qux   -0.140069 -0.861755 -0.255619 -2.798589

`swaplevel` is a MultiIndex method, but you can call it directly on the DataFrame. By default it swaps the two innermost levels, so if the MultiIndex has more than two levels, you should explicitly state which levels to swap.

    df.swaplevel(0, -1).loc['two']
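To illustrate why the explicit arguments matter, here is a small sketch with a three-level index. The frame `df3`, the level name `third`, and the labels `x` and `y` are made up for this example and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical three-level index, for illustration only.
idx3 = pd.MultiIndex.from_product(
    [['bar', 'baz'], ['one', 'two'], ['x', 'y']],
    names=['first', 'second', 'third'])
df3 = pd.DataFrame(np.arange(16).reshape(8, 2), index=idx3)

default_swap = df3.swaplevel()        # swaps the two innermost levels
explicit_swap = df3.swaplevel(0, -1)  # swaps the outermost and innermost levels

print(list(default_swap.index.names))   # ['first', 'third', 'second']
print(list(explicit_swap.index.names))  # ['third', 'second', 'first']

# With 'third' moved to the front, .loc can select on it directly.
by_third = explicit_swap.loc['y']
```

With the default call, the outer level is still `first`, so only the explicit `swaplevel(0, -1)` makes the innermost labels reachable through a plain `.loc['y']`.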