Pandas DataFrame: stack the values of multiple columns into a single column

Question

Assume the following DataFrame:

      key.0 key.1 key.2  topic
    1   abc   def   ghi      8
    2   xab   xcd   xef      9

How can I combine the values of all key.* columns into a single column 'key', with each value associated with the topic value of the row it came from? This is the result I want:

       topic  key
    1      8  abc
    2      8  def
    3      8  ghi
    4      9  xab
    5      9  xcd
    6      9  xef

Note that the number of key.N columns is variable, depending on some external N.

Alexander · Accepted Answer

You can melt your dataframe:

    >>> keys = [c for c in df if c.startswith('key.')]
    >>> pd.melt(df, id_vars='topic', value_vars=keys, value_name='key')
       topic variable  key
    0      8    key.0  abc
    1      9    key.0  xab
    2      8    key.1  def
    3      9    key.1  xcd
    4      8    key.2  ghi
    5      9    key.2  xef

It also gives you the source of each key (the variable column).

From v0.20, melt is a first-class function of the pd.DataFrame class:

    >>> df.melt('topic', value_name='key').drop('variable', axis=1)
       topic  key
    0      8  abc
    1      9  xab
    2      8  def
    3      9  xcd
    4      8  ghi
    5      9  xef
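The melt output is grouped by source column, whereas the question asks for rows grouped by topic. A minimal sketch of one way to get that ordering, assuming the sample data from the question; the names `melted` and `result` are illustrative only, and the result's index starts at 0 rather than 1:

```python
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# Pick up however many key.N columns happen to exist.
keys = [c for c in df.columns if c.startswith("key.")]

melted = df.melt(id_vars="topic", value_vars=keys, value_name="key")

# A stable sort on topic keeps the key.0, key.1, key.2 order within each topic.
result = (
    melted.sort_values("topic", kind="mergesort")
    .drop(columns="variable")
    .reset_index(drop=True)
)
print(result)
```

Using `drop(columns=...)` instead of a positional axis argument avoids relying on the positional form, which newer pandas releases no longer accept.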

YOBEN_S · Answer

OK, since one of the current answers has been marked as a duplicate of this question, I will answer here.

wide_to_long

    # sep='.' because the columns are named key.0, key.1, ...
    pd.wide_to_long(df, ['key'], 'topic', 'age', sep='.').reset_index().drop('age', axis=1)
    Out[123]:
       topic  key
    0      8  abc
    1      9  xab
    2      8  def
    3      9  xcd
    4      8  ghi
    5      9  xef
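A self-contained sketch of the wide_to_long approach on the question's sample data. It assumes a recent pandas, where (as far as I can tell) the separator between the stub name and the numeric suffix has to be passed explicitly as `sep='.'` for columns like `key.0` to be recognised. The name `n` for the suffix column is just a placeholder, and the explicit sort is there so the row order does not depend on the pandas version:

```python
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# 'key' is the stub, 'topic' identifies each row, and the numeric suffix
# (0, 1, 2) lands in a new index level called 'n'.
long_df = pd.wide_to_long(df, stubnames="key", i="topic", j="n", sep=".")

# Flatten the (topic, n) index, order the rows explicitly, and keep only
# the two columns the question asks for.
result = (
    long_df.reset_index()
    .sort_values(["topic", "n"])
    .reset_index(drop=True)[["topic", "key"]]
)
print(result)
```

Note that wide_to_long requires the `i` column to identify rows uniquely, which holds here because each topic appears once.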

miraculixx · Answer

Having tried various approaches, I find the following more or less intuitive, provided stack's magic is understood:

    # keep topic as index, stack other columns 'against' it
    stacked = df.set_index('topic').stack()
    # set the name of the new series created
    df = stacked.reset_index(name='key')
    # drop the 'source' level (key.*)
    df.drop('level_1', axis=1, inplace=True)

The resulting dataframe is as required:

       topic  key
    0      8  abc
    1      8  def
    2      8  ghi
    3      9  xab
    4      9  xcd
    5      9  xef

To fully understand the process, you may want to print the intermediate results. If you don't mind having more columns than needed, the key steps are set_index('topic'), stack() and reset_index(name='key').
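To make the intermediate results concrete, here is a small sketch that runs the three steps one at a time on the question's sample data. The variable names `indexed`, `stacked` and `flat` are only for illustration:

```python
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# Step 1: topic becomes the row index, leaving only the key.* columns.
indexed = df.set_index("topic")
print(indexed)

# Step 2: stack() pivots the column labels into an inner index level,
# giving a Series indexed by (topic, column label).
stacked = indexed.stack()
print(stacked)

# Step 3: reset_index(name='key') turns the Series back into a frame.
# The unnamed inner level becomes a column called 'level_1'.
flat = stacked.reset_index(name="key")
print(flat)

# Optional: drop the source-column label if it is not needed.
result = flat.drop(columns="level_1")
print(result)
```

Unlike melt, this route already yields rows grouped by topic, because stack walks the frame row by row.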