Creating a new column in a pandas DataFrame based on a condition

Question

I have a pandas DataFrame like this:

       portion  used
    0        1   1.0
    1        2   0.3
    2        3   0.0
    3        4   0.8

I want to create a new column based on the used column, so that df looks like this:

       portion  used    alert
    0        1   1.0     Full
    1        2   0.3  Partial
    2        3   0.0    Empty
    3        4   0.8  Partial

The new alert column should follow these rules:
If used is 1.0, alert should be Full.
If used is 0.0, alert should be Empty.
Otherwise, alert should be Partial.

What is the best way to do that?

Ffisegydd · Answer

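You can define a function that returns the different states ('Full', 'Partial', 'Empty', and so on) and then use df.apply to apply it to each row. Note that you have to pass the keyword argument axis=1 so that the function is applied to rows rather than columns.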

    import pandas as pd

    def alert(c):
        # c is one row of the DataFrame (a Series), because axis=1 is used below
        if c['used'] == 1.0:
            return 'Full'
        elif c['used'] == 0.0:
            return 'Empty'
        elif 0.0 < c['used'] < 1.0:
            return 'Partial'
        else:
            return 'Undefined'

    df = pd.DataFrame(data={'portion': [1, 2, 3, 4], 'used': [1.0, 0.3, 0.0, 0.8]})

    df['alert'] = df.apply(alert, axis=1)

    #    portion  used    alert
    # 0        1   1.0     Full
    # 1        2   0.3  Partial
    # 2        3   0.0    Empty
    # 3        4   0.8  Partial

Primer · Answer

Alternatively, you could do:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(data={'portion': np.arange(10000), 'used': np.random.rand(10000)})

    %%timeit
    df.loc[df['used'] == 1.0, 'alert'] = 'Full'
    df.loc[df['used'] == 0.0, 'alert'] = 'Empty'
    df.loc[(df['used'] > 0.0) & (df['used'] < 1.0), 'alert'] = 'Partial'

This gives the same output but runs about 100 times faster on 10000 rows:

    100 loops, best of 3: 2.91 ms per loop

For comparison, using apply:

    %timeit df['alert'] = df.apply(alert, axis=1)
    1 loops, best of 3: 287 ms per loop

I guess the choice depends on how big your DataFrame is.
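
The %timeit magics above only work inside IPython. For anyone who wants to repeat the comparison in a plain script, here is a rough sketch using the standard timeit module. The helper names with_apply and with_loc, the seeded random data, and the two forced endpoint rows are my own assumptions for illustration, and the numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 10,000 rows of uniform random usage values.
rng = np.random.default_rng(0)
df = pd.DataFrame({"portion": np.arange(10_000), "used": rng.random(10_000)})
# Force exact endpoints into two rows so that all three labels occur.
df.loc[0, "used"] = 1.0
df.loc[1, "used"] = 0.0


def alert(row):
    """Row-wise labelling, in the spirit of the apply-based answer."""
    if row["used"] == 1.0:
        return "Full"
    if row["used"] == 0.0:
        return "Empty"
    return "Partial"


def with_apply(frame):
    # Calls the Python function once per row.
    return frame.apply(alert, axis=1)


def with_loc(frame):
    # Works on a copy so the input is left untouched;
    # the copy is included in the measured time.
    out = frame.copy()
    out.loc[out["used"] == 1.0, "alert"] = "Full"
    out.loc[out["used"] == 0.0, "alert"] = "Empty"
    out.loc[(out["used"] > 0.0) & (out["used"] < 1.0), "alert"] = "Partial"
    return out["alert"]


# Total seconds for three runs of each approach.
t_apply = timeit.timeit(lambda: with_apply(df), number=3)
t_loc = timeit.timeit(lambda: with_loc(df), number=3)
print(f"apply: {t_apply:.4f} s, loc: {t_loc:.4f} s")
```

Both helpers should produce the same labels; only the running time is expected to differ.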

Zero · Answer

Use np.where, which is usually fast:

    In [845]: df['alert'] = np.where(df.used == 1, 'Full',
                                     np.where(df.used == 0, 'Empty', 'Partial'))

    In [846]: df
    Out[846]:
       portion  used    alert
    0        1   1.0     Full
    1        2   0.3  Partial
    2        3   0.0    Empty
    3        4   0.8  Partial

Timings

    In [848]: df.shape
    Out[848]: (100000, 3)

    In [849]: %timeit df['alert'] = np.where(df.used == 1, 'Full', np.where(df.used == 0, 'Empty', 'Partial'))
    100 loops, best of 3: 6.17 ms per loop

    In [850]: %%timeit
         ...: df.loc[df['used'] == 1.0, 'alert'] = 'Full'
         ...: df.loc[df['used'] == 0.0, 'alert'] = 'Empty'
         ...: df.loc[(df['used'] > 0.0) & (df['used'] < 1.0), 'alert'] = 'Partial'
         ...:
    10 loops, best of 3: 21.9 ms per loop

    In [851]: %timeit df['alert'] = df.apply(alert, axis=1)
    1 loop, best of 3: 2.79 s per loop
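
Nested np.where calls get hard to read once there are more than two or three cases. A possible alternative, which is not what this answer uses, is np.select: it takes a list of conditions, a matching list of values, and a default. A minimal sketch on the question's sample data (the names conditions and choices are just illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"portion": [1, 2, 3, 4], "used": [1.0, 0.3, 0.0, 0.8]})

# Conditions are checked in order; the first one that is True picks the label.
conditions = [df["used"] == 1.0, df["used"] == 0.0]
choices = ["Full", "Empty"]

# Rows matching none of the conditions get the default label.
df["alert"] = np.select(conditions, choices, default="Partial")
print(df)
```

I have not timed this against the nested np.where version, so treat it as a readability option rather than a speed claim.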

user1857373 · Answer

    df['TaxStatus'] = np.where(df.Public == 1, True, np.where(df.Public == 2, False))

This looks like it should work, except that it raises ValueError: either both or neither of x and y should be given.
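
The error most likely comes from the inner np.where call: np.where accepts either the condition alone or the condition together with both x and y, and the inner call supplies only one value. A minimal sketch of one way around it, assuming Public only ever holds 1 or 2 (the sample data here is made up), is a single call with both values given:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; assumes Public only contains 1 or 2.
df = pd.DataFrame({"Public": [1, 2, 1, 2]})

# Both x (True) and y (False) are supplied, so no ValueError is raised.
df["TaxStatus"] = np.where(df["Public"] == 1, True, False)
print(df)
```

If Public can hold other values, the inner np.where from the original attempt would need its own third argument to say what those rows should become.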

Spcogg the second · Answer

I can't comment, so I'm posting a new answer. To improve on Ffisegydd's approach, you can use a dictionary and the dict.get() method to make the function passed to .apply() easier to manage.

    import pandas as pd

    def alert(c):
        # Exact matches map to a label; anything else falls back to 'Partial'
        mapping = {1.0: 'Full', 0.0: 'Empty'}
        return mapping.get(c['used'], 'Partial')

    df = pd.DataFrame(data={'portion': [1, 2, 3, 4], 'used': [1.0, 0.3, 0.0, 0.8]})

    df['alert'] = df.apply(alert, axis=1)

Depending on the use case, you might want to define the dictionary outside of the function definition as well.
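
Once the dictionary lives outside the function, it can also be handed straight to Series.map, which skips the row-wise apply. This is a sketch rather than part of the answer above; ALERT_LABELS is a hypothetical name, and note that a genuinely missing value in used would also end up as 'Partial' here.

```python
import pandas as pd

# Hypothetical module-level mapping, defined once outside any function.
ALERT_LABELS = {1.0: "Full", 0.0: "Empty"}

df = pd.DataFrame({"portion": [1, 2, 3, 4], "used": [1.0, 0.3, 0.0, 0.8]})

# map() yields NaN for values not in the dictionary; fillna supplies the fallback.
df["alert"] = df["used"].map(ALERT_LABELS).fillna("Partial")
print(df)
```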