Update row values where a certain condition is met in pandas

Question

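Suppose I have a DataFrame with a numeric stream column plus the columns feat and another_feat.

What is the most efficient way to update the values of the columns feat and another_feat in the rows where stream is 2?

Is this it?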

    for index, row in df1.iterrows():
        if df1.loc[index, 'stream'] == 2:
            pass  # do something

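UPDATE: What should I do if I have more than 100 columns? I don't want to name the columns I want to update explicitly. I want to divide the value of each column by 2 (except for the stream column).

To be clear about what my goal is:

Divide all values by 2 in every row that has stream 2, but leave the stream column unchanged.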

jezrael · Accepted Answer

I think you can use loc if you need to update two columns to the same value:

    df1.loc[df1['stream'] == 2, ['feat', 'another_feat']] = 'aaaa'
    print(df1)
       stream        feat another_feat
    a       1  some_value   some_value
    b       2        aaaa         aaaa
    c       2        aaaa         aaaa
    d       3  some_value   some_value
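For anyone who wants to run this from scratch, here is a minimal, self-contained sketch of the same loc pattern. The way df1 is constructed is an assumption made to mirror the printed output, and the name mask is introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data, built to resemble the frame printed in the answer.
df1 = pd.DataFrame(
    {
        "stream": [1, 2, 2, 3],
        "feat": ["some_value"] * 4,
        "another_feat": ["some_value"] * 4,
    },
    index=list("abcd"),
)

# Boolean Series that is True for the rows to update.
mask = df1["stream"] == 2

# Row selector first, column selector second; the scalar is broadcast
# to every selected cell.
df1.loc[mask, ["feat", "another_feat"]] = "aaaa"
print(df1)
```

Building the mask once and reusing it keeps the condition in one place, which helps when several assignments share the same filter.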

If you need to update the columns separately, one option is:

    df1.loc[df1['stream'] == 2, 'feat'] = 10
    print(df1)
       stream        feat another_feat
    a       1  some_value   some_value
    b       2          10   some_value
    c       2          10   some_value
    d       3  some_value   some_value

Another common option is to use numpy.where:

    import numpy as np

    # rows where stream == 2 get 10, every other row gets 20
    df1['feat'] = np.where(df1['stream'] == 2, 10, 20)
    print(df1)
       stream  feat another_feat
    a       1    20   some_value
    b       2    10   some_value
    c       2    10   some_value
    d       3    20   some_value
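Note that numpy.where with two constants rewrites every row, including the ones that do not match. If the non-matching rows should keep their current values, one possible variation is to pass the existing column as the "else" argument. This is a sketch with made-up numeric data, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric sample data.
df1 = pd.DataFrame(
    {"stream": [1, 2, 2, 3], "feat": [4, 4, 2, 1]},
    index=list("abcd"),
)

# Use the existing column as the "else" branch so that only the rows
# where stream == 2 change.
df1["feat"] = np.where(df1["stream"] == 2, 10, df1["feat"])
print(df1)
```

With this form the result is equivalent to the loc assignment shown earlier, so the choice between the two is mostly a matter of taste.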

EDIT: If you need to divide all columns except stream in the rows where the condition is True, use:

    print(df1)
       stream  feat  another_feat
    a       1     4             5
    b       2     4             5
    c       2     2             9
    d       3     1             7

    # select all columns except 'stream'
    cols = [col for col in df1.columns if col != 'stream']
    print(cols)
    ['feat', 'another_feat']

    # the right-hand side is aligned on index and columns,
    # so only the selected cells are overwritten
    df1.loc[df1['stream'] == 2, cols] = df1 / 2
    print(df1)
       stream  feat  another_feat
    a       1   4.0           5.0
    b       2   2.0           2.5
    c       2   1.0           4.5
    d       3   1.0           7.0
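A self-contained sketch of the same idea follows. It differs from the snippet above in two ways, both of which are choices of this sketch and not part of the answer: the column list is built with Index.drop, and the integer columns are cast to float before the assignment, because recent pandas versions complain when fractional values such as 2.5 are written into an integer column.

```python
import pandas as pd

# Sample data matching the frame printed in the answer.
df1 = pd.DataFrame(
    {"stream": [1, 2, 2, 3], "feat": [4, 4, 2, 1], "another_feat": [5, 5, 9, 7]},
    index=list("abcd"),
)

# Every column except 'stream', without naming them one by one.
cols = df1.columns.drop("stream")

# Cast up front so that halved values such as 2.5 fit the column dtype.
df1 = df1.astype({col: float for col in cols})

# Halve only the rows where stream == 2, and only the selected columns.
mask = df1["stream"] == 2
df1.loc[mask, cols] = df1.loc[mask, cols] / 2
print(df1)
```

Because cols is derived from df1.columns, the same code should work unchanged with 100 or more columns.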

Thanos · Answer

You can do the same with .loc, as shown below. (This answer originally used .ix, which was deprecated in pandas 0.20 and removed in pandas 1.0.)

    In [1]: df = pd.DataFrame(np.random.randn(5,4), columns=list('abcd'))

    In [2]: df
    Out[2]:
              a         b         c         d
    0 -0.323772  0.839542  0.173414 -1.341793
    1 -1.001287  0.676910  0.465536  0.229544
    2  0.963484 -0.905302 -0.435821  1.934512
    3  0.266113 -0.034305 -0.110272 -0.720599
    4 -0.522134 -0.913792  1.862832  0.314315

    In [3]: df.loc[df.a > 0, ['b', 'c']] = 0

    In [4]: df
    Out[4]:
              a         b         c         d
    0 -0.323772  0.839542  0.173414 -1.341793
    1 -1.001287  0.676910  0.465536  0.229544
    2  0.963484  0.000000  0.000000  1.934512
    3  0.266113  0.000000  0.000000 -0.720599
    4 -0.522134 -0.913792  1.862832  0.314315

EDIT

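Given the additional information, the following returns all columns except a, with halved values, for the rows where some condition is met. It returns a new DataFrame and leaves df itself unchanged: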

    >>> condition = df.a > 0
    >>> df[condition][[i for i in df.columns.values if i not in ['a']]].apply(lambda x: x/2)
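If the goal is to modify df in place instead of getting a halved copy, one possible approach is to do both the selection and the assignment through a single .loc call. The sketch below uses seeded random data, and the names others and before are hypothetical helpers introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Seeded random data; the actual values are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 4)), columns=list("abcd"))

# Copy kept only so the result can be compared with the starting values.
before = df.copy()

condition = df["a"] > 0
others = df.columns.drop("a")  # every column except 'a'

# Select and assign through one .loc call so the original frame is modified.
df.loc[condition, others] = df.loc[condition, others] / 2
print(df)
```

Chained selection such as df[condition][...] works on an intermediate result, which is why it suits reading values but not writing them back.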

I hope this helps!