Counting duplicate values in a Pandas DataFrame

Question

There must be an easy way to do this, but I could not find an elegant solution on SO or work one out myself.

I am trying to count the number of duplicate values based on a set of columns in a DataFrame.

Example:

    print(df)

         Month  LSOA code  Longitude   Latitude             Crime type
    0  2015-01  E01000916  -0.106453  51.518207          Bicycle theft
    1  2015-01  E01000914  -0.111497  51.518226               Burglary
    2  2015-01  E01000914  -0.111497  51.518226               Burglary
    3  2015-01  E01000914  -0.111497  51.518226            Other theft
    4  2015-01  E01000914  -0.113767  51.517372  Theft from the person

My workaround:

    counts = dict()
    for i, row in df.iterrows():
        # Use the three columns of interest as a composite key
        key = (
            row['Longitude'],
            row['Latitude'],
            row['Crime type']
        )

        # dict.has_key() was removed in Python 3; use the `in` operator
        if key in counts:
            counts[key] = counts[key] + 1
        else:
            counts[key] = 1

And I get the counts:

    {(-0.11376700000000001, 51.517371999999995, 'Theft from the person'): 1,
     (-0.111497, 51.518226, 'Burglary'): 2,
     (-0.111497, 51.518226, 'Other theft'): 1,
     (-0.10645299999999999, 51.518207000000004, 'Bicycle theft'): 1}

Aside from the fact that this code could probably be improved as well (feel free to comment), what would be the way to do this with pandas?

For those interested, I am working on a dataset from https://data.police.uk/

jezrael · Accepted Answer

You can use groupby with the function size, then reset the index and rename the resulting column 0 to count.

    print(df)

         Month  LSOA code  Longitude   Latitude             Crime type
    0  2015-01  E01000916  -0.106453  51.518207          Bicycle theft
    1  2015-01  E01000914  -0.111497  51.518226               Burglary
    2  2015-01  E01000914  -0.111497  51.518226               Burglary
    3  2015-01  E01000914  -0.111497  51.518226            Other theft
    4  2015-01  E01000914  -0.113767  51.517372  Theft from the person

    df = df.groupby(['Longitude', 'Latitude', 'Crime type']).size().reset_index(name='count')
    print(df)

       Longitude   Latitude             Crime type  count
    0  -0.113767  51.517372  Theft from the person      1
    1  -0.111497  51.518226               Burglary      2
    2  -0.111497  51.518226            Other theft      1
    3  -0.106453  51.518207          Bicycle theft      1

    print(df['count'])

    0    1
    1    2
    2    1
    3    1
    Name: count, dtype: int64
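To try this end to end, here is a minimal self-contained sketch. It rebuilds the five sample rows from the question by hand; the names `key_cols`, `counts` and `duplicates` are my own, and the final filter for combinations that occur more than once is an extra step that is not part of the answer above.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Month": ["2015-01"] * 5,
    "LSOA code": ["E01000916", "E01000914", "E01000914", "E01000914", "E01000914"],
    "Longitude": [-0.106453, -0.111497, -0.111497, -0.111497, -0.113767],
    "Latitude": [51.518207, 51.518226, 51.518226, 51.518226, 51.517372],
    "Crime type": ["Bicycle theft", "Burglary", "Burglary",
                   "Other theft", "Theft from the person"],
})

# Columns that together define what counts as "the same" row (name is illustrative).
key_cols = ["Longitude", "Latitude", "Crime type"]

# size() gives one row count per group as a Series indexed by the keys;
# reset_index(name="count") turns the keys back into columns and names the counts.
counts = df.groupby(key_cols).size().reset_index(name="count")

# Extra step (an assumption about what you may want next):
# keep only the key combinations that appear more than once.
duplicates = counts[counts["count"] > 1]

print(counts)
print(duplicates)
```

Assigning the result to a new name instead of overwriting `df` keeps the original rows around, which is handy if you want to merge the counts back later.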

jpp · Answer

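An O(n) solution is possible via collections.Counter:

    from collections import Counter

    # df.Crime_type assumes the column is named Crime_type;
    # for the question's 'Crime type' column use df['Crime type'] instead.
    c = Counter(list(zip(df.Longitude, df.Latitude, df.Crime_type)))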

Result:

Counter({(-0.113767, 51.517372, 'Theft-from-the-person'): 1, (-0.111497, 51.518226, 'Burglary'): 2, (-0.111497, 51.518226, 'Other-theft'): 1, (-0.106453, 51.518207, 'Bicycle-theft'): 1})
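For a version that runs against the question's column names (with the space in 'Crime type'), here is a minimal sketch. The sample frame is rebuilt by hand from the question, and turning the Counter back into a DataFrame named `result` is my own addition, not something the answer above does.

```python
from collections import Counter

import pandas as pd

# Only the three key columns from the question's sample are needed here.
df = pd.DataFrame({
    "Longitude": [-0.106453, -0.111497, -0.111497, -0.111497, -0.113767],
    "Latitude": [51.518207, 51.518226, 51.518226, 51.518226, 51.517372],
    "Crime type": ["Bicycle theft", "Burglary", "Burglary",
                   "Other theft", "Theft from the person"],
})

# zip yields one (longitude, latitude, crime type) tuple per row;
# Counter tallies them in a single pass.
c = Counter(zip(df["Longitude"], df["Latitude"], df["Crime type"]))

# Optional: convert the tallies back into a DataFrame (illustrative step).
result = pd.DataFrame(
    [(*key, n) for key, n in c.items()],
    columns=["Longitude", "Latitude", "Crime type", "count"],
)

print(c.most_common(1))
print(result)
```

Bracket access (`df["Crime type"]`) is needed because attribute access such as `df.Crime_type` only works when the column name is a valid Python identifier.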

Alexander · Answer

Group by Longitude and Latitude, then call value_counts on the Crime type column.

    df.groupby(['Longitude', 'Latitude'])['Crime type'].value_counts().to_frame('count')

                                                count
    Longitude Latitude  Crime type
    -0.113767 51.517372 Theft from the person       1
    -0.111497 51.518226 Burglary                    2
                        Other theft                 1
    -0.106453 51.518207 Bicycle theft               1
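A runnable sketch of the same idea, assuming the question's sample rows rebuilt by hand. The names `per_location` and `flat` are hypothetical, and the final `reset_index()` is an optional extra for when flat columns are preferred over a MultiIndex.

```python
import pandas as pd

# Key columns from the question's sample.
df = pd.DataFrame({
    "Longitude": [-0.106453, -0.111497, -0.111497, -0.111497, -0.113767],
    "Latitude": [51.518207, 51.518226, 51.518226, 51.518226, 51.517372],
    "Crime type": ["Bicycle theft", "Burglary", "Burglary",
                   "Other theft", "Theft from the person"],
})

# Count crime types within each (Longitude, Latitude) group.
# The result is indexed by Longitude, Latitude and Crime type.
per_location = (
    df.groupby(["Longitude", "Latitude"])["Crime type"]
      .value_counts()
      .to_frame("count")
)

# Optional: move the index levels back into ordinary columns.
flat = per_location.reset_index()

print(per_location)
print(flat)
```

Within each location, value_counts sorts by count in descending order, so the most frequent crime type at a given coordinate comes first.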