Python: Deleting rows based on a count condition

Question

I'm having trouble filtering a pandas DataFrame.

```
city
NYC
NYC
NYC
NYC
SYD
SYD
SEL
SEL
...
```

```python
df.city.value_counts()
```

I want to delete the rows of every city that appears fewer than 4 times, e.g. remove the SYD and SEL rows.

How can I do this without manually dropping the rows city by city?
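To make the answers below reproducible, here is a minimal sketch of the DataFrame from the question. It assumes the data consists only of the eight rows shown (the `...` in the question may hide further rows) in a column named `city`.

```python
import pandas as pd

# Assumed sample data: only the eight rows visible in the question.
df = pd.DataFrame({'city': ['NYC', 'NYC', 'NYC', 'NYC', 'SYD', 'SYD', 'SEL', 'SEL']})

# Number of rows per city; cities with a count below 4 should be removed.
counts = df.city.value_counts()
print(counts)
```

With this sample, NYC appears 4 times and SYD and SEL twice each, so only the NYC rows should survive.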

YOBEN_S · Accepted Answer

Use filter here:

```python
df.groupby('city').filter(lambda x: len(x) > 3)
# Out:
#   city
# 0  NYC
# 1  NYC
# 2  NYC
# 3  NYC
```
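A self-contained version of the `filter` approach, assuming the eight-row sample from the question. The name `min_count` is introduced here for illustration; `len(g) >= min_count` with `min_count = 4` is the same test as `len(x) > 3`.

```python
import pandas as pd

# Assumed sample data (the eight rows shown in the question).
df = pd.DataFrame({'city': ['NYC', 'NYC', 'NYC', 'NYC', 'SYD', 'SYD', 'SEL', 'SEL']})

# filter() calls the function once per group (a sub-DataFrame) and keeps
# every row of the groups for which it returns True.
min_count = 4  # cities with fewer rows than this are dropped
res = df.groupby('city').filter(lambda g: len(g) >= min_count)
print(res)
```

`filter` keeps the original row order and index. Because it calls a Python function per group, it can be slower than the vectorized `transform` or `value_counts` approaches when there are many distinct cities.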

Solution 2: transform

```python
# .copy() avoids a SettingWithCopyWarning if you later modify sub_df
sub_df = df[df.groupby('city').city.transform('count') > 3].copy()
```
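The key point of `transform` is that it broadcasts each group's result back onto every row, so the output is aligned with `df`'s index and can be used directly as a boolean mask. A minimal sketch, assuming the eight-row sample from the question. It uses `'size'` (rows per group) in place of the answer's `'count'` (non-missing values per group); for a column grouped by itself the two give the same numbers.

```python
import pandas as pd

df = pd.DataFrame({'city': ['NYC', 'NYC', 'NYC', 'NYC', 'SYD', 'SYD', 'SEL', 'SEL']})

# One value per row: the number of rows sharing that row's city.
group_sizes = df.groupby('city')['city'].transform('size')
mask = group_sizes >= 4

# .copy() gives an independent DataFrame, so assigning to sub_df
# does not trigger SettingWithCopyWarning and leaves df untouched.
sub_df = df[mask].copy()
sub_df['note'] = 'kept'
print(group_sizes.tolist())
```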

jpp · Answer

Here is one way using pd.Series.value_counts:

```python
counts = df['city'].value_counts()
# Drop rows whose city appears fewer than 4 times
res = df[~df['city'].isin(counts[counts < 4].index)]
```
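The `value_counts` + `isin` pattern can be wrapped in a small reusable function. `drop_rare` below is a hypothetical helper, not a pandas API, and the sample data is the assumed eight-row frame from the question. Note how sensitive the result is to the threshold: "fewer than 5" would also drop NYC, which appears exactly four times.

```python
import pandas as pd

def drop_rare(df: pd.DataFrame, column: str, min_count: int) -> pd.DataFrame:
    """Hypothetical helper: drop rows whose value in `column`
    occurs fewer than `min_count` times."""
    counts = df[column].value_counts()
    rare = counts[counts < min_count].index  # values to remove
    return df[~df[column].isin(rare)]

df = pd.DataFrame({'city': ['NYC', 'NYC', 'NYC', 'NYC', 'SYD', 'SYD', 'SEL', 'SEL']})
print(drop_rare(df, 'city', 4))  # only the NYC rows remain
print(drop_rare(df, 'city', 5))  # every city is "rare": empty result
```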

Aaron N. Brock · Answer

I think you're looking for value_counts():

```python
# Import the great and powerful pandas
import pandas as pd

# Create some example data
df = pd.DataFrame({
    'city': ['NYC', 'NYC', 'SYD', 'NYC', 'SEL', 'NYC', 'NYC']
})

# Get the count of each value
value_counts = df['city'].value_counts()

# Select the values whose count is 3 or less, i.e. fewer than 4 (adjust the threshold as needed)
to_remove = value_counts[value_counts <= 3].index

# Keep rows where the city column is not in to_remove
df = df[~df.city.isin(to_remove)]
```

Sruthi V · Answer

Another solution:

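```python
threshold = 4  # keep cities that appear at least this many times

# Helper column: for each row, the number of rows with the same city
df['Count'] = df.groupby('city')['city'].transform('size')
df = df[df['Count'] >= threshold].drop(columns='Count')
print(df)
#   city
# 0  NYC
# 1  NYC
# 2  NYC
# 3  NYC
```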
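All of the answers implement the same rule, so a quick sanity check is to confirm they return identical frames. This sketch assumes the eight-row sample from the question and a minimum count of 4, and compares the results with `pandas.testing.assert_frame_equal`, which raises an `AssertionError` on any difference.

```python
import pandas as pd

df = pd.DataFrame({'city': ['NYC', 'NYC', 'NYC', 'NYC', 'SYD', 'SYD', 'SEL', 'SEL']})
min_count = 4

# groupby + filter
via_filter = df.groupby('city').filter(lambda g: len(g) >= min_count)

# groupby + transform as a boolean mask
via_transform = df[df.groupby('city')['city'].transform('size') >= min_count]

# value_counts + isin
counts = df['city'].value_counts()
via_isin = df[~df['city'].isin(counts[counts < min_count].index)]

# helper column, then drop it again
with_count = df.assign(Count=df.groupby('city')['city'].transform('size'))
via_helper = with_count[with_count['Count'] >= min_count].drop(columns='Count')

for result in (via_transform, via_isin, via_helper):
    pd.testing.assert_frame_equal(via_filter, result)
```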