Remove rows with null values from a pandas DataFrame

Question

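I'm trying to remove rows from my DataFrame where one of the columns contains a null value. Most of the help I've been able to find is about removing NaN values, and none of it has worked for me so far.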

Here is how I created the DataFrame:

# successfully created data frame
df1 = ut.get_data(symbols, dates)
# column heads are 'SPY', 'BBD'

# can't get rid of row containing null val in column BBD
# tried each of these with the others commented out but always had an
# error or sometimes I was able to get a new column of boolean values
# but I just want to drop the row
df1 = pd.notnull(df1['BBD'])  # drops rows with null val, not working
df1 = df1.drop(2010-05-04, axis=0)
df1 = df1[df1.'BBD' != null]
df1 = df1.dropna(subset=['BBD'])
df1 = pd.notnull(df1.BBD)

# I know the date to drop but still wasn't able to drop the row
df1.drop([2015-10-30])
df1.drop(['2015-10-30'])
df1.drop([2015-10-30], axis=0)
df1.drop(['2015-10-30'], axis=0)

with pd.option_context('display.max_row', None):
    print(df1)

Here is the output:

Output

How can I drop this row? Ideally I'd like to know both ways: identifying the row by its null value, and dropping it by date.

I haven't been working with pandas for very long, and I've been stuck on this for an hour. Any advice would be greatly appreciated.
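On the date-based part of the question: in Python an unquoted 2015-10-30 is integer subtraction that evaluates to 1975, and 2010-05-04 is a SyntaxError in Python 3 because of the leading zeros, so neither refers to a date. Also, drop() returns a new DataFrame instead of modifying df1 in place, so its result has to be assigned. The following is a minimal sketch, assuming df1 has a DatetimeIndex (the ut.get_data helper isn't shown, so this is a guess) and using a small hypothetical frame with made-up values:

```python
import pandas as pd

# Unquoted, 2015-10-30 is integer subtraction, not a date
print(2015-10-30)  # 1975

# Hypothetical stand-in for df1 (assumes a DatetimeIndex; values are made up)
df1 = pd.DataFrame(
    {"SPY": [1.0, 2.0, 3.0], "BBD": [4.0, None, 6.0]},
    index=pd.to_datetime(["2015-10-29", "2015-10-30", "2015-11-02"]),
)

# drop() returns a new DataFrame; without reassignment df1 is unchanged
df1.drop(pd.Timestamp("2015-10-30"))
assert len(df1) == 3

# Drop by date: pass the label as a Timestamp and keep the result
by_date = df1.drop(pd.Timestamp("2015-10-30"))

# Drop by the null value in BBD instead
by_null = df1.dropna(subset=["BBD"])

# Both approaches remove the same row here
print(by_date.index.equals(by_null.index))  # True
```

Passing a Timestamp keeps the label type identical to the index entries, so there is no ambiguity about how the date is parsed.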

Marjan Moderc · Accepted Answer

This should do the job:

df = df.dropna(how='any',axis=0)

It removes every row (axis=0) that contains any (how='any') null value.

Example:

import numpy as np
import pandas as pd

# Recreate a random DataFrame with NaN values
df = pd.DataFrame(index=pd.date_range('2017-01-01', '2017-01-10', freq='1d'))
# Average speed in miles per hour
df['A'] = np.random.randint(low=198, high=205, size=len(df.index))
df['B'] = np.random.random(size=len(df.index)) * 2

# Create dummy NaN values in 2 cells
df.iloc[2, 1] = None
df.iloc[5, 0] = None
print(df)

                A         B
2017-01-01  203.0  1.175224
2017-01-02  199.0  1.338474
2017-01-03  198.0       NaN
2017-01-04  198.0  0.652318
2017-01-05  199.0  1.577577
2017-01-06    NaN  0.234882
2017-01-07  203.0  1.732908
2017-01-08  204.0  1.473146
2017-01-09  198.0  1.109261
2017-01-10  202.0  1.745309

# Delete the rows with dummy values
df = df.dropna(how='any', axis=0)
print(df)

                A         B
2017-01-01  203.0  1.175224
2017-01-02  199.0  1.338474
2017-01-04  198.0  0.652318
2017-01-05  199.0  1.577577
2017-01-07  203.0  1.732908
2017-01-08  204.0  1.473146
2017-01-09  198.0  1.109261
2017-01-10  202.0  1.745309
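The example only exercises how='any'. As a small sketch on a hypothetical three-row frame, this contrasts it with how='all', which drops a row only when every value in it is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is complete, row 1 has one NaN, row 2 is all NaN
df = pd.DataFrame({"A": [1.0, np.nan, np.nan], "B": [4.0, 5.0, np.nan]})

# how='any': drop a row if at least one value in it is NaN
any_dropped = df.dropna(how="any", axis=0)

# how='all': drop a row only if every value in it is NaN
all_dropped = df.dropna(how="all", axis=0)

print(any_dropped.index.tolist())  # [0]
print(all_dropped.index.tolist())  # [0, 1]
```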

For more details, see the reference.

As long as the DataFrame itself is fine, dropping NaNs is this easy. If it still doesn't work, make sure your columns have the proper data types (pd.to_numeric comes to mind...).
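To illustrate the data-type point, here is a sketch assuming the missing entries arrived as the literal text 'null' (a guess; the question doesn't show the raw data). pd.to_numeric(..., errors='coerce') turns strings it cannot parse into NaN, after which dropna can remove them:

```python
import pandas as pd

# Hypothetical column read in as text, with the string 'null' for missing values
df = pd.DataFrame(
    {"BBD": ["1.5", "null", "2.25"]},
    index=pd.to_datetime(["2015-10-29", "2015-10-30", "2015-11-02"]),
)

# dropna() finds nothing to drop: 'null' is an ordinary string, not NaN
assert len(df.dropna()) == 3

# errors='coerce' converts anything that cannot be parsed as a number to NaN
df["BBD"] = pd.to_numeric(df["BBD"], errors="coerce")

# Now dropna can find and remove the row
clean = df.dropna(subset=["BBD"])
print(clean)
```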

Narets · Answer

---- Drop rows with a null in any column ----

df = df.dropna(how='any',axis=0)

---- Clean up nulls based on a single column ----

df[~df['B'].isnull()]

                A         B
2017-01-01  203.0  1.175224
2017-01-02  199.0  1.338474
2017-01-03  198.0       NaN   <- cleaned
2017-01-04  198.0  0.652318
2017-01-05  199.0  1.577577
2017-01-06    NaN  0.234882
2017-01-07  203.0  1.732908
2017-01-08  204.0  1.473146
2017-01-09  198.0  1.109261
2017-01-10  202.0  1.745309

Please excuse any mistakes.
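The boolean-mask filter in this answer checks only column B. As a sketch on a small hypothetical frame, dropna(subset=['B']) gives the same result, and a row whose only NaN is in column A survives both:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a NaN in B on one row, a NaN in A on another
df = pd.DataFrame(
    {"A": [203.0, 198.0, np.nan], "B": [1.17, np.nan, 0.23]},
    index=pd.to_datetime(["2017-01-01", "2017-01-03", "2017-01-06"]),
)

# Boolean mask: keep rows where B is not null
by_mask = df[~df["B"].isnull()]

# Built-in equivalent: only column B is checked for NaN
by_subset = df.dropna(subset=["B"])

# The 2017-01-06 row (NaN only in A) is kept by both filters
print(by_mask)
assert by_mask.equals(by_subset)
```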

joe_slow · Answer

It looks like the values in your column are the string 'null' rather than real NaN values, and real NaN is what dropna targets. So I would try:

df[df.BBD != 'null']

Or, if the values really are NaN:

df[pd.notnull(df.BBD)]
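To show why the two suggestions differ, here is a sketch on a hypothetical column that contains both the string 'null' and a real NaN. Each filter removes only one kind of marker, and combining them removes both:

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing the string 'null' and a real NaN
df = pd.DataFrame({"SPY": [1.0, 2.0, 3.0], "BBD": [10.0, "null", np.nan]})

# String comparison removes only the 'null' row (NaN != 'null' is True)
no_string_null = df[df.BBD != "null"]

# notnull removes only the real NaN ('null' is just a string to pandas)
no_real_nan = df[pd.notnull(df.BBD)]

# Combining both conditions removes either kind of missing marker
clean = df[(df.BBD != "null") & pd.notnull(df.BBD)]
print(clean)
```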