How to determine whether a column/variable is numeric in Pandas/NumPy?

Question

Is there a better way to determine whether a variable in Pandas and/or NumPy is numeric?

I have a self-defined dictionary with dtypes as keys and numeric / not numeric as values.

ayhan · Answer

You can use np.issubdtype to check whether a dtype is a sub-dtype of np.number. Examples:

np.issubdtype(arr.dtype, np.number)      # where arr is a numpy array
np.issubdtype(df['X'].dtype, np.number)  # where df['X'] is a pandas Series

This works for numpy's dtypes but fails for pandas-specific types such as pd.Categorical, as Thomas noted. If you are using categoricals, the is_numeric_dtype function from pandas is a better alternative than np.issubdtype.
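As a minimal sketch of the difference between the two checks, the snippet below runs both on a categorical column. The sample frame and the helper name issubdtype_or_none are made up for illustration, and the TypeError handling is an assumption about how NumPy reacts to a pandas-only dtype; the exact behaviour may differ between versions.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical sample data: one categorical column, one integer column.
df_cat = pd.DataFrame({
    "label": pd.Categorical(["a", "b", "a"]),
    "count": [1, 2, 3],
})

def issubdtype_or_none(dtype):
    """Return np.issubdtype(dtype, np.number), or None if NumPy rejects the dtype."""
    try:
        return bool(np.issubdtype(dtype, np.number))
    except TypeError:
        # Assumption: NumPy raises TypeError for dtypes it cannot interpret.
        return None

# Print both checks side by side for each column.
for name in df_cat.columns:
    print(name, issubdtype_or_none(df_cat[name].dtype), is_numeric_dtype(df_cat[name]))
```

is_numeric_dtype accepts either a Series or a dtype, so it can be applied to df[col] directly without reaching for .dtype first.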

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1.0, 2.0, 3.0],
                   'C': [1j, 2j, 3j],
                   'D': ['a', 'b', 'c']})
df
Out:
   A    B   C  D
0  1  1.0  1j  a
1  2  2.0  2j  b
2  3  3.0  3j  c

df.dtypes
Out:
A         int64
B       float64
C    complex128
D        object
dtype: object

np.issubdtype(df['A'].dtype, np.number)
Out: True

np.issubdtype(df['B'].dtype, np.number)
Out: True

np.issubdtype(df['C'].dtype, np.number)
Out: True

np.issubdtype(df['D'].dtype, np.number)
Out: False

For multiple columns you can use np.vectorize:

is_number = np.vectorize(lambda x: np.issubdtype(x, np.number))
is_number(df.dtypes)
Out: array([ True,  True,  True, False], dtype=bool)

And for selection, pandas now has select_dtypes:

df.select_dtypes(include=[np.number])
Out:
   A    B   C
0  1  1.0  1j
1  2  2.0  2j
2  3  3.0  3j
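If only the column names are needed rather than the selected data, a sketch along these lines may help. It rebuilds the same sample frame used in this answer; numeric_cols and other_cols are illustrative names, not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1.0, 2.0, 3.0],
                   'C': [1j, 2j, 3j],
                   'D': ['a', 'b', 'c']})

# Names of the columns whose dtype is a sub-dtype of np.number.
numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()

# The complement: every column that is not numeric.
other_cols = df.select_dtypes(exclude=[np.number]).columns.tolist()

print(numeric_cols)
print(other_cols)
```

Note that np.number does not cover booleans, so a bool column would land in other_cols with this approach.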

danthelion · Answer

In pandas 0.20.2 you can do:

import pandas as pd
from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1.0, 2.0, 3.0]})

is_string_dtype(df['A'])   # True
is_numeric_dtype(df['B'])  # True

danodonovan · Answer

Based on @jaime's answer in the comments, you need to check .dtype.kind for the column of interest. For example:

>>> import pandas as pd
>>> df = pd.DataFrame({'numeric': [1, 2, 3], 'not_numeric': ['A', 'B', 'C']})
>>> df['numeric'].dtype.kind in 'bifc'
True
>>> df['not_numeric'].dtype.kind in 'bifc'
False

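NB: bifc stands for b bool, i int (signed), f float, c complex. The remaining numeric kind code, u, is NumPy's unsigned integer, so checking against 'biufc' also covers unsigned columns.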
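To make the kind codes concrete, here is a small sketch that prints the code for a few NumPy dtypes. The helper is_numeric_kind and its include_bool parameter are hypothetical names introduced for illustration. It is meant for NumPy dtypes only, since np.dtype() may reject pandas extension dtypes.

```python
import numpy as np

# Kind codes treated as numeric here; 'u' (unsigned integer) is included
# so that e.g. uint8 columns are not missed.
NUMERIC_KINDS = "biufc"

def is_numeric_kind(dtype, include_bool=True):
    """Hypothetical helper: True if the dtype's kind code is numeric."""
    kinds = NUMERIC_KINDS if include_bool else NUMERIC_KINDS.replace("b", "")
    return np.dtype(dtype).kind in kinds

# Show the kind code and the helper's verdict for a handful of dtypes.
for name in ["bool", "int64", "uint8", "float32", "complex128", "object", "datetime64[ns]"]:
    print(name, np.dtype(name).kind, is_numeric_kind(name))
```

Whether booleans should count as numeric depends on the use case, which is why the sketch makes that choice a parameter.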

Punit S · Answer

How about just checking the type of one of the values in the column? We've always had something like this:

isinstance(x, (int, long, float, complex))

When I try to check the dtypes of the columns in the dataframe below, I get them as 'object' rather than the numeric type I'm expecting:

from datetime import datetime, timedelta

df = pd.DataFrame(columns=('time', 'test1', 'test2'))
for i in range(20):
    df.loc[i] = [datetime.now() - timedelta(hours=i*1000), i*10, i*100]

df.dtypes
time     datetime64[ns]
test1            object
test2            object
dtype: object

When I do the following, it seems to give an accurate result:

isinstance(df['test1'][len(df['test1'])-1], (int, long, float, complex))

returns

True
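Checking a single value can be misleading when an object column holds mixed content. As a hedged sketch, every value can be tested against numbers.Number from the standard library, and DataFrame.infer_objects can be asked to re-infer a better dtype. The sample frame and the helper name all_values_numeric are made up for illustration.

```python
import numbers
import pandas as pd

# Hypothetical object-dtype columns, similar to what row-by-row filling can produce.
df_obj = pd.DataFrame({
    "test1": pd.Series([0, 10, 20], dtype=object),
    "mixed": pd.Series([0, "ten", 20], dtype=object),
})

def all_values_numeric(series):
    """True if every element is a Python/NumPy number (bool counts as a number)."""
    return all(isinstance(v, numbers.Number) for v in series)

print(all_values_numeric(df_obj["test1"]))
print(all_values_numeric(df_obj["mixed"]))

# infer_objects() attempts a soft conversion of object columns to a better dtype.
converted = df_obj.infer_objects()
print(converted.dtypes)
```

Scanning every value costs more than a dtype check, so this is probably best kept for columns that are already known to be object dtype.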

Jeff · Answer

This is a pseudo-internal method that returns only the numeric type data:

# assumes: import numpy as np; from pandas import DataFrame, Timestamp
In [27]: df = DataFrame(dict(A = np.arange(3),
                             B = np.random.randn(3),
                             C = ['foo','bar','bah'],
                             D = Timestamp('20130101')))

In [28]: df
Out[28]:
   A         B    C                   D
0  0 -0.667672  foo 2013-01-01 00:00:00
1  1  0.811300  bar 2013-01-01 00:00:00
2  2  2.020402  bah 2013-01-01 00:00:00

In [29]: df.dtypes
Out[29]:
A             int64
B           float64
C            object
D    datetime64[ns]
dtype: object

In [30]: df._get_numeric_data()
Out[30]:
   A         B
0  0 -0.667672
1  1  0.811300
2  2  2.020402

Beta · Answer

To add to all the other answers, you can also use df.info() to get the data type of each column.

paulwasit · Answer

You can also try:

df_dtypes = np.array(df.dtypes)
df_numericDtypes = [x.kind in 'bifc' for x in df_dtypes]

It returns a list of booleans: True if numeric, False if not.
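Since the question mentions keeping a dictionary of numeric / not numeric flags, the same kind check can be keyed by column name. This is only a sketch: the sample frame and the name is_numeric_by_column are illustrative, and 'u' is added to the kind string so that unsigned integers count as numeric too.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1.0, 2.0, 3.0],
                   'D': ['a', 'b', 'c']})

# Map each column name to True/False based on the dtype's kind code.
is_numeric_by_column = {col: dtype.kind in 'biufc' for col, dtype in df.dtypes.items()}
print(is_numeric_by_column)
```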