When to apply(pd.to_numeric) and when to astype(np.float64) in Python?

Question

I have a pandas DataFrame object named xiv, which has a column of int64 Volume measurements.

    In[]: xiv['Volume'].head(5)
    Out[]:
    0    252000
    1    484000
    2     62000
    3    168000
    4    232000
    Name: Volume, dtype: int64

I have read other posts (such as this and this) that suggest the following solutions. But when I use either approach, the dtype of the underlying data does not appear to change:

    In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])

    In[]: xiv['Volume'].dtypes
    Out[]: dtype('int64')

Or...

    In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])
    Out[]: ###omitted for brevity###

    In[]: xiv['Volume'].dtypes
    Out[]: dtype('int64')

    In[]: xiv['Volume'] = xiv['Volume'].apply(pd.to_numeric)

    In[]: xiv['Volume'].dtypes
    Out[]: dtype('int64')

I have also tried creating a separate pandas Series, applying the methods above to that Series, and reassigning the result to the xiv['Volume'] object (a pandas.core.series.Series object).

I have, however, found a solution to this problem using the numpy package's float64 type. It works, but I don't understand why it behaves differently.

    In[]: xiv['Volume'] = xiv['Volume'].astype(np.float64)

    In[]: xiv['Volume'].dtypes
    Out[]: dtype('float64')

Can someone explain how to accomplish with the pandas library what the numpy library seems to do easily with its float64 class, namely converting the column in the xiv DataFrame to float64?

MaxU · Accepted Answer

If the column already has a numeric dtype (int8|16|32|64, float64, boolean), you can convert it to another "numeric" dtype using the pandas .astype() method.

Demo:

    In [90]: df = pd.DataFrame(np.random.randint(10**5,10**7,(5,3)),columns=list('abc'), dtype=np.int64)

    In [91]: df
    Out[91]:
             a        b        c
    0  9059440  9590567  2076918
    1  5861102  4566089  1947323
    2  6636568   162770  2487991
    3  6794572  5236903  5628779
    4   470121  4044395  4546794

    In [92]: df.dtypes
    Out[92]:
    a    int64
    b    int64
    c    int64
    dtype: object

    In [93]: df['a'] = df['a'].astype(float)

    In [94]: df.dtypes
    Out[94]:
    a    float64
    b      int64
    c      int64
    dtype: object

It won't work for object (string) dtypes whose values can't be converted to numbers:

    In [95]: df.loc[1, 'b'] = 'XXXXXX'

    In [96]: df
    Out[96]:
               a        b        c
    0  9059440.0  9590567  2076918
    1  5861102.0   XXXXXX  1947323
    2  6636568.0   162770  2487991
    3  6794572.0  5236903  5628779
    4   470121.0  4044395  4546794

    In [97]: df.dtypes
    Out[97]:
    a    float64
    b     object
    c      int64
    dtype: object

    In [98]: df['b'].astype(float)
    ... skipped ...
    ValueError: could not convert string to float: 'XXXXXX'

So here we want to use the pd.to_numeric() method:

    In [99]: df['b'] = pd.to_numeric(df['b'], errors='coerce')

    In [100]: df
    Out[100]:
               a          b        c
    0  9059440.0  9590567.0  2076918
    1  5861102.0        NaN  1947323
    2  6636568.0   162770.0  2487991
    3  6794572.0  5236903.0  5628779
    4   470121.0  4044395.0  4546794

    In [101]: df.dtypes
    Out[101]:
    a    float64
    b    float64
    c      int64
    dtype: object
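To tie this back to the question, here is a minimal sketch of the three cases side by side. The sample values and the column name `raw` are made up for illustration; only `Volume` comes from the question. It suggests why pd.to_numeric() appeared to do nothing in the question: the column was already numeric, so there was nothing to convert, whereas .astype() explicitly requests a different dtype.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's xiv DataFrame.
df = pd.DataFrame({"Volume": [252000, 484000, 62000]})
df["raw"] = pd.Series(["1", "x", "3"], dtype=object)  # strings, one unparseable

# Already numeric: to_numeric has nothing to convert, so int64 stays int64.
same = pd.to_numeric(df["Volume"])

# astype explicitly asks for a new dtype, so the result is float64.
as_float = df["Volume"].astype("float64")

# Object column with a bad value: errors="coerce" turns it into NaN,
# which forces a float64 result.
coerced = pd.to_numeric(df["raw"], errors="coerce")

print(same.dtype, as_float.dtype, coerced.dtype)
```

In short: reach for .astype() to change one numeric dtype into another, and for pd.to_numeric() to parse non-numeric data into numbers.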

reevesnmortimer · Answer

I don't have a technical explanation for this, but I have noticed that pd.to_numeric() raises the following error when converting the string 'nan':

    In [10]: df = pd.DataFrame({'value': 'nan'}, index=[0])

    In [11]: pd.to_numeric(df.value)
    Traceback (most recent call last):

      File "<ipython-input-11-98729d13e45c>", line 1, in <module>
        pd.to_numeric(df.value)

      File "C:\Users\joshua.lee\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\tools\numeric.py", line 133, in to_numeric
        coerce_numeric=coerce_numeric)

      File "pandas/_libs/src\inference.pyx", line 1185, in pandas._libs.lib.maybe_convert_numeric

    ValueError: Unable to parse string "nan" at position 0

Whereas astype(float) does not:

    In [12]: df.value.astype(float)
    Out[12]:
    0   NaN
    Name: value, dtype: float64

Mohd Waseem · Answer

You can use this:

    pd.to_numeric(df.value, errors='coerce').fillna(0, downcast='infer')

It will use zero in place of NaN.
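For reference, a small variant of the same idea that does not rely on the downcast argument and instead casts back to an integer dtype explicitly. The sample Series is made up for illustration, and the explicit int64 cast assumes every parsed value is a whole number.

```python
import pandas as pd

# Hypothetical input: numeric strings plus one value that cannot be parsed.
s = pd.Series(["10", "abc", "30"], dtype=object)

# Coerce unparseable entries to NaN, replace NaN with 0, then cast to int64.
cleaned = pd.to_numeric(s, errors="coerce").fillna(0).astype("int64")

print(cleaned.tolist())
```

Note that replacing NaN with 0 hides the fact that a value was missing or invalid, so this only makes sense when zero is a sensible default for the data.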