How do I stop a Pandas DataFrame from converting int to float for no reason?

Question

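I am creating a small Pandas DataFrame and adding some data to it that is supposed to be integers. Even though I explicitly set the dtype to int and take great care to provide only int values, the columns always end up as floats. This makes no sense to me, and the behaviour does not even seem to be entirely consistent.

Consider the following Python script: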

    import pandas as pd

    df = pd.DataFrame(columns=["col1", "col2"])  # No dtype specified.
    print(df.dtypes)  # dtypes are object, since there is no information yet.
    df.loc["row1", :] = int(0)  # Add integer data.
    print(df.dtypes)  # Both columns have now become int64, as expected.
    df.loc["row2", :] = int(0)  # Add more integer data.
    print(df.dtypes)  # Both columns are now float64???
    print(df)  # Shows as 0.0.

    # Let's try again, but be more specific.
    del df
    df = pd.DataFrame(columns=["col1", "col2"], dtype=int)  # Explicitly set dtype.
    print(df.dtypes)  # For some reason both columns are already float64???
    df.loc["row1", :] = int(0)
    print(df.dtypes)  # Both columns still float64.

    # Output:
    """
    col1    object
    col2    object
    dtype: object
    col1    int64
    col2    int64
    dtype: object
    col1    float64
    col2    float64
    dtype: object
          col1  col2
    row1   0.0   0.0
    row2   0.0   0.0
    col1    float64
    col2    float64
    dtype: object
    col1    float64
    col2    float64
    dtype: object
    """

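I can fix it by running df = df.astype(int) at the end, and there are other ways to fix it as well. But this should not be necessary. I want to understand what is causing the columns to become floats in the first place.

What is going on?

Python version 3.7.1, Pandas version 0.23.4

EDIT:

I think some people are misunderstanding. There are never any NaN values in this DataFrame. Immediately after creation it looks like this: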

    Empty DataFrame
    Columns: [col1, col2]
    Index: []

It is an empty DataFrame, with df.shape[0] == 0, but there are no NaN values in it; there are simply no rows yet.

I have also discovered something even worse. Even if I run df = df.astype(int) after adding data so that the columns become int, they become float again as soon as I add more data!

    df = pd.DataFrame(columns=["col1", "col2"], dtype=int)
    df.loc["row1", :] = int(0)
    df.loc["row2", :] = int(0)
    df = df.astype(int)  # Force it back to int.
    print(df.dtypes)  # It is now ints again.
    df.loc["row3", :] = int(0)  # Add another integer row.
    print(df.dtypes)  # It is now float again???

    # Output:
    """
    col1    int32
    col2    int32
    dtype: object
    col1    float64
    col2    float64
    dtype: object
    """

The fix suggested for version 0.24 does not seem to be related to my problem. That feature is about the Nullable Integer data type, and there are no NaN or None values in my data.

Rich Andrews · Accepted Answer

df.loc["rowX"] = int(0) works and solves the problem as posed in the question. df.loc["rowX", :] = int(0) does not work. That is a surprise.

df.loc["rowX"] = int(0) makes it possible to populate an empty DataFrame while preserving the desired dtype. It only does so for an entire row at a time, however.

df.loc["rowX"] = [np.int64(0), np.int64(1)] works as well.
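The whole-row form can be wrapped in a small helper, sketched below. The name `append_int_row` and its `dtype` parameter are hypothetical and not part of pandas. Because the upcasting behaviour described here has differed between pandas releases, the sketch also casts the result back to the requested dtype as a safeguard; that cast is an assumption about what the caller wants, not a description of what pandas does internally.

```python
import pandas as pd


def append_int_row(df, label, values, dtype="int64"):
    """Return a copy of df with one whole row added under `label`.

    Hypothetical helper for illustration only; not a pandas API.
    """
    out = df.copy()             # leave the caller's frame untouched
    out.loc[label] = values     # whole-row, label-based assignment
    return out.astype(dtype)    # safeguard in case this pandas version upcast


df = pd.DataFrame(columns=["col1", "col2"], dtype="int64")
df = append_int_row(df, "row1", [0, 1])
df = append_int_row(df, "row2", [2, 3])
print(df.dtypes)
```

The final `astype` is a no-op when the assignment already preserved the integer dtype, so the helper should behave the same whether or not the installed version upcasts.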

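.loc[] is appropriate for label-based assignment according to https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html. Note: the 0.24 documentation does not show .loc[] being used to insert new rows.

The documentation does show .loc[] being used to add rows by assignment in a column-sensitive way, but only where the DataFrame is already populated with data.

Things get weird, however, when slicing on an empty frame.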

    import pandas as pd
    import numpy as np
    import sys

    print(sys.version)
    print(pd.__version__)

    print("int dtypes preserved")
    # append on populated DataFrame
    df = pd.DataFrame([[0, 0], [1, 1]], index=['a', 'b'], columns=["col1", "col2"])
    df.loc["c"] = np.int64(0)
    # slice existing rows
    df.loc["a":"c"] = np.int64(1)
    df.loc["a":"c", "col1":"col2":1] = np.int64(2)
    print(df.dtypes)

    # no selection AND no data, remains np.int64 if defined as such
    df = pd.DataFrame(columns=["col1", "col2"], dtype=np.int64)
    df.loc[:, "col1":"col2":1] = np.int64(0)
    df.loc[:, :] = np.int64(0)
    print(df.dtypes)

    # and works if no index but data
    df = pd.DataFrame([[0, 0], [1, 1]], columns=["col1", "col2"])
    df.loc[:, "col1":"col2":1] = np.int64(0)
    print(df.dtypes)

    # the surprise... label based insertion for the entire row does not convert to float
    df = pd.DataFrame(columns=["col1", "col2"], dtype=np.int64)
    df.loc["a"] = np.int64(0)
    print(df.dtypes)

    # a surprise because referring to all columns, as above, does convert to float
    print("unexpectedly converted to float dtypes")
    df = pd.DataFrame(columns=["col1", "col2"], dtype=np.int64)
    df.loc["a", "col1":"col2"] = np.int64(0)
    print(df.dtypes)

    3.7.2 (default, Mar 19 2019, 10:33:22)
    [Clang 10.0.0 (clang-1000.11.45.5)]
    0.24.2
    int dtypes preserved
    col1    int64
    col2    int64
    dtype: object
    col1    int64
    col2    int64
    dtype: object
    col1    int64
    col2    int64
    dtype: object
    col1    int64
    col2    int64
    dtype: object
    unexpectedly converted to float dtypes
    col1    float64
    col2    float64
    dtype: object
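If the rows do not have to be inserted one at a time, a different technique from the .loc[] assignment discussed above is to collect the rows first and build the frame in a single step, which avoids enlarging an empty frame altogether. The following is a minimal sketch under that assumption; the `rows` dictionary and its labels are made up for illustration.

```python
import pandas as pd

# Collect rows in a plain dict keyed by row label (illustrative data).
rows = {}
rows["row1"] = [0, 0]
rows["row2"] = [0, 1]

# Build the DataFrame once, stating the integer dtype explicitly.
df = pd.DataFrame.from_dict(
    rows, orient="index", columns=["col1", "col2"], dtype="int64"
)
print(df.dtypes)
```

This does not explain the upcasting seen with df.loc["a", "col1":"col2"]; it only sidesteps it when incremental insertion is not actually required.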