pandas

Question

I am trying to read a CSV file into a pandas DataFrame in the following way:

import pandas as pd
dp = pd.read_csv('products.csv', header = 0, dtype = {'name': str,'review': str, 'rating': int,'Word_count': dict}, engine = 'c')
print dp.shape
for col in dp.columns:
    print 'column', col,':', type(col[0])
print type(dp['rating'][0])
dp.head(3)

This is the output:

(183531, 4)
column name : <type 'str'>
column review : <type 'str'>
column rating : <type 'str'>
column Word_count : <type 'str'>
<type 'numpy.int64'>

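I can more or less understand that pandas might have trouble converting the string representation of a dictionary into an actual dictionary (see the two linked posts). But how can the content of the 'rating' column be both str and numpy.int64?

By the way, tweaks such as not specifying the engine or the header change nothing.

Thanks in advance.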

Colonel Beauvel · Accepted Answer

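Just do:

for col in dp.columns:
    print 'column', col,':', col[0]

and you will see that it prints the first character of each column name, which is a string. Note that here you are iterating over the column names, not over each Series.

What you want is to check the type of each column's data inside the loop:

for col in dp.columns:
    print 'column', col,':', type(dp[col][0])

...just as you did for the rating column!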
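The difference can be reproduced on a small scale. The following is a minimal sketch in Python 3 syntax, assuming a tiny hypothetical DataFrame in place of products.csv (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for products.csv; the values are made up.
dp = pd.DataFrame({
    "name": ["bib", "wipes"],
    "review": ["great", "ok"],
    "rating": [5, 3],
})

# Iterating over dp.columns yields the column labels (plain strings),
# so col[0] is just the first character of each label.
label_types = {col: type(col[0]).__name__ for col in dp.columns}

# Indexing the DataFrame with the label gives the Series, whose first
# element has the type of the stored data.
value_types = {col: type(dp[col].iloc[0]).__name__ for col in dp.columns}

print(label_types)
print(value_types)
```

Every entry of label_types is a string type regardless of the data, which is the behaviour the question ran into.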

Mike Müller · Answer

Use:

dp.info()

to check the data types of the columns. dp.columns refers to the column header names, which are strings.
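As a rough illustration of that distinction, the sketch below (Python 3, with a hypothetical two-column DataFrame) contrasts the header labels with the per-column dtypes. The dtypes attribute is used alongside info() only because it returns a value that can be inspected programmatically:

```python
import pandas as pd

# Hypothetical data, not taken from products.csv.
dp = pd.DataFrame({"name": ["bib"], "rating": [5]})

# dtypes maps each column label to the dtype of the data in that column.
print(dp.dtypes)

# The column index itself only holds the header labels.
print(list(dp.columns))

# info() prints a summary that includes the dtype and non-null count per column.
dp.info()
```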

taotao.li · Answer

I think you should check this first: Pandas: change data type of columns

If you google pandas dataframe column type, it is among the top five results.
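The linked question is not reproduced here. As a hedged sketch of two commonly used ways to change a column's type after loading, astype and pd.to_numeric, assuming hypothetical data in which the ratings arrived as strings:

```python
import pandas as pd

# Hypothetical data: ratings that were read in as strings.
dp = pd.DataFrame({"name": ["bib", "wipes"], "rating": ["5", "3"]})

# astype converts a column to the requested dtype and raises on bad values.
dp["rating"] = dp["rating"].astype(int)

# pd.to_numeric with errors="coerce" turns unparseable entries into NaN instead.
scores = pd.to_numeric(pd.Series(["4", "n/a"]), errors="coerce")

print(dp.dtypes)
print(scores)
```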

Sourav Das · Answer

Use read_table with "," as the delimiter, together with literal_eval as the converter function for the relevant columns.

from ast import literal_eval
import pandas as pd
# Parse the list-valued columns with literal_eval while the file is read.
recipes = pd.read_table(
    r"\souravD\PP_recipes.csv",
    sep=",",
    names=["id", "i", "name_tokens", "ingredient_tokens", "steps_tokens",
           "techniques", "calorie_level", "ingredient_ids"],
    converters={
        "name_tokens": literal_eval,
        "ingredient_tokens": literal_eval,
        "steps_tokens": literal_eval,
        "techniques": literal_eval,
        "ingredient_ids": literal_eval,
    },
    header=0,
)

[Image: recipes DataFrame after changing the data types]
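The same converter approach might be applied to the Word_count column from the question. This is only a sketch in Python 3: the in-memory CSV below is a hypothetical stand-in for products.csv, and it assumes the dictionary cells are quoted so that their commas are not read as field separators.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical CSV content; the real products.csv is not shown in the thread.
csv_text = """name,rating,Word_count
bib,5,"{'soft': 2, 'great': 1}"
wipes,3,"{'ok': 1}"
"""

dp = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"name": str, "rating": int},
    converters={"Word_count": literal_eval},  # parse each cell as a Python literal
)

print(type(dp["Word_count"].iloc[0]))
print(dp["Word_count"].iloc[0]["soft"])
```

literal_eval only accepts Python literals, so it is a safer choice than eval for cells that come from a file.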