How to remove a carriage return in a dataframe

Question

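I have a dataframe with columns named id, country_name, location and total_deaths. During data cleaning, I came across a value in a row that has '\r' attached. Once cleaning is complete, I store the resulting dataframe in a destination.csv file. Because that particular row has \r attached, it always creates a new row in the file.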

    id                         29
    location      Uttar Pradesh\r
    country_name            India
    total_deaths               20

I want to remove the \r. I tried df.replace({'\r': ''}, regex=True), but it isn't working for me.

Is there any other solution? Can somebody help?

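Edit:

In the process above, I iterate over df to check whether \r is present. If it is, it needs to be replaced. Here row.replace() or row.str.strip() doesn't seem to work, or I may be doing it the wrong way.

I don't want to specify a column name or row number when using replace(), because I can't be sure that only the 'location' column contains \r. Please find the code below.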

    import re

    count = 0
    for row_index, row in df.iterrows():
        if re.search(r"\r", str(row)):
            print(type(row))  # the type is pandas.Series
            row.replace({r'\r': ''}, regex=True)
            print(row)
            count += 1

jezrael · Accepted Answer

Another solution is to use str.strip:

    # '\r' (not the raw string r'\r') so that the carriage return character itself is stripped
    df['29'] = df['29'].str.strip('\r')
    print(df)
                 id             29
    0      location  Uttar Pradesh
    1  country_name          India
    2  total_deaths             20
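One detail worth checking when using str.strip: the argument is a set of characters, not a pattern, and pandas' Series.str.strip follows the same rule as Python's built-in str.strip. The sketch below uses plain Python strings (the sample values are made up for illustration) to show how '\r' and r'\r' behave differently there.

```python
# '\r' is one character (a carriage return);
# r'\r' is two characters: a backslash and the letter r.
cr = '\r'
raw = r'\r'
lengths = (len(cr), len(raw))
print(lengths)  # (1, 2)

value = 'Uttar Pradesh\r'

# Stripping the real carriage return removes it.
stripped_cr = value.strip(cr)
print(repr(stripped_cr))  # 'Uttar Pradesh'

# Stripping the raw string looks for backslashes and the letter r,
# so the carriage return stays in place.
stripped_raw = value.strip(raw)
print(repr(stripped_raw))  # 'Uttar Pradesh\r'

# It can also eat a legitimate trailing letter r.
side_effect = 'Kashmir'.strip(raw)
print(repr(side_effect))  # 'Kashmi'
```

If the data contains the literal two-character text \r instead of a real carriage return, a regex-based replace is probably a safer choice than strip.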

If you use replace, add the r prefix (raw string) and a single backslash (\):

    print(df.replace({r'\r': ''}, regex=True))
                 id             29
    0      location  Uttar Pradesh
    1  country_name          India
    2  total_deaths             20
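As a minimal, self-contained sketch of the same call: replace returns a new DataFrame and leaves the original untouched, so the result has to be assigned to something. This is likely why the loop in the question appears to do nothing, since it calls row.replace(...) and discards the return value. The sample data below is made up to resemble the question's.

```python
import pandas as pd

# Hypothetical sample data; one location value ends with a carriage return.
df = pd.DataFrame({
    "id": [9, 10],
    "country_name": ["India", "India"],
    "location": ["Uttar Pradesh\r", "Orissa"],
    "total_deaths": [20, 69],
})

# No column names are given: the regex is tried on every string value.
cleaned = df.replace({r"\r": ""}, regex=True)

print(cleaned["location"].tolist())  # ['Uttar Pradesh', 'Orissa']
print(repr(df["location"].iloc[0]))  # the original still has the '\r'
```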

In replace, you can specify which columns to replace in, like this:

    print(df)
                   id               29
    0        location  Uttar Pradesh\r
    1    country_name            India
    2  total_deaths\r               20

    print(df.replace({'29': {r'\r': ''}}, regex=True))
                   id             29
    0        location  Uttar Pradesh
    1    country_name          India
    2  total_deaths\r             20

    print(df.replace({r'\r': ''}, regex=True))
                 id             29
    0      location  Uttar Pradesh
    1  country_name          India
    2  total_deaths             20
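The difference between the nested and the flat dictionary can be checked with a small runnable sketch. The two-column frame here is a made-up stand-in for the one printed above.

```python
import pandas as pd

# Hypothetical frame with a carriage return in each column.
df = pd.DataFrame({
    "id": ["location", "total_deaths\r"],
    "29": ["Uttar Pradesh\r", "20"],
})

# Nested dict: {column: {pattern: replacement}} limits the replacement to column '29'.
only_29 = df.replace({"29": {r"\r": ""}}, regex=True)

# Flat dict: {pattern: replacement} applies to every column.
everywhere = df.replace({r"\r": ""}, regex=True)

print(only_29["id"].tolist())     # 'total_deaths\r' is still there
print(everywhere["id"].tolist())  # cleaned
```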

Edit in response to a comment:

    import pandas as pd

    df = pd.read_csv('data_source_test.csv')
    print(df)
       id country_name         location  total_deaths
    0   1        India        New Delhi           354
    1   2        India       Tamil Nadu            48
    2   3        India        Karnataka             0
    3   4        India    Andra Pradesh            32
    4   5        India            Assam           679
    5   6        India           Kerala           128
    6   7        India           Punjab             0
    7   8        India    Mumbai, Thane             1
    8   9        India  Uttar Pradesh\r            20
    9  10        India           Orissa            69

    print(df.replace({r'\r': ''}, regex=True))
       id country_name       location  total_deaths
    0   1        India      New Delhi           354
    1   2        India     Tamil Nadu            48
    2   3        India      Karnataka             0
    3   4        India  Andra Pradesh            32
    4   5        India          Assam           679
    5   6        India         Kerala           128
    6   7        India         Punjab             0
    7   8        India  Mumbai, Thane             1
    8   9        India  Uttar Pradesh            20
    9  10        India         Orissa            69

If you need to replace only in the location column:

    # regex=True is passed explicitly so that r'\r' is treated as a pattern
    df['location'] = df.location.str.replace(r'\r', '', regex=True)
    print(df)
       id country_name       location  total_deaths
    0   1        India      New Delhi           354
    1   2        India     Tamil Nadu            48
    2   3        India      Karnataka             0
    3   4        India  Andra Pradesh            32
    4   5        India          Assam           679
    5   6        India         Kerala           128
    6   7        India         Punjab             0
    7   8        India  Mumbai, Thane             1
    8   9        India  Uttar Pradesh            20
    9  10        India         Orissa            69
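Since the goal in the question is a destination.csv without an extra row, one rough way to check the cleaned frame is to render it to CSV text and count the lines. This is only a sketch with made-up data; it relies on to_csv returning a string when no path is given.

```python
import pandas as pd

# Hypothetical sample data; one location value ends with a carriage return.
df = pd.DataFrame({
    "id": [9, 10],
    "location": ["Uttar Pradesh\r", "Orissa"],
    "total_deaths": [20, 69],
})

cleaned = df.replace({r"\r": ""}, regex=True)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = cleaned.to_csv(index=False)

# Expect one header line plus one line per row.
line_count = len(csv_text.splitlines())
print(line_count, len(cleaned) + 1)
```

Once the counts match, the same frame can be written with cleaned.to_csv('destination.csv', index=False).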

EdChum · Answer

Use str.replace. You need to escape the sequence so that it is treated as a carriage return rather than the literal \r:

    In [15]:
    df['29'] = df['29'].str.replace(r'\r', '', regex=True)
    df

    Out[15]:
                 id             29
    0      location  Uttar Pradesh
    1  country_name          India
    2  total_deaths             20
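The default of the regex argument to Series.str.replace has changed between pandas versions, so it may be safer to pass it explicitly. The sketch below (sample values are made up) shows two spellings that should both remove a real carriage return: a regex pattern r'\r' with regex=True, and the literal character '\r' with regex=False.

```python
import pandas as pd

# Hypothetical sample values; the first one ends with a carriage return.
s = pd.Series(["Uttar Pradesh\r", "Orissa"])

# As a regular expression: the pattern \r matches the carriage return character.
as_regex = s.str.replace(r"\r", "", regex=True)

# As a literal string: '\r' (no r prefix) already is the carriage return character.
as_literal = s.str.replace("\r", "", regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```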

Gwen Au · Answer

The code below removes \t tabs, \n newlines and \r carriage returns, and is great for condensing data onto a single line. The answer was taken from https://Gist.github.com/smram/d6ded3c9028272360eb65bcab564a18a

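    # The first pattern matches the literal text \t, \n, \r; the second matches the actual control characters.
    # <INPLACE> is a placeholder: substitute True or False.
    df.replace(to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"], value=["", ""], regex=True, inplace=<INPLACE>)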
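A small runnable sketch of this approach follows. It assumes the first pattern is meant to catch backslash sequences stored as literal text and the second to catch real control characters. The column name note and its values are made up, and the result is assigned to a new variable instead of using inplace.

```python
import pandas as pd

df = pd.DataFrame({
    "note": [
        "a\tb\r\n",  # real tab, carriage return and newline characters
        r"c\td\n",   # the literal two-character texts \t and \n
    ],
})

cleaned = df.replace(
    to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"],
    value=["", ""],
    regex=True,
)

print(cleaned["note"].tolist())  # ['ab', 'cd']
```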