Strip / trim all strings of a dataframe

Question

While cleaning the values of a multi-type data frame in Python/pandas, I want to trim the strings. I am currently doing it in two instructions:

import pandas as pd
df = pd.DataFrame([[' a ', 10], [' c ', 5]])
df.replace(r'^\s+', '', regex=True, inplace=True)  # front
df.replace(r'\s+$', '', regex=True, inplace=True)  # end
df.values

This is quite slow. What could I improve?

jezrael · Accepted Answer

You can use DataFrame.select_dtypes to select the string columns and then apply the function str.strip to them.

Note: the values cannot be types such as dicts or lists, because columns holding them also have dtype object and would be selected too.

df_obj = df.select_dtypes(['object'])
print (df_obj)
     0
0   a 
1   c 

df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
print (df)
   0   1
0  a  10
1  c   5

But if there are only a few columns, use str.strip on each one directly:

df[0] = df[0].str.strip()
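If an object column mixes strings with other values, .str.strip() returns missing values for the non-string entries. The following is only a sketch of one way to keep those entries intact; the helper name strip_object_columns and the sample frame are made up for illustration and are not part of the answer above.

```python
import pandas as pd


def strip_object_columns(df):
    """Return a copy of df with strings in object columns stripped.

    Hypothetical helper: non-string entries in object columns are kept as-is.
    """
    out = df.copy()
    for col in out.select_dtypes(include=["object"]).columns:
        stripped = out[col].str.strip()
        # .str.strip() gives NaN for non-strings, so fall back to the original value there.
        out[col] = stripped.where(stripped.notna(), out[col])
    return out


# Assumed sample data: an object column that mixes strings and an integer.
df_mixed = pd.DataFrame(
    {"name": pd.Series([" a ", 10, " c "], dtype=object), "n": [1, 2, 3]}
)
result = strip_object_columns(df_mixed)
print(result)
```

The original frame is left untouched because the helper works on a copy.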

Jonathan B. · Answer

Money Shot

Here is a compact version that uses applymap with a simple lambda expression to call strip only when the value is a string:

df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
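One caveat: DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map. A minimal sketch that works with either name could look like the following; the function name strip_cells is hypothetical.

```python
import pandas as pd


def strip_cells(df):
    """Strip every string cell, leaving other values alone (hypothetical helper)."""

    def strip_if_str(value):
        return value.strip() if isinstance(value, str) else value

    # Check the class rather than the instance, so a column named "map" cannot interfere.
    if hasattr(pd.DataFrame, "map"):
        return df.map(strip_if_str)  # pandas 2.1 and later
    return df.applymap(strip_if_str)  # older pandas


df_cells = pd.DataFrame([[" a ", 10], [" c ", 5]])
trimmed = strip_cells(df_cells)
print(trimmed)
```

Like applymap, this goes cell by cell in Python, so it is convenient rather than fast.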

Full Example

A more complete example:

import pandas as pd

def trim_all_columns(df):
    """
    Trim whitespace from ends of each value across all series in dataframe
    """
    trim_strings = lambda x: x.strip() if isinstance(x, str) else x
    return df.applymap(trim_strings)

# simple example of trimming whitespace from data elements
df = pd.DataFrame([[' a ', 10], [' c ', 5]])
df = trim_all_columns(df)
print(df)

>>>
   0   1
0  a  10
1  c   5

Working Example

Here is a working example hosted by Trinket: https://trinket.io/python3/e6ab7fb4ab

Roman Pekar · Answer

If you really want to use a regex, then:

>>> df.replace(r'(^\s+|\s+$)', '', regex=True, inplace=True)
>>> df
   0   1
0  a  10
1  c   5

But it should be faster to do it like this:

>>> df[0] = df[0].str.strip()
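To check the speed claim on your own data, something like this sketch may help. It verifies that both approaches give the same strings and then times them. The row count and the function names are arbitrary assumptions, and timings depend on the machine and pandas version, so no numbers are claimed.

```python
import timeit

import pandas as pd

# Assumed test data: the question's two rows repeated to get a measurable size.
df_big = pd.DataFrame([["  a  ", 10], ["  c  ", 5]] * 5000)


def trim_with_regex(frame):
    # Single regex pass over every cell of the frame.
    return frame.replace(r"(^\s+|\s+$)", "", regex=True)


def trim_with_str_strip(frame):
    # Vectorized strip on the one string column only.
    out = frame.copy()
    out[0] = out[0].str.strip()
    return out


by_regex = trim_with_regex(df_big)
by_strip = trim_with_str_strip(df_big)

# Both approaches should produce the same trimmed strings.
assert by_regex[0].tolist() == by_strip[0].tolist()

print("regex    :", timeit.timeit(lambda: trim_with_regex(df_big), number=5))
print("str.strip:", timeit.timeit(lambda: trim_with_str_strip(df_big), number=5))
```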

Aakash Makwana · Answer

You can try:

df[0] = df[0].str.strip()

Or, more specifically, for all string columns:

non_numeric_columns = list(set(df.columns) - set(df._get_numeric_data().columns))
# use the vectorized .str accessor; str(x).strip() would stringify the whole column at once
df[non_numeric_columns] = df[non_numeric_columns].apply(lambda x: x.str.strip())

Dekel · Answer

You can use the apply function of the Series object:

>>> df = pd.DataFrame([[' a ', 10], [' c ', 5]])
>>> df[0][0]
' a '
>>> df[0] = df[0].apply(lambda x: x.strip())
>>> df[0][0]
'a'

Note that this uses strip rather than a regex.

Another option: use the apply function of the DataFrame object:

>>> df = pd.DataFrame([[' a ', 10], [' c ', 5]])
>>> df.apply(lambda x: x.apply(lambda y: y.strip() if type(y) == type('') else y), axis=0)
   0   1
0  a  10
1  c   5

hyunwoo jeong · Answer

def trim(x):
    # only object (string) columns are touched; splitting on ' ' would return '' for values with leading spaces
    if x.dtype == object:
        x = x.str.strip()
    return x

df = df.apply(trim)