Pandasを使用して、データフレーム全体を小文字から大文字に変換します

Question

以下に示すようなデータフレームがあります。

# Create an example dataframe about a fictional army raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'], 'company': ['1st', '1st', '2nd', '2nd'], 'deaths': ['kkk', 52, '25', 616], 'battles': [5, '42', 2, 2], 'size': ['l', 'll', 'l', 'm']} df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'deaths', 'battles', 'size'])

私の目標は、データフレーム内のすべての文字列を大文字に変換して、次のようにすることです。

注意：すべてのデータ型はオブジェクトであり、変更しないでください。出力にはすべてのオブジェクトが含まれている必要があります。すべての単一列を1つずつ変換するのを避けたいです...おそらくデータフレーム全体で一般的にそれをしたいと思います。

私がこれまで試したのはこれをすることですが、成功しませんでした

df.str.upper()

Nehal J Wani · Accepted Answer

astype（）は、各シリーズを dtype オブジェクト（文字列）にキャストし、変換されたシリーズで str（）メソッドを呼び出して、文字列を文字列で呼び出し、その上で関数 pper（）を呼び出します。この後、すべての列のdtypeがobjectに変更されることに注意してください。

In [17]: df Out[17]: regiment company deaths battles size 0 Nighthawks 1st kkk 5 l 1 Nighthawks 1st 52 42 ll 2 Nighthawks 2nd 25 2 l 3 Nighthawks 2nd 616 2 m In [18]: df.apply(lambda x: x.astype(str).str.upper()) Out[18]: regiment company deaths battles size 0 NIGHTHAWKS 1ST KKK 5 L 1 NIGHTHAWKS 1ST 52 42 LL 2 NIGHTHAWKS 2ND 25 2 L 3 NIGHTHAWKS 2ND 616 2 M

後で to_numeric（）を使用して、「バトル」列を再び数値に変換できます。

In [42]: df2 = df.apply(lambda x: x.astype(str).str.upper()) In [43]: df2['battles'] = pd.to_numeric(df2['battles']) In [44]: df2 Out[44]: regiment company deaths battles size 0 NIGHTHAWKS 1ST KKK 5 L 1 NIGHTHAWKS 1ST 52 42 LL 2 NIGHTHAWKS 2ND 25 2 L 3 NIGHTHAWKS 2ND 616 2 M In [45]: df2.dtypes Out[45]: regiment object company object deaths object battles int64 size object dtype: object

VincentQT · Answer

これは、次のapplymap操作によって解決できます。

df = df.applymap(lambda s:s.lower() if type(s) == str else s)

IanS · Answer

strはシリーズに対してのみ機能するため、各列に個別に適用してから連結できます。

In [6]: pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1) Out[6]: regiment company deaths battles size 0 NIGHTHAWKS 1ST KKK 5 L 1 NIGHTHAWKS 1ST 52 42 LL 2 NIGHTHAWKS 2ND 25 2 L 3 NIGHTHAWKS 2ND 616 2 M

編集：パフォーマンス比較

In [10]: %timeit df.apply(lambda x: x.astype(str).str.upper()) 100 loops, best of 3: 3.32 ms per loop In [11]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1) 100 loops, best of 3: 3.32 ms per loop

どちらの答えも、小さなデータフレームで同等に機能します。

In [15]: df = pd.concat(10000 * [df]) In [16]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1) 10 loops, best of 3: 104 ms per loop In [17]: %timeit df.apply(lambda x: x.astype(str).str.upper()) 10 loops, best of 3: 130 ms per loop

大きなデータフレームでは、私の答えはわずかに速くなります。

Ganesh Bhat · Answer

これを試して

df2 = df2.apply(lambda x: x.str.upper() if x.dtype == "object" else x)

Alex Montoya · Answer

dtypeの使用を保存したい場合はisinstance(obj,type)です

df.apply(lambda x: x.str.upper().str.strip() if isinstance(x, object) else x)