Sort a pandas DataFrame series by month name?

Question

I have a Series object that looks like this:

    date    price
    dec     12
    may     15
    apr     13
    ..

Problem statement: I want to display the data by month, compute the mean price for each month, and present the result sorted by month.

Desired output:

    month    mean_price
    Jan      XXX
    Feb      XXX
    Mar      XXX

I thought of making a list and passing it to a sort function:

    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

but sort_values doesn't support that for a Series.

One big problem I have is that even though df = df.sort_values(by='date',ascending=True,inplace=True) works on the initial df, after I did a groupby the result did not keep the order of the sorted df.

To conclude, I needed these two columns from the initial data frame. I sorted the datetime column, but a groupby on the month (dt.strftime('%B')) messed up the sorting. Now I have to sort by month name.

My code:

    df  # has 5 columns, though I only need the columns 'date' and 'price'
    df.sort_values(by='date', inplace=True)
    # at this point it is sorted according to date, great
    total = (df.groupby(df['date'].dt.strftime('%B'))['price'].mean())
    # but now the order is no longer what it was: the months appear alphabetically

Tai · Accepted Answer

Thanks to @Brad Solomon for offering a faster way to capitalize strings!

Note 1: @Brad Solomon's answer using _pd.Categorical_ should use fewer resources than mine. He showed how to assign an order to your categorical data. You should not miss it :P

Alternatively, you can use the following.

    import pandas as pd

    df = pd.DataFrame([["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
                       ["aug", 11], ["jan", 11], ["jan", 1]],
                      columns=["Month", "Price"])
    # Preprocessing: capitalize `jan`, `dec` to `Jan` and `Dec`
    df["Month"] = df["Month"].str.capitalize()

    # Now the dataset should look like
    #   Month  Price
    #   -----------
    #   Dec    XX
    #   Jan    XX
    #   Apr    XX

    # Convert to a datetime so that we can sort it;
    # use %b because the data use the abbreviation of the month
    df["Month"] = pd.to_datetime(df.Month, format='%b', errors='coerce').dt.month
    df = df.sort_values(by="Month")

    total = (df.groupby(df['Month'])['Price'].mean())

    # total
    Month
    1     17.333333
    3     11.000000
    8     16.000000
    12    12.000000

Note 2: groupby sorts the group keys by default. Be careful to use the same key for sorting and grouping in df = df.sort_values(by=SAME_KEY) and total = (df.groupby(df[SAME_KEY])['Price'].mean()). Otherwise, you may get unintended behavior. See "Groupby preserve order among groups? In which way?" for more information.
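To illustrate the default key sorting described in Note 2, here is a minimal sketch. The sample dates and prices are made up, and passing sort=False is one possible workaround rather than something the answer above prescribes. With sort=False, groups appear in order of first occurrence, so this only gives calendar order if the frame was sorted by date beforehand and the dates stay within a single year.

```python
import pandas as pd

# Hypothetical sample data: one year, dates deliberately out of order.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-05", "2020-04-10", "2020-05-20", "2020-04-25"]),
    "price": [12, 13, 15, 17],
})
df = df.sort_values(by="date")
month_names = df["date"].dt.strftime("%B")

# Default (sort=True): string keys are sorted alphabetically.
alphabetical = df.groupby(month_names)["price"].mean()

# sort=False: groups keep the order in which they first appear in df.
chronological = df.groupby(month_names, sort=False)["price"].mean()

print(alphabetical)
print(chronological)
```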

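Note 3: A more computationally efficient way is to compute the mean first and then sort by month. That way you only sort at most 12 items rather than the whole df. This reduces the computational cost if you do not need df itself to be sorted.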
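A minimal sketch of Note 3, reusing the sample data from this answer. The months list and the variable names are illustrative assumptions, and the reordering is done by selecting labels in calendar order rather than by a sort call.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

df = pd.DataFrame(
    [["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
     ["aug", 11], ["jan", 11], ["jan", 1]],
    columns=["Month", "Price"],
)
df["Month"] = df["Month"].str.capitalize()

# Aggregate first: the result has at most 12 rows, one per month.
mean_price = df.groupby("Month")["Price"].mean()

# Reorder the small result by calendar position.
# Months that do not occur in the data are skipped.
present = [m for m in months if m in mean_price.index]
mean_price = mean_price.loc[present]
print(mean_price)
```

The original df is left unsorted here; only the aggregated Series is reordered.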

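Note 4: For people who already have the month as the index and wonder how to make it categorical, take a look at pandas.CategoricalIndex. @jezrael has a working example of making a categorical index ordered in "Pandas series sort by month index".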
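For the situation in Note 4, a minimal sketch might look like the following. The Series values are taken from the question's sample, and this is an illustration of pd.CategoricalIndex, not a copy of @jezrael's example.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# A Series that already has month abbreviations as its index.
s = pd.Series([12, 15, 13], index=["Dec", "May", "Apr"], name="price")

# Replace the plain index with an ordered categorical index.
s.index = pd.CategoricalIndex(s.index, categories=months, ordered=True)

# sort_index now follows the category order instead of alphabetical order.
s_sorted = s.sort_index()
print(s_sorted)
```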

Brad Solomon · Answer

You can use categorical data to enable proper sorting:

    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
    df['months'] = pd.Categorical(df['months'], categories=months, ordered=True)
    df.sort_values(...)  # same as you have now; can use inplace=True

When you specify the categories, pandas remembers the order in which you specified them as the default sort order.

Docs: Pandas categories > sorting & order.
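Combined with the groupby from the question, the categorical approach could be sketched as below. The sample rows are made up, and observed=True is my own choice so that only months present in the data show up in the result.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical data in the shape shown in the question.
df = pd.DataFrame({
    "date": ["dec", "may", "apr", "may"],
    "price": [12, 15, 13, 5],
})

# Capitalize so the values match the category labels, then attach the order.
df["date"] = pd.Categorical(
    df["date"].str.capitalize(), categories=months, ordered=True
)

# Group keys are sorted by category order, i.e. calendar order.
total = df.groupby("date", observed=True)["price"].mean()
print(total)
```

Values that do not match any category label become NaN, which is why the capitalization step comes first.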

anky_91 · Answer

Use the calendar module and reindex.

_series.str.capitalize_ capitalizes the series. We then build a dictionary with the calendar module and map it over the series to get the month numbers.

Once we have the month numbers, we can call sort_values() and take the resulting index, then reindex with it.

    import calendar

    df.date = df.date.str.capitalize()  # capitalizes the series
    d = {i: e for e, i in enumerate(calendar.month_abbr)}  # creates a dictionary: abbreviation -> month number
    # d = {i[:3]: e for e, i in enumerate(calendar.month_name)}
    df.reindex(df.date.map(d).sort_values().index)  # map + sort_values + reindex with index

      date  price
    2  Apr     13
    1  May     15
    0  Dec     12

Abhay S · Answer

You should consider reindexing along axis 0 (the index):

    new_order = ['January', 'February', 'March', 'April', 'May', 'June',
                 'July', 'August', 'September', 'October', 'November', 'December']
    df1 = df.reindex(new_order, axis=0)
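A small sketch of how this behaves on a result indexed by full month names. The Series values are made up. Note that reindex inserts NaN for every month in new_order that is missing from the data, so the sketch drops those rows afterwards.

```python
import pandas as pd

new_order = ['January', 'February', 'March', 'April', 'May', 'June',
             'July', 'August', 'September', 'October', 'November', 'December']

# Hypothetical groupby result, indexed by full month name.
total = pd.Series([12.0, 15.0, 13.0],
                  index=["December", "May", "April"], name="price")

# reindex returns all 12 labels in calendar order, with NaN where data is missing.
ordered = total.reindex(new_order)

# Keep only the months that actually had data.
ordered = ordered.dropna()
print(ordered)
```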

Dinesh Babu · Answer

Use the Sort_Dataframeby_Month function to sort month names in chronological order.

The following packages need to be installed.

    $ pip install sorted-months-weekdays
    $ pip install sort-dataframeby-monthorweek

Example:

    import pandas as pd
    from sorted_months_weekdays import *
    from sort_dataframeby_monthorweek import *

    df = pd.DataFrame([['Jan',23],['Jan',16],['Dec',35],['Apr',79],['Mar',53],['Mar',12],['Feb',3]],
                      columns=['Month','Sum'])
    df
    Out[11]:
      Month  Sum
    0   Jan   23
    1   Jan   16
    2   Dec   35
    3   Apr   79
    4   Mar   53
    5   Mar   12
    6   Feb    3

To sort the dataframe by month, use the function below:

    Sort_Dataframeby_Month(df=df,monthcolumnname='Month')
    Out[14]:
      Month  Sum
    0   Jan   23
    1   Jan   16
    2   Feb    3
    3   Mar   53
    4   Mar   12
    5   Apr   79
    6   Dec   35

Zellint · Answer

You can put the numeric month value in the index together with the name (e.g. "01 January"), sort, and then strip off the number:

    total = (df.groupby(df['date'].dt.strftime('%m %B'))['price'].mean()).sort_index()

The result may look something like this:

    01 January     xxx
    02 February    yyy
    03 March       zzz
    04 April       ttt

    total.index = [x.split()[1] for x in total.index]

    January     xxx
    February    yyy
    March       zzz
    April       ttt
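For completeness, here is an end-to-end sketch of this prefix trick on made-up data. The dates and prices are assumptions, and month names come out in English only when the process runs with the default C/English time locale.

```python
import pandas as pd

# Hypothetical sample data with a real datetime column.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-05", "2020-04-10", "2020-05-20", "2020-04-25"]),
    "price": [12, 13, 15, 17],
})

# Keys such as "04 April" sort correctly as plain strings
# because the zero-padded month number comes first.
total = df.groupby(df["date"].dt.strftime("%m %B"))["price"].mean().sort_index()

# Drop the numeric prefix, keeping only the month name.
total.index = [label.split(maxsplit=1)[1] for label in total.index]
print(total)
```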