What is the inverse of the quantile function on a pandas Series?

Question

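The quantile function gives us the quantile of a given pandas Series s.

For example,

s.quantile(0.9) is 4.2

Is there an inverse function (i.e. the cumulative distribution) that finds the value x such that

s.quantile(x) = 4

Thanks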

fernandosjp · Accepted Answer

I had the same question as you! I found a simple way to get the inverse of the quantile using scipy.

    # required libraries
    from scipy import stats
    import pandas as pd
    import numpy as np

    # generate random data with a fixed seed (to be reproducible)
    np.random.seed(seed=1)
    df = pd.DataFrame(np.random.uniform(0, 1, (10)), columns=['a'])

    # quantile function (select the column by name rather than by position)
    x = df['a'].quantile(0.5)

    # inverse of quantile; the result is a percentage between 0 and 100
    stats.percentileofscore(df['a'], x)
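Since `percentileofscore` reports a percentage rather than a fraction, and its `kind` argument decides how values equal to the score are counted, a small deterministic sketch may help. The toy series and the variable names here are made up for illustration:

```python
import pandas as pd
from scipy import stats

# Toy data (hypothetical) so the counts are easy to check by hand
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
x = 3.0

# Divide by 100 to get a fraction comparable to the argument of s.quantile
frac_weak = stats.percentileofscore(s.to_numpy(), x, kind="weak") / 100      # share of values <= x
frac_strict = stats.percentileofscore(s.to_numpy(), x, kind="strict") / 100  # share of values < x

print(frac_weak, frac_strict)
```

With `kind="weak"` three of the five values count (those <= 3), with `kind="strict"` only two do, so the choice matters whenever the score itself appears in the data.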

ILoveCoding · Answer

Sorting can be expensive. If you are only looking for a single value, it is probably better to compute it with:

    import numpy as np
    import pandas as pd

    s = pd.Series(np.random.uniform(size=1000))
    (s < 0.7).astype(int).mean()  # roughly 0.7

There is probably a way to avoid the int(bool) shenanigans.
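On the int(bool) cast: as far as I can tell it can simply be dropped, because taking the mean of a boolean Series already treats True as 1 and False as 0. A minimal sketch, with an arbitrary seed chosen only for reproducibility:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
s = pd.Series(rng.uniform(size=1000))

frac_bool = (s < 0.7).mean()             # mean of booleans = share of True
frac_int = (s < 0.7).astype(int).mean()  # the version with the explicit cast

print(frac_bool, frac_int)
```

Both expressions count the same elements, so they give the same fraction.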

Mike · Answer

I don't know of a one-liner, but you can achieve this with scipy:

    import pandas as pd
    import numpy as np
    from scipy.interpolate import interp1d

    # set up a sample dataframe
    df = pd.DataFrame(np.random.uniform(0, 1, (11)), columns=['a'])

    # sort it by the desired series and calculate the percentile
    # (sort_values replaces the old DataFrame.sort, which newer pandas no longer has)
    sdf = df.sort_values('a').reset_index()
    sdf['b'] = sdf.index / float(len(sdf) - 1)

    # set up the interpolator using the value as the index
    interp = interp1d(sdf['a'], sdf['b'])

    # a is the value, b is the percentile
    >>> sdf
        index         a    b
    0      10  0.030469  0.0
    1       3  0.144445  0.1
    2       4  0.304763  0.2
    3       1  0.359589  0.3
    4       7  0.385524  0.4
    5       5  0.538959  0.5
    6       8  0.642845  0.6
    7       6  0.667710  0.7
    8       9  0.733504  0.8
    9       2  0.905646  0.9
    10      0  0.961936  1.0

Now we can see that the two functions are inverses of each other.

    >>> df['a'].quantile(0.57)
    0.61167933268395969
    >>> interp(0.61167933268395969)
    array(0.57)
    >>> interp(df['a'].quantile(0.43))
    array(0.43)

interp can also take lists, NumPy arrays, or pandas Series, really any array-like input!
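If you would rather not depend on scipy, the same idea can probably be sketched with `np.interp`. This is an alternative to the `interp1d` approach above, not the same code; the helper name `inverse_quantile` is made up, and the sketch assumes the values are distinct:

```python
import numpy as np
import pandas as pd

def inverse_quantile(series, value):
    """Hypothetical helper: fraction q such that series.quantile(q) is about value.

    Assumes distinct values and pandas' default linear interpolation.
    """
    xs = np.sort(series.to_numpy())       # sorted sample values
    qs = np.linspace(0.0, 1.0, len(xs))   # quantile level of each sorted value
    return np.interp(value, xs, qs)       # linear interpolation between neighbours

rng = np.random.default_rng(0)  # arbitrary seed
s = pd.Series(rng.uniform(0, 1, 11))

q_back = inverse_quantile(s, s.quantile(0.57))
print(q_back)  # close to 0.57
```

One difference to keep in mind: `np.interp` clamps values outside the sample range to 0 or 1, whereas `interp1d` with its default settings raises an error for them.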

Calvin Ku · Answer

Just ran into the same problem. Here's my two cents.

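    def inverse_percentile(arr, num):
        arr = sorted(arr)
        # indices of all elements strictly greater than num
        i_arr = [i for i, x in enumerate(arr) if x > num]
        # the first such index equals the count of elements <= num
        return i_arr[0] / len(arr) if len(i_arr) > 0 else 1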
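For larger arrays, the list comprehension can likely be replaced by a binary search. A sketch using `np.searchsorted`, which should return the same fraction as the function above; the name `inverse_percentile_np` and the sample data are made up:

```python
import numpy as np

def inverse_percentile_np(arr, num):
    """Hypothetical NumPy variant: fraction of elements <= num."""
    sorted_arr = np.sort(np.asarray(arr))
    # side="right" gives the index just past the last element <= num,
    # which is exactly the number of elements <= num
    return np.searchsorted(sorted_arr, num, side="right") / len(sorted_arr)

data = [3, 1, 4, 1, 5, 9, 2, 6]
print(inverse_percentile_np(data, 4))   # 5 of 8 elements are <= 4
print(inverse_percentile_np(data, 10))  # everything is <= 10
```

If you need many lookups against the same data, sorting once and reusing the sorted array avoids repeating the expensive step.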

Anastasiya-Romanova 秀 · Answer

Mathematically speaking, you are trying to find the CDF, that is, the probability that s is less than or equal to a value (or quantile) q:

F(q) = Pr[s <= q]

Using NumPy, you can try this one-liner:

np.mean(s.to_numpy() <= q)
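To make the relationship with `quantile` concrete, here is a small worked sketch on made-up data. It suggests that the empirical CDF and pandas' default (linearly interpolated) quantile are only approximate inverses on small samples:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical), small enough to check by hand
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
q = 4.0

cdf_at_q = np.mean(s.to_numpy() <= q)  # 4 of 5 values are <= 4, so 0.8
round_trip = s.quantile(cdf_at_q)      # default linear interpolation

print(cdf_at_q, round_trip)
```

Here the round trip lands at about 4.2 rather than 4, because `quantile` interpolates between the sorted values while the empirical CDF only counts them. With larger samples the gap usually shrinks.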