Fitting a Gaussian function

Question

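I have a histogram (see below) and I am trying to find its mean and standard deviation, along with code that fits a curve to the histogram. I think SciPy or matplotlib has something that can help, but none of the examples I have tried work.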

    import matplotlib.pyplot as plt
    import numpy as np

    with open('gau_b_g_s.csv') as f:
        v = np.loadtxt(f, delimiter=',', dtype="float", skiprows=1, usecols=None)

    fig, ax = plt.subplots()
    plt.hist(v, bins=500, color='#7F38EC', histtype='step')
    plt.title("Gaussian")
    plt.axis([-1, 2, 0, 20000])
    plt.show()

Chris · Accepted Answer

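Take a look at this answer on fitting arbitrary curves to data. Basically, you can use scipy.optimize.curve_fit to fit any function you want to your data. The code below shows how to fit a Gaussian to some random data (credit to this SciPy-User mailing list post).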

    import numpy
    from scipy.optimize import curve_fit
    import matplotlib.pyplot as plt

    # Define some test data which is close to Gaussian
    data = numpy.random.normal(size=10000)

    hist, bin_edges = numpy.histogram(data, density=True)
    bin_centres = (bin_edges[:-1] + bin_edges[1:])/2

    # Define model function to be used to fit to the data above:
    def gauss(x, *p):
        A, mu, sigma = p
        return A*numpy.exp(-(x-mu)**2/(2.*sigma**2))

    # p0 is the initial guess for the fitting coefficients (A, mu and sigma above)
    p0 = [1., 0., 1.]

    coeff, var_matrix = curve_fit(gauss, bin_centres, hist, p0=p0)

    # Get the fitted curve
    hist_fit = gauss(bin_centres, *coeff)

    plt.plot(bin_centres, hist, label='Test data')
    plt.plot(bin_centres, hist_fit, label='Fitted data')

    # Finally, let's get the fitting parameters, i.e. the mean and standard deviation:
    print('Fitted mean = ', coeff[1])
    print('Fitted standard deviation = ', coeff[2])

    plt.show()
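The histogram in the question holds raw counts rather than a normalized density, so a fixed initial guess such as [1., 0., 1.] may be far from the solution. The following is a minimal sketch of the same curve_fit approach applied to raw counts, with the starting point taken from the data itself. The synthetic array `v` (standing in for the values loaded from the CSV file), the seed, the bin count, and the names `p0` and `perr` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from scipy.optimize import curve_fit


def gauss(x, amplitude, mu, sigma):
    """Gaussian with a free amplitude, so it can follow raw bin counts."""
    return amplitude * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))


# Assumption: synthetic stand-in for the values loaded from the CSV file.
rng = np.random.default_rng(0)
v = rng.normal(loc=0.5, scale=0.2, size=100_000)

# Raw counts per bin, as plt.hist produces without normalization.
counts, bin_edges = np.histogram(v, bins=100)
bin_centres = (bin_edges[:-1] + bin_edges[1:]) / 2

# Data-driven starting point: peak height, sample mean, sample standard deviation.
p0 = [counts.max(), v.mean(), v.std()]
coeff, cov = curve_fit(gauss, bin_centres, counts, p0=p0)

amplitude, mu, sigma = coeff
sigma = abs(sigma)  # the model depends only on sigma**2, so its sign is arbitrary

# Square roots of the covariance diagonal give rough one-standard-deviation
# uncertainties for the fitted parameters.
perr = np.sqrt(np.diag(cov))

print(f"mean = {mu:.4f} +/- {perr[1]:.4f}")
print(f"standard deviation = {sigma:.4f} +/- {perr[2]:.4f}")
```

Seeding the fit with the sample statistics is only one reasonable choice; any starting point close to the peak should lead to a similar result.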

Nicolas Barbey · Answer

You can try sklearn's Gaussian mixture model estimation as follows:

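    import numpy as np
    from sklearn.mixture import GaussianMixture

    # sklearn.mixture.GMM was removed in scikit-learn 0.20;
    # GaussianMixture is its replacement
    gmm = GaussianMixture()

    # sample data
    a = np.random.randn(1000)

    # result
    r = gmm.fit(a[:, np.newaxis])  # the estimator requires 2D data
    # with the default 'full' covariance type, covariances_ has shape
    # (n_components, n_features, n_features)
    print("mean : %f, var : %f" % (r.means_[0, 0], r.covariances_[0, 0, 0]))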

Reference: http://scikit-learn.org/stable/modules/mixture.html#mixture

Note that this approach does not require a histogram to estimate the sample distribution.
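For a single Gaussian, the maximum-likelihood estimates have a closed form: the sample mean and the (biased) sample variance. The sketch below computes them directly from the samples with NumPy, again without any histogram. The synthetic data and the seed are assumptions made for illustration.

```python
import numpy as np

# Assumption: synthetic samples standing in for the real data.
rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=1000)

mean = a.mean()
var = a.var()       # ddof=0, i.e. the maximum-likelihood (biased) variance
std = np.sqrt(var)

print("mean : %f, var : %f, std : %f" % (mean, var, std))
```

If SciPy is available, scipy.stats.norm.fit(a) should return the same mean and standard deviation as a (loc, scale) pair.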

misterte · Answer

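This is a fairly old question, but anyone who just wants to plot a density estimate for a series can try pandas' .plot(kind='kde'), which draws through matplotlib. See the pandas documentation for details.

Example with pandas:

    mydf.x.plot(kind='kde')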

Akavall · Answer

I'm not sure what your input is, but your y-axis scale may be too large (20000), so try reducing that number. The following code works for me:

    import matplotlib.pyplot as plt
    import numpy as np

    # created my variable
    v = np.random.normal(0, 1, 1000)

    fig, ax = plt.subplots()

    # density=True replaces normed=1, which current Matplotlib no longer accepts
    plt.hist(v, bins=500, density=True, color='#7F38EC', histtype='step')

    # plot
    plt.title("Gaussian")
    plt.axis([-1, 2, 0, 1])  # changed 20000 to 1
    plt.show()

Edit:

If you want the actual counts on the y-axis, you can set density=False (normed=0 in old Matplotlib versions) and simply remove plt.axis([-1, 2, 0, 1]).

    import matplotlib.pyplot as plt
    import numpy as np

    # function
    v = np.random.normal(0, 1, 500000)

    fig, ax = plt.subplots()

    # changed density=True to density=False
    plt.hist(v, bins=500, density=False, color='#7F38EC', histtype='step')

    # plot
    plt.title("Gaussian")
    # plt.axis([-1, 2, 0, 20000])
    plt.show()
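The relation between the normalized and the raw-count histogram can be checked numerically. This is a minimal sketch using numpy.histogram, whose density argument has the same meaning as in plt.hist; the synthetic data, the seed, and the bin count are arbitrary assumptions.

```python
import numpy as np

# Assumption: synthetic standard-normal samples, as in the answer above.
rng = np.random.default_rng(0)
v = rng.normal(0, 1, 1000)

counts, edges = np.histogram(v, bins=50)             # raw counts per bin
density, _ = np.histogram(v, bins=50, density=True)  # normalized heights

widths = np.diff(edges)

# Each normalized height is count / (N * bin_width) ...
rescaled = counts / (v.size * widths)
print(np.allclose(density, rescaled))

# ... so the area under the normalized histogram sums to 1.
area = (density * widths).sum()
print(area)
```

This is why the normalized plot fits within a y-range of 0 to 1 for a standard normal (its peak density is about 0.4), while the raw-count plot scales with the number of samples and the bin width.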