Python, Pandas & Chi-Squared Test of Independence

Question

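I am new to Python and to statistics. I am trying to apply a chi-squared test to determine whether previous success affects a person's level of change (in percentage terms it does seem to, but I wanted to check whether my results are statistically significant).

My question is: did I do this correctly? My results show a p-value of 0.0, which means there is a significant relationship between my variables (which is what I want, of course... but 0 seems a little too perfect for a p-value, so I wonder whether I coded something incorrectly).

Here is what I did: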

    import numpy as np
    import pandas as pd
    import scipy.stats as stats

    d = {'Previously Successful': pd.Series([129.3, 182.7, 312],
                                            index=['Yes - changed strategy', 'No', 'col_totals']),
         'Previously Unsuccessful': pd.Series([260.17, 711.83, 972],
                                              index=['Yes - changed strategy', 'No', 'col_totals']),
         'row_totals': pd.Series([(129.3+260.17), (182.7+711.83), (312+972)],
                                 index=['Yes - changed strategy', 'No', 'col_totals'])}

    total_summarized = pd.DataFrame(d)

    # .ix has been removed from pandas; .iloc selects the same 2x2 block by position
    observed = total_summarized.iloc[0:2, 0:2]

Output: observed

    # .ix has been removed from pandas; .loc selects the col_totals row by label
    expected = np.outer(total_summarized["row_totals"][0:2],
                        total_summarized.loc["col_totals"][0:2]) / 1000

    expected = pd.DataFrame(expected)
    expected.columns = ["Previously Successful", "Previously Unsuccessful"]
    expected.index = ["Yes - changed strategy", "No"]

    chi_squared_stat = (((observed - expected)**2) / expected).sum().sum()
    print(chi_squared_stat)

    crit = stats.chi2.ppf(q=0.95,  # Find the critical value for 95% confidence*
                          df=8)    # *
    print("Critical value")
    print(crit)

    p_value = 1 - stats.chi2.cdf(x=chi_squared_stat,  # Find the p-value
                                 df=8)
    print("P value")
    print(p_value)

    stats.chi2_contingency(observed=observed)

Output: statistics

Warren Weckesser · Accepted Answer

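A few corrections:

Your expected array is not correct. You must divide by observed.sum().sum(), which is 1284, not 1000.
For a 2x2 contingency table like this one, the number of degrees of freedom is 1, not 8.
Your calculation of chi_squared_stat does not include a continuity correction. (But leaving it out is not necessarily wrong; that is a judgment call for the statistician.)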
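As a minimal sketch of the manual calculation with the first two corrections applied (grand total of 1284 and one degree of freedom, no continuity correction), the steps could look like the following. It uses a plain NumPy array instead of the question's DataFrame, and it takes the p-value from stats.chi2.sf rather than 1 - cdf; both are choices made for this illustration, not part of the original code.

```python
import numpy as np
from scipy import stats

# Observed 2x2 table from the question
# (rows: changed strategy yes / no; columns: previously successful / unsuccessful).
observed = np.array([[129.3, 260.17],
                     [182.7, 711.83]])

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
grand_total = observed.sum()  # 1284 for this table, not 1000

# Expected counts under independence: outer product of the margins over the total.
expected = np.outer(row_totals, col_totals) / grand_total

# Pearson chi-squared statistic, without a continuity correction.
chi_squared_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1), which is 1 for a 2x2 table.
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Upper-tail probability of the chi-squared distribution.
p_value = stats.chi2.sf(chi_squared_stat, dof)

print(chi_squared_stat, dof, p_value)
```

The statistic and p-value from this sketch should agree with what chi2_contingency reports when the continuity correction is switched off.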

All the calculations that you perform (the expected matrix, the statistic, the degrees of freedom, and the p-value) are computed by chi2_contingency:

    In [65]: observed
    Out[65]:
                            Previously Successful  Previously Unsuccessful
    Yes - changed strategy                  129.3                   260.17
    No                                      182.7                   711.83

    In [66]: from scipy.stats import chi2_contingency

    In [67]: chi2, p, dof, expected = chi2_contingency(observed)

    In [68]: chi2
    Out[68]: 23.383138325890453

    In [69]: p
    Out[69]: 1.3273696199438626e-06

    In [70]: dof
    Out[70]: 1

    In [71]: expected
    Out[71]:
    array([[  94.63757009,  294.83242991],
           [ 217.36242991,  677.16757009]])
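On the p-value of exactly 0.0 in the question: one plausible contributor, beyond the incorrect expected array and degrees of freedom, is the expression 1 - stats.chi2.cdf(...). When the statistic is large, the CDF is so close to 1 that the subtraction can lose the tail entirely in floating point. The sketch below contrasts it with stats.chi2.sf, the survival function. The statistic of 99.4 is an illustrative value chosen for this example, not a number reported in the question.

```python
from scipy import stats

# Illustrative inputs only: a large statistic combined with the question's df=8.
x, df = 99.4, 8

# Subtracting a CDF that is nearly 1 from 1 discards the tiny tail probability.
naive = 1 - stats.chi2.cdf(x, df)

# The survival function evaluates the upper tail directly.
tail = stats.chi2.sf(x, df)

print("1 - cdf:", naive)
print("sf:     ", tail)
```

For very small p-values, sf is usually the safer way to get the upper tail.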

By default, chi2_contingency uses a continuity correction when the contingency table is 2x2. If you would rather not use the correction, you can disable it with the argument correction=False:

    In [73]: chi2, p, dof, expected = chi2_contingency(observed, correction=False)

    In [74]: chi2
    Out[74]: 24.072616672232893

    In [75]: p
    Out[75]: 9.2770200776879643e-07
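To make the difference between the two results concrete, here is a rough sketch of the continuity (Yates) correction done by hand: each absolute difference between observed and expected counts is reduced by 0.5 before squaring. This simplified form assumes every difference exceeds 0.5, which holds for this table; it is an illustration of the idea, not a copy of SciPy's internal code.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Observed 2x2 table from the question.
observed = np.array([[129.3, 260.17],
                     [182.7, 711.83]])

# Expected counts under independence.
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Yates's correction: shrink each |observed - expected| by 0.5 before squaring.
# Assumes all absolute differences are larger than 0.5 (true for this table).
shrunk = np.abs(observed - expected) - 0.5
chi2_yates = (shrunk ** 2 / expected).sum()
p_yates = chi2.sf(chi2_yates, 1)  # 1 degree of freedom for a 2x2 table

# Reference values from chi2_contingency with its default correction.
chi2_ref, p_ref, dof_ref, _ = chi2_contingency(observed)

print(chi2_yates, chi2_ref)
print(p_yates, p_ref)
```

The hand-computed values should match the default chi2_contingency output, which shows that the gap between the two statistics above comes entirely from the correction.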