Who is Scott? - Seaborn pairplot ValueError: could not convert string to float: 'scott'

Question

Who is Scott?

Problem

When I try to add the Education attribute to a Seaborn pairplot of the Loan Prediction dataset, I get the following error:

```
ValueError                                Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/statsmodels/nonparametric/kde.py in kdensityfft(X, kernel, bw, weights, gridsize, adjust, clip, cut, retgrid)
    450     try:
--> 451         bw = float(bw)
    452     except:

ValueError: could not convert string to float: 'scott'
```

I went through the raw data and could not find "scott" anywhere, so my question is: where does this come from, and how can I fix it?

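I also get a runtime error, "RuntimeError: Selected KDE bandwidth is 0. Cannot estiamte density." I don't know whether this is caused by the first error or is an entirely separate problem. I'd be grateful if anyone could shed some light on this.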

Dataset

I'm using the Loan Prediction dataset found here. The attributes are as follows:

|   | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | Male | No | 0 | Graduate | No | 5849 | 0.0 | NaN | 360.0 | 1.0 | Urban | Y |
| 1 | LP001003 | Male | Yes | 1 | Graduate | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | N |
| 2 | LP001005 | Male | Yes | 0 | Graduate | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | Y |
| 3 | LP001006 | Male | Yes | 0 | Not Graduate | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | Y |
| 4 | LP001008 | Male | No | 0 | Graduate | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | Y |

Code

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# I'm using ipython notebook
%matplotlib inline

train_data = pd.read_csv("train_ctrUa4K.csv")
bad_credit = train_data[train_data["Credit_History"] == 0]
bad_credit["Education"] = bad_credit["Education"].map({"Graduate": 1, "Not Graduate": 0})
sns.pairplot(bad_credit, vars=["ApplicantIncome", "Education", "LoanAmount"], hue="Loan_Status")
```

Error

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/statsmodels/nonparametric/kde.py in kdensityfft(X, kernel, bw, weights, gridsize, adjust, clip, cut, retgrid)
    450     try:
--> 451         bw = float(bw)
    452     except:

ValueError: could not convert string to float: 'scott'

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-25-0cd48ab0d803> in <module>
      2 bad_credit = train_data[train_data["Credit_History"] == 0]
      3 bad_credit["Education"] = bad_credit["Education"].map({"Graduate":1,"Not Graduate":0})
----> 4 sns.pairplot(bad_credit,vars=["ApplicantIncome","Education","LoanAmount"],hue="Loan_Status")

~/anaconda3/lib/python3.7/site-packages/seaborn/axisgrid.py in pairplot(data, hue, hue_order, palette, vars, x_vars, y_vars, kind, diag_kind, markers, height, aspect, corner, dropna, plot_kws, diag_kws, grid_kws, size)
   2119         diag_kws.setdefault("shade", True)
   2120         diag_kws["legend"] = False
-> 2121         grid.map_diag(kdeplot, **diag_kws)
   2122 
   2123     # Maybe plot on the off-diagonals

~/anaconda3/lib/python3.7/site-packages/seaborn/axisgrid.py in map_diag(self, func, **kwargs)
   1488                 data_k = utils.remove_na(data_k)
   1489 
-> 1490                 func(data_k, label=label_k, color=color, **kwargs)
   1491 
   1492             self._clean_axis(ax)

~/anaconda3/lib/python3.7/site-packages/seaborn/distributions.py in kdeplot(data, data2, shade, vertical, kernel, bw, gridsize, cut, clip, legend, cumulative, shade_lowest, cbar, cbar_ax, cbar_kws, ax, **kwargs)
    703         ax = _univariate_kdeplot(data, shade, vertical, kernel, bw,
    704                                  gridsize, cut, clip, legend, ax,
--> 705                                  cumulative=cumulative, **kwargs)
    706 
    707     return ax

~/anaconda3/lib/python3.7/site-packages/seaborn/distributions.py in _univariate_kdeplot(data, shade, vertical, kernel, bw, gridsize, cut, clip, legend, ax, cumulative, **kwargs)
    293         x, y = _statsmodels_univariate_kde(data, kernel, bw,
    294                                            gridsize, cut, clip,
--> 295                                            cumulative=cumulative)
    296     else:
    297         # Fall back to scipy if missing statsmodels

~/anaconda3/lib/python3.7/site-packages/seaborn/distributions.py in _statsmodels_univariate_kde(data, kernel, bw, gridsize, cut, clip, cumulative)
    365     fft = kernel == "gau"
    366     kde = smnp.KDEUnivariate(data)
--> 367     kde.fit(kernel, bw, fft, gridsize=gridsize, cut=cut, clip=clip)
    368     if cumulative:
    369         grid, y = kde.support, kde.cdf

~/anaconda3/lib/python3.7/site-packages/statsmodels/nonparametric/kde.py in fit(self, kernel, bw, fft, weights, gridsize, adjust, cut, clip)
    138             density, grid, bw = kdensityfft(endog, kernel=kernel, bw=bw,
    139                     adjust=adjust, weights=weights, gridsize=gridsize,
--> 140                     clip=clip, cut=cut)
    141         else:
    142             density, grid, bw = kdensity(endog, kernel=kernel, bw=bw,

~/anaconda3/lib/python3.7/site-packages/statsmodels/nonparametric/kde.py in kdensityfft(X, kernel, bw, weights, gridsize, adjust, clip, cut, retgrid)
    451         bw = float(bw)
    452     except:
--> 453         bw = bandwidths.select_bandwidth(X, bw, kern) # will cross-val fit this pattern?
    454     bw *= adjust
    455 

~/anaconda3/lib/python3.7/site-packages/statsmodels/nonparametric/bandwidths.py in select_bandwidth(x, bw, kernel)
    172         # eventually this can fall back on another selection criterion.
    173         err = "Selected KDE bandwidth is 0. Cannot estiamte density."
--> 174         raise RuntimeError(err)
    175     else:
    176         return bandwidth

RuntimeError: Selected KDE bandwidth is 0. Cannot estiamte density.
```
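Reading the traceback from the bottom suggests the two errors are one failure, not two. `kdensityfft` first tries `float(bw)`; for the string `'scott'` that always raises `ValueError`, which the bare `except:` catches before calling `bandwidths.select_bandwidth(X, bw, kern)` with the rule name. The `RuntimeError` is raised while that first exception is still being handled, which is why Python prints both. The sketch below reproduces only that control flow; `resolve_bandwidth` and its simplified rule table are hypothetical and are not the statsmodels implementation.

```python
import numpy as np

# Hypothetical, simplified rule table: this "scott" uses only the sample
# standard deviation and ignores the IQR term of the real rule.
RULES = {
    "scott": lambda x: 1.059 * np.std(x, ddof=1) * len(x) ** (-1 / 5),
}


def resolve_bandwidth(x, bw):
    """Mimic the try/except flow seen in the traceback (not statsmodels code)."""
    x = np.asarray(x, dtype=float)
    try:
        # A numeric bandwidth is used as-is.
        return float(bw)
    except ValueError:
        # float("scott") always fails, so named rules always land here.
        bandwidth = RULES[bw](x)
        if bandwidth == 0:
            # Raised inside the except block, so Python chains it to the
            # ValueError ("During handling of the above exception ...").
            raise RuntimeError("Selected KDE bandwidth is 0.")
        return bandwidth


print(resolve_bandwidth([1.0, 2.0, 4.0], 0.5))  # numeric bandwidth passes through
try:
    resolve_bandwidth([1.0, 1.0, 1.0], "scott")  # no spread, so bandwidth is 0
except RuntimeError as err:
    print(type(err.__context__).__name__, "->", err)
```

In other words, the `'scott'` ValueError is expected and harmless on its own; the real problem is the zero bandwidth.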

Diziet Asahi · Answer

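`scott` is the name of a method for choosing the bandwidth when plotting a kernel density estimate (KDE), named after D. W. Scott (1). That is why the string appears even though it is not in your data: `'scott'` is the default value of the `bw` parameter of `sns.kdeplot()`, which `pairplot` uses for the diagonal.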
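For reference, the statsmodels documentation for its `bw_scott` function describes the rule as `1.059 * A * n**(-1/5)`, where `A = min(std(x, ddof=1), IQR/1.349)`. The sketch below is a NumPy reimplementation of that documented formula (not the statsmodels source), mainly to show that the bandwidth is driven entirely by the spread of the data passed in:

```python
import numpy as np


def scott_bandwidth(x):
    """Scott's rule of thumb as documented for statsmodels: 1.059 * A * n**(-1/5)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    # A is the smaller of the sample std and the IQR rescaled to a normal std.
    a = min(np.std(x, ddof=1), (q75 - q25) / 1.349)
    return 1.059 * a * len(x) ** (-1 / 5)


rng = np.random.default_rng(0)
print(scott_bandwidth(rng.normal(size=500)))  # roughly 0.3 for standard-normal data
print(scott_bandwidth([5.0] * 20))            # a constant column gives 0.0
```

Any subset with no spread (or, under this formula, no interquartile spread) yields a bandwidth of exactly 0.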

I can't see your data, but I suspect that something is odd about one of your variables at a particular hue level, and that this prevents seaborn from computing a proper bandwidth.
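One way to look for the offending variable and hue level without plotting anything is to compute, per hue level, the two spread measures that the rule of thumb depends on. This is a minimal pandas sketch: `zero_spread_report` is a hypothetical helper, and the toy frame only borrows the question's column names (its values are made up). It also illustrates why a 0/1 column such as the mapped `Education` is a plausible suspect: if roughly three quarters or more of one group share the same value, the IQR is already 0 even though the standard deviation is not. Some statsmodels versions may fall back to the standard deviation when the IQR is 0, so treat an IQR-only flag as a lead rather than proof.

```python
import numpy as np
import pandas as pd


def zero_spread_report(df, columns, hue):
    """Per hue level and column: n, sample std and IQR (hypothetical helper)."""
    rows = []
    for level, group in df.groupby(hue):
        for col in columns:
            # Drop NaNs, as pairplot's map_diag does before calling kdeplot.
            values = group[col].dropna().to_numpy(dtype=float)
            q75, q25 = np.percentile(values, [75, 25])
            rows.append({
                hue: level,
                "column": col,
                "n": len(values),
                "std": np.std(values, ddof=1),
                "iqr": q75 - q25,
            })
    report = pd.DataFrame(rows)
    # Under A = min(std, IQR/1.349), either measure being 0 gives a zero bandwidth.
    report["suspect"] = (report["std"] == 0) | (report["iqr"] == 0)
    return report


toy = pd.DataFrame({
    "Loan_Status": ["Y"] * 6 + ["N"] * 4,
    "Education": [1, 1, 1, 1, 1, 0, 1, 0, 1, 0],
    "ApplicantIncome": [5849, 3000, 2583, 6000, 4583, 3200, 2400, 4000, 3700, 5100],
})
print(zero_spread_report(toy, ["ApplicantIncome", "Education"], "Loan_Status"))
```

On the real data you would call it as `zero_spread_report(bad_credit, ["ApplicantIncome", "Education", "LoanAmount"], "Loan_Status")`.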

You can use `diag_kws` to pass arguments to `sns.kdeplot()`, which `pairplot` uses to plot the univariate distributions on the diagonal.

For example:

```python
sns.pairplot(..., diag_kws={'bw': 'silverman'})
```

This forces `sns.kdeplot()` to choose the bandwidth with the "silverman" method, which may work better than Scott's method in your case.
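One caveat worth checking before relying on that switch: according to the statsmodels documentation, its Silverman rule (`bw_silverman`) is `0.9 * A * n**(-1/5)` with the same `A` as Scott's rule, so the two differ only by a constant factor. If `A` is 0 for some hue level, the "silverman" rule would presumably give 0 as well. The NumPy sketch below reimplements the two documented formulas (the function names are hypothetical, not library code) to make that concrete. In that situation a fixed numeric `bw`, or `diag_kind="hist"` in `pairplot` to skip the KDE on the diagonal, may be the more reliable workaround.

```python
import numpy as np


def spread_a(x):
    """A = min(sample std, IQR / 1.349), shared by both rules of thumb."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    return min(np.std(x, ddof=1), (q75 - q25) / 1.349)


def scott(x):
    return 1.059 * spread_a(x) * len(x) ** (-1 / 5)


def silverman(x):
    return 0.9 * spread_a(x) * len(x) ** (-1 / 5)


mostly_graduates = [1] * 9 + [0]  # 0/1 column whose IQR is 0
varied = [2.0, 3.5, 1.0, 4.2, 2.8, 3.1, 5.0, 0.5]
for name, data in [("mostly_graduates", mostly_graduates), ("varied", varied)]:
    # Both rules collapse to 0 for the first column; for the second,
    # silverman is always 0.9 / 1.059 times scott.
    print(name, scott(data), silverman(data))
```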

(1) D. W. Scott, "Multivariate Density Estimation: Theory, Practice, and Visualization", John Wiley & Sons, New York, Chichester, 1992.

**EDIT**

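To identify the culprit, you'll need to use `PairGrid` instead of `pairplot()`. `PairGrid` lets you draw the diagonal with a custom function. If you put a print statement in that function, you can see the data being passed to `sns.kdeplot()`. Execution should stop at the point where the data is "wrong", and from there you may be able to work out what to do about it.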

For example:

```python
import seaborn as sns


def test_func(*data, **kwargs):
    # Show exactly what map_diag hands to kdeplot for each hue level.
    print("data received:", data)
    print("hue name + other params:", kwargs)
    sns.kdeplot(*data, **kwargs)


iris = sns.load_dataset('iris')
g = sns.PairGrid(iris, hue="species")
g = g.map_diag(test_func)
```

For each variable (column) and each hue level, you get output like this:

```
data received: (array([5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.6, 5. , 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5. , 5. , 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5. , 5.5, 4.9, 4.4, 5.1, 5. , 4.5, 4.4, 5. , 5.1, 4.8, 5.1, 4.6, 5.3, 5. ]),)
hue name + other params: {'label': 'setosa', 'color': (0.12156862745098039, 0.4666666666666667, 0.7058823529411765)}
data received: (array([7. , 6.4, 6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2, 5. , 5.9, 6. , 6.1, 5.6, 6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1, 6.3, 6.1, 6.4, 6.6, 6.8, 6.7, 6. , 5.7, 5.5, 5.5, 5.8, 6. , 5.4, 6. , 6.7, 6.3, 5.6, 5.5, 5.5, 6.1, 5.8, 5. , 5.6, 5.7, 5.7, 6.2, 5.1, 5.7]),)
hue name + other params: {'label': 'versicolor', 'color': (1.0, 0.4980392156862745, 0.054901960784313725)}
(...)
```