sklearnのSVM分類器に最も寄与する機能を決定する

Question

データセットがあり、そのデータでモデルをトレーニングしたい。トレーニング後、SVM分類器の分類で主要な貢献者となる機能を知る必要があります。

フォレストアルゴリズムには機能の重要性と呼ばれるものがありますが、同様のものはありますか？

Jakub Macina · Accepted Answer

はい、SVM分類子には属性coef_がありますが、線形カーネルを使用するSVMでのみ機能します。他のカーネルの場合、データはカーネルメソッドによって別のスペースに変換されるため、入力スペースに関連しないため、不可能です。説明を確認してください。

from matplotlib import pyplot as plt from sklearn import svm def f_importances(coef, names): imp = coef imp,names = Zip(*sorted(Zip(imp,names))) plt.barh(range(len(names)), imp, align='center') plt.yticks(range(len(names)), names) plt.show() features_names = ['input1', 'input2'] svm = svm.SVC(kernel='linear') svm.fit(X, Y) f_importances(svm.coef_, features_names)

そして、関数の出力は次のようになります。

Christoph Schmidl · Answer

Python 3でも動作し、Jakub Macinaのコードスニペットに基づいたソリューションを作成しました。

from matplotlib import pyplot as plt from sklearn import svm def f_importances(coef, names, top=-1): imp = coef imp, names = Zip(*sorted(list(Zip(imp, names)))) # Show all features if top == -1: top = len(names) plt.barh(range(top), imp[::-1][0:top], align='center') plt.yticks(range(top), names[::-1][0:top]) plt.show() # whatever your features are called features_names = ['input1', 'input2', ...] svm = svm.SVC(kernel='linear') svm.fit(X_train, y_train) # Specify your top n features you want to visualize. # You can also discard the abs() function # if you are interested in negative contribution of features f_importances(abs(clf.coef_[0]), feature_names, top=10)

Dor · Answer

1行のコードのみ：

sVMモデルに適合：

from sklearn import svm svm = svm.SVC(gamma=0.001, C=100., kernel = 'linear')

次のようにプロットを実装します。

pd.Series(abs(svm.coef_[0]), index=features.columns).nlargest(10).plot(kind='barh')

結果は次のようになります。

絶対値でのSVMモデルの最も寄与する機能