Feature names from an sklearn pipeline: not fitted error

Question

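I am using scikit-learn for a text classification experiment. Now I would like to get the names of the best-performing, selected features. I tried some of the answers to similar questions, but nothing works. The last lines of the code are an example of what I tried. For example, when I try to get feature_names, I get this error: sklearn.exceptions.NotFittedError: This SelectKBest instance is not fitted yet. Call 'fit' with appropriate arguments before using this method. Is there a solution?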

    scaler = StandardScaler(with_mean=False)
    enc = LabelEncoder()
    y = enc.fit_transform(labels)

    feat_sel = SelectKBest(mutual_info_classif, k=200)
    clf = linear_model.LogisticRegression()

    pipe = Pipeline([('vectorizer', DictVectorizer()),
                     ('scaler', StandardScaler(with_mean=False)),
                     ('mutual_info', feat_sel),
                     ('logistregress', clf)])

    feature_names = pipe.named_steps['mutual_info']
    X.columns[features.transform(np.arange(len(X.columns)))]

makis · Accepted Answer

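You need to fit the pipeline first and only then retrieve the feature names. Until fit has been called, the SelectKBest step has not computed any scores, which is why it raises NotFittedError.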

Solution

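    import numpy as np
    from sklearn import linear_model
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import LabelEncoder, StandardScaler

    scaler = StandardScaler(with_mean=False)
    enc = LabelEncoder()
    y = enc.fit_transform(labels)

    feat_sel = SelectKBest(mutual_info_classif, k=200)
    clf = linear_model.LogisticRegression()

    pipe = Pipeline([('vectorizer', DictVectorizer()),
                     ('scaler', StandardScaler(with_mean=False)),
                     ('mutual_info', feat_sel),
                     ('logistregress', clf)])

    # Now fit the pipeline using your data
    pipe.fit(X, y)

    # Now the fitted selector can be read from pipe.named_steps
    features = pipe.named_steps['mutual_info']
    # transform expects a 2D array, hence reshape(1, -1); X must expose .columns
    X.columns[features.transform(np.arange(len(X.columns)).reshape(1, -1))[0]]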
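The fit-then-inspect order can be checked on a small example. The following is only a sketch under assumptions not taken from the question: the toy data and the step names "select" and "clf" are made up, and f_classif is swapped in for mutual_info_classif so that the outcome is deterministic on four samples. It reads the selected names from the fitted DictVectorizer's feature_names_ attribute together with the selector's get_support() mask, instead of going through X.columns.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): a list of dicts, as DictVectorizer expects.
X = [
    {"length": 1.0, "noise": 0.3},
    {"length": 2.0, "noise": 0.1},
    {"length": 8.0, "noise": 0.2},
    {"length": 9.0, "noise": 0.4},
]
y = [0, 0, 1, 1]

pipe = Pipeline([
    ("vectorizer", DictVectorizer()),
    ("scaler", StandardScaler(with_mean=False)),
    ("select", SelectKBest(f_classif, k=1)),  # f_classif is an assumption here
    ("clf", LogisticRegression()),
])

# Before fitting, the selector has no scores, so get_support() raises.
try:
    pipe.named_steps["select"].get_support()
    raised = False
except NotFittedError:
    raised = True

pipe.fit(X, y)

# After fitting, map the boolean support mask onto the vectorizer's names.
names = np.array(pipe.named_steps["vectorizer"].feature_names_)
mask = pipe.named_steps["select"].get_support()
selected = names[mask].tolist()

print(raised, selected)
```

Because the names come from the vectorizer that actually produced the columns seen by the selector, the mask and the name array always have the same length.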

General information

The scikit-learn documentation gives an example of this:

    anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)

This sets some initial parameters (the k parameter of the anova step and the C parameter of the svc step).

It then calls fit(X, y) to fit the pipeline.
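For reference, a self-contained version of that pattern might look like the sketch below. The synthetic data from make_classification and the choice of f_classif as the scoring function are assumptions made for illustration; only the step names anova and svc and the set_params call come from the line quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic data (an assumption for illustration): 100 samples, 20 features.
X, y = make_classification(n_samples=100, n_features=20,
                           n_informative=5, random_state=0)

anova_svm = Pipeline([
    ("anova", SelectKBest(f_classif, k=5)),
    ("svc", SVC(kernel="linear")),
])

# "<step name>__<parameter>" addresses a parameter of one step.
# set_params returns the pipeline itself, so fit can be chained.
anova_svm.set_params(anova__k=10, svc__C=0.1).fit(X, y)

# Only after fit does the selector expose which features it kept.
n_selected = int(anova_svm.named_steps["anova"].get_support().sum())
print(anova_svm.named_steps["anova"].k, anova_svm.named_steps["svc"].C, n_selected)
```

The double underscore is what lets one call reach into individual steps, so the same pipeline object can be reconfigured and refitted without being rebuilt.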

EDIT:

Regarding the new error: X is a list of dictionaries, so it has no columns attribute. One way to get the columns you want is to convert it with pandas.

    from pandas import DataFrame

    X = [{'age': 10, 'name': 'Tom'}, {'age': 5, 'name': 'Mark'}]
    df = DataFrame(X)
    len(df.columns)

Result:

    2
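One caveat may be worth checking before indexing DataFrame columns with a selector fitted inside the pipeline: DictVectorizer one-hot encodes string values, so the number of features it produces can differ from the number of DataFrame columns. The sketch below compares the two on the same two-record list; the variable names df_columns and vec_names are illustrative only.

```python
from pandas import DataFrame
from sklearn.feature_extraction import DictVectorizer

X = [{'age': 10, 'name': 'Tom'}, {'age': 5, 'name': 'Mark'}]

# Column names as pandas sees them: one per dictionary key.
df_columns = list(DataFrame(X).columns)

# Feature names as DictVectorizer sees them: string values become
# separate "key=value" indicator features.
vec = DictVectorizer()
vec.fit(X)
vec_names = list(vec.feature_names_)

print(df_columns)
print(vec_names)
```

If the counts differ, as they do here, the selector's mask lines up with the vectorizer's feature_names_ rather than with the DataFrame columns.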

Hope this helps.