TypeError：1つの要素を持つ整数配列のみがインデックスに変換できます

Question

クロス検証を使用して再帰的な機能選択を実行すると、次のエラーが表示されます。

Traceback (most recent call last): File "/Users/.../srl/main.py", line 32, in <module> argident_sys.train_classifier() File "/Users/.../srl/identification.py", line 194, in train_classifier feat_selector.fit(train_argcands_feats,train_argcands_target) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/feature_selection/rfe.py", line 298, in fit ranking_ = rfe.fit(X[train], y[train]).ranking_ TypeError: only integer arrays with one element can be converted to an index

エラーを生成するコードは次のとおりです。

def train_classifier(self): # Get the argument candidates argcands = self.get_argcands(self.reader) # Extract the necessary features from the argument candidates train_argcands_feats = [] train_argcands_target = [] for argcand in argcands: train_argcands_feats.append(self.extract_features(argcand)) if argcand["info"]["label"] == "NULL": train_argcands_target.append("NULL") else: train_argcands_target.append("ARG") # Transform the features to the format required by the classifier self.feat_vectorizer = DictVectorizer() train_argcands_feats = self.feat_vectorizer.fit_transform(train_argcands_feats) # Transform the target labels to the format required by the classifier self.target_names = list(set(train_argcands_target)) train_argcands_target = [self.target_names.index(target) for target in train_argcands_target] ## Train the appropriate supervised model # Recursive Feature Elimination self.classifier = LogisticRegression() feat_selector = RFECV(estimator=self.classifier, step=1, cv=StratifiedKFold(train_argcands_target, 10)) feat_selector.fit(train_argcands_feats,train_argcands_target) print feat_selector.n_features_ print feat_selector.support_ print feat_selector.ranking_ print feat_selector.cv_scores_ return

LogisticRegression分類子のパラメーターに対してもGridSearchを実行する必要があることは知っていますが、それがエラーの原因だとは思いません（またはそうではありませんか？）。

約50の機能でテストしていることに言及する必要がありますが、それらのほとんどすべてがカテゴリに分類されます（そのため、DictVectorizerを使用して適切に変換します）。

あなたが私に与えることができる助けやガイダンスは大歓迎です。ありがとう！

[〜＃〜] edit [〜＃〜]

トレーニングデータの例を次に示します。

train_argcands_feats = [{'head_lemma': u'Bras\xedlia', 'head': u'Bras\xedlia', 'head_postag': u'PROP'}, {'head_lemma': u'Pesquisa_Datafolha', 'head': u'Pesquisa_Datafolha', 'head_postag': u'N'}, {'head_lemma': u'dado', 'head': u'dado', 'head_postag': u'N'}, {'head_lemma': u'postura', 'head': u'postura', 'head_postag': u'N'}, {'head_lemma': u'maioria', 'head': u'maioria', 'head_postag': u'N'}, {'head_lemma': u'querer', 'head': u'quer', 'head_postag': u'V-FIN'}, {'head_lemma': u'PT', 'head': u'PT', 'head_postag': u'PROP'}, {'head_lemma': u'participar', 'head': u'participando', 'head_postag': u'V-GER'}, {'head_lemma': u'surpreendente', 'head': u'supreendente', 'head_postag': u'ADJ'}, {'head_lemma': u'Bras\xedlia', 'head': u'Bras\xedlia', 'head_postag': u'PROP'}, {'head_lemma': u'Pesquisa_Datafolha', 'head': u'Pesquisa_Datafolha', 'head_postag': u'N'}, {'head_lemma': u'revelar', 'head': u'revela', 'head_postag': u'V-FIN'}, {'head_lemma': u'recusar', 'head': u'recusando', 'head_postag': u'V-GER'}, {'head_lemma': u'maioria', 'head': u'maioria', 'head_postag': u'N'}, {'head_lemma': u'PT', 'head': u'PT', 'head_postag': u'PROP'}, {'head_lemma': u'participar', 'head': u'participando', 'head_postag': u'V-GER'}, {'head_lemma': u'surpreendente', 'head': u'supreendente', 'head_postag': u'ADJ'}, {'head_lemma': u'Bras\xedlia', 'head': u'Bras\xedlia', 'head_postag': u'PROP'}, {'head_lemma': u'Pesquisa_Datafolha', 'head': u'Pesquisa_Datafolha', 'head_postag': u'N'}, {'head_lemma': u'revelar', 'head': u'revela', 'head_postag': u'V-FIN'}, {'head_lemma': u'governo', 'head': u'Governo', 'head_postag': u'N'}, {'head_lemma': u'de', 'head': u'de', 'head_postag': u'PRP'}, {'head_lemma': u'governo', 'head': u'Governo', 'head_postag': u'N'}, {'head_lemma': u'recusar', 'head': u'recusando', 'head_postag': u'V-GER'}, {'head_lemma': u'maioria', 'head': u'maioria', 'head_postag': u'N'}, {'head_lemma': u'querer', 'head': u'quer', 'head_postag': u'V-FIN'}, {'head_lemma': u'PT', 'head': u'PT', 'head_postag': u'PROP'}, {'head_lemma': u'surpreendente', 'head': u'supreendente', 'head_postag': u'ADJ'}, {'head_lemma': u'Bras\xedlia', 'head': u'Bras\xedlia', 'head_postag': u'PROP'}, {'head_lemma': u'Pesquisa_Datafolha', 'head': u'Pesquisa_Datafolha', 'head_postag': u'N'}, {'head_lemma': u'revelar', 'head': u'revela', 'head_postag': u'V-FIN'}, {'head_lemma': u'muito', 'head': u'Muitas', 'head_postag': u'PRON-DET'}, {'head_lemma': u'prioridade', 'head': u'prioridades', 'head_postag': u'N'}, {'head_lemma': u'com', 'head': u'com', 'head_postag': u'PRP'}, {'head_lemma': u'prioridade', 'head': u'prioridades', 'head_postag': u'N'}] train_argcands_target = ['NULL', 'ARG', 'ARG', 'ARG', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'ARG', 'ARG', 'ARG', 'ARG', 'NULL', 'NULL', 'NULL', 'NULL', 'ARG', 'NULL', 'NULL', 'NULL', 'NULL', 'NULL', 'ARG', 'NULL', 'NULL', 'NULL', 'NULL', 'ARG', 'ARG', 'NULL', 'NULL']

feralvam · Accepted Answer

私はついに問題を解決しました。 2つのことを行う必要がありました。

train_argcands_targetはリストであり、numpy配列でなければなりません。見積もりツールを直接使用したとき、以前はうまく機能していたことに驚きました。
何らかの理由で（理由はまだわかりません）、DictVectorizerで作成されたスパース行列を使用しても機能しません。「手動で」各機能辞書を、各機能値を表す整数だけの機能配列に変換する必要がありました。変換プロセスは、ターゲット値のコードに示したものと似ています。

助けようとしたすべての人に感謝します！

user926321 · Answer

まだ興味がある人は、

CountVectorizerを非常によく似たものに使用すると、同じエラーが発生しました。ベクトライザーは、基本的に座標リストであるCOOスパース行列を提供することに気付きました。 COOマトリックスの要素には、行インデックスを介してアクセスできません。行ごとにインデックスを付けるCSRマトリックス（圧縮スパース行）に変換するのが最適です。変換は簡単に行うことができますcoo_matrix.tocsr()。他の変更は必要ありません、これは私のために働いた。