How do I compute a confusion matrix for multiclass classification in scikit-learn?

Question

I have a multiclass classification task. When I run a script based on the scikit-learn example, as follows:

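    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import confusion_matrix
    from sklearn.multiclass import OneVsRestClassifier

    classifier = OneVsRestClassifier(
        GradientBoostingClassifier(n_estimators=70, max_depth=3, learning_rate=.02))
    y_pred = classifier.fit(X_train, y_train).predict(X_test)
    cnf_matrix = confusion_matrix(y_test, y_pred)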

I get this error:

    File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 242, in confusion_matrix
        raise ValueError("%s is not supported" % y_type)
    ValueError: multilabel-indicator is not supported

I tried passing labels=classifier.classes_ to confusion_matrix(), but it did not help.

y_test and y_pred are as follows:

    y_test = array([[0, 0, 0, 1, 0, 0],
                    [0, 0, 0, 0, 1, 0],
                    [0, 1, 0, 0, 0, 0],
                    ...,
                    [0, 0, 0, 0, 0, 1],
                    [0, 0, 0, 1, 0, 0],
                    [0, 0, 0, 0, 1, 0]])

    y_pred = array([[0, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 0],
                    ...,
                    [0, 0, 0, 0, 0, 1],
                    [0, 0, 0, 0, 0, 1],
                    [0, 0, 0, 0, 0, 0]])

Naomi Fridman · Accepted Answer

First, you need to create the label output array. Say you have 3 classes: 'cat', 'dog', 'house', indexed 0, 1, 2. The predictions for 2 samples are 'dog' and 'house'. Your output will be:

y_pred = [[0, 1, 0],[0, 0, 1]]

Run y_pred.argmax(1) (with y_pred as a NumPy array) to get [1, 2]. This array holds the original label indices, meaning ['dog', 'house'].

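    import numpy as np
    from keras.utils import np_utils  # older Keras releases; newer ones expose keras.utils.to_categorical

    num_classes = 3

    # from label indices to categorical (one-hot)
    y_prediction = np.array([1, 2])
    y_categorial = np_utils.to_categorical(y_prediction, num_classes)

    # from categorical back to label indices
    y_pred = y_categorial.argmax(1)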
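Applied to the question, the same argmax step can be run on both y_test and y_pred before calling confusion_matrix, which accepts 1-D label arrays but rejects one-hot (multilabel-indicator) input. A minimal sketch, assuming every row has exactly one 1; the small arrays below are made-up illustration data, not the asker's:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up one-hot data for 3 classes (illustration only).
y_test = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1]])
y_pred = np.array([[0, 1, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]])

# Collapse each one-hot row to its class index.
y_test_labels = y_test.argmax(axis=1)
y_pred_labels = y_pred.argmax(axis=1)

# Rows are true classes, columns are predicted classes.
cnf_matrix = confusion_matrix(y_test_labels, y_pred_labels, labels=[0, 1, 2])
print(cnf_matrix)
```

Passing labels explicitly keeps the matrix shape fixed even if some class never appears in a given split.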

ak2205 · Answer

This worked for me:

    import numpy as np
    from sklearn.metrics import confusion_matrix

    # collapse each one-hot row to its class index
    y_test_non_category = [np.argmax(t) for t in y_test]
    y_predict_non_category = [np.argmax(t) for t in y_predict]

    conf_mat = confusion_matrix(y_test_non_category, y_predict_non_category)

where y_test and y_predict are categorical variables such as one-hot vectors.
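One caveat worth checking: several rows of the y_pred shown in the question are all zeros, and np.argmax returns 0 for such a row, so it would be counted as a prediction of class 0. A minimal sketch of one way to handle this, assuming you would rather mark those rows with a sentinel label; the helper name to_label_indices and the sentinel value -1 are hypothetical choices, not part of NumPy or scikit-learn:

```python
import numpy as np

def to_label_indices(one_hot, no_prediction=-1):
    """Convert one-hot rows to class indices; all-zero rows get a sentinel."""
    one_hot = np.asarray(one_hot)
    labels = one_hot.argmax(axis=1)
    # Rows with no 1 at all carry no prediction; do not let them become class 0.
    labels[one_hot.sum(axis=1) == 0] = no_prediction
    return labels

# Rows copied from the shape of the question's y_pred (two of them are all zeros).
y_pred = np.array([[0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 1],
                   [0, 0, 0, 0, 0, 0]])

plain = y_pred.argmax(axis=1)      # all-zero rows silently become class 0
labels = to_label_indices(y_pred)  # all-zero rows become -1 instead
print(plain, labels)
```

If you then include the sentinel in the labels argument of confusion_matrix, the "no prediction" cases show up as their own column instead of inflating class 0.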

mcchran · Answer

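I simply subtracted the output y_test matrix from the prediction y_pred matrix, keeping the categorical format. A -1 means a false negative, while a 1 means a false positive.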

Next:

    if output_matrix[i,j] == 1 and predictions_matrix[i,j] == 1:
        produced_matrix[i,j] = 2

This ends up with the following notation:

-1: false negative
1: false positive
0: true negative
2: true positive

Finally, with some simple counting you can produce any confusion metric.
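The counting step can be sketched in vectorized NumPy rather than an explicit i, j loop. This is only an illustration of the encoding described above (-1, 1, 0, 2), assuming both matrices contain only 0s and 1s; the arrays are made-up data and the variable names are illustrative:

```python
import numpy as np

# Made-up one-hot data for 3 classes (illustration only).
y_test = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0, 1, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]])

# Prediction minus ground truth: -1 = false negative, 1 = false positive, 0 = agreement.
produced = y_pred - y_test
# Agreement on a 1 is a true positive; mark it with 2 so that 0 means true negative only.
produced[(y_test == 1) & (y_pred == 1)] = 2

# Per-class counts (one entry per column).
tp = (produced == 2).sum(axis=0)
fp = (produced == 1).sum(axis=0)
fn = (produced == -1).sum(axis=0)
tn = (produced == 0).sum(axis=0)
print(tp, fp, fn, tn)
```

Note that these are per-class counts in a one-vs-rest sense; they do not tell you which wrong class was predicted, so they are not a full multiclass confusion matrix.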