scikitは出力metrics.classification_reportをCSV /タブ区切り形式に学習します

Question

Scikit-Learnでマルチクラステキスト分類を行っています。データセットは、数百のラベルを持つ多項単純ベイズ分類器を使用してトレーニングされています。以下は、MNBモデルを適合させるためのScikit Learnスクリプトからの抜粋です。

from __future__ import print_function # Read **`file.csv`** into a pandas DataFrame import pandas as pd path = 'data/file.csv' merged = pd.read_csv(path, error_bad_lines=False, low_memory=False) # define X and y using the original DataFrame X = merged.text y = merged.grid # split X and y into training and testing sets; from sklearn.cross_validation import train_test_split X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1) # import and instantiate CountVectorizer from sklearn.feature_extraction.text import CountVectorizer vect = CountVectorizer() # create document-term matrices using CountVectorizer X_train_dtm = vect.fit_transform(X_train) X_test_dtm = vect.transform(X_test) # import and instantiate MultinomialNB from sklearn.naive_bayes import MultinomialNB nb = MultinomialNB() # fit a Multinomial Naive Bayes model nb.fit(X_train_dtm, y_train) # make class predictions y_pred_class = nb.predict(X_test_dtm) # generate classification report from sklearn import metrics print(metrics.classification_report(y_test, y_pred_class))

また、コマンドライン画面のmetrics.classification_reportの簡略化された出力は次のようになります。

 precision recall f1-score support 12 0.84 0.48 0.61 2843 13 0.00 0.00 0.00 69 15 1.00 0.19 0.32 232 16 0.75 0.02 0.05 965 33 1.00 0.04 0.07 155 4 0.59 0.34 0.43 5600 41 0.63 0.49 0.55 6218 42 0.00 0.00 0.00 102 49 0.00 0.00 0.00 11 5 0.90 0.06 0.12 2010 50 0.00 0.00 0.00 5 51 0.96 0.07 0.13 1267 58 1.00 0.01 0.02 180 59 0.37 0.80 0.51 8127 7 0.91 0.05 0.10 579 8 0.50 0.56 0.53 7555 avg/total 0.59 0.48 0.45 35919

レポート出力を標準の列ヘッダーを持つ標準のcsvファイルに取得する方法があるかどうか疑問に思っていました

コマンドライン出力をcsvファイルに送信するか、画面出力をスプレッドシート（Openoffice CalcまたはExcel）にコピー/貼り付けしようとすると、結果が1列にまとめられます。このように見える：

感謝します。ありがとう！

janus235 · Answer

scikit-learn v0.20、分類レポートをpandasデータフレームに変換する最も簡単な方法は、レポートをdictとして返すことです。

report = classification_report(y_test, y_pred, output_dict=True)

次に、データフレームを構築して転置します。

df = pandas.DataFrame(report).transpose()

ここからは、標準のpandasメソッドを自由に使用して、目的の出力形式（CSV、HTML、LaTeXなど）を生成できます。

https://scikit-learn.org/0.20/modules/generated/sklearn.metrics.classification_report.html のドキュメントも参照してください

kindjacket · Answer

個々のスコアが必要な場合は、これで問題ありません。

import pandas as pd def classification_report_csv(report): report_data = [] lines = report.split('
') for line in lines[2:-3]: row = {} row_data = line.split(' ') row['class'] = row_data[0] row['precision'] = float(row_data[1]) row['recall'] = float(row_data[2]) row['f1_score'] = float(row_data[3]) row['support'] = float(row_data[4]) report_data.append(row) dataframe = pd.DataFrame.from_dict(report_data) dataframe.to_csv('classification_report.csv', index = False) report = classification_report(y_true, y_pred) classification_report_csv(report)

Pankaj Kabra · Answer

Precision_recall_fscore_support関数から実際の値を取得し、それらをデータフレームに入れることができます。以下のコードでも同じ結果が得られますが、pandas df :)になりました。

clf_rep = metrics.precision_recall_fscore_support(true, pred) out_dict = { "precision" :clf_rep[0].round(2) ,"recall" : clf_rep[1].round(2) ,"f1-score" : clf_rep[2].round(2) ,"support" : clf_rep[3] } out_df = pd.DataFrame(out_dict, index = nb.classes_) avg_tot = (out_df.apply(lambda x: round(x.mean(), 2) if x.name!="support" else round(x.sum(), 2)).to_frame().T) avg_tot.index = ["avg/total"] out_df = out_df.append(avg_tot) print out_df

Kam Sen · Answer

以前の回答はおそらくすべて機能していますが、少し冗長です。次の例では、個々のクラスの結果と要約行を単一のデータフレームに保存します。レポートの変更にあまり敏感ではありませんが、私にとってはトリックをしました。

#init snippet and fake data from io import StringIO import re import pandas as pd from sklearn import metrics true_label = [1,1,2,2,3,3] pred_label = [1,2,2,3,3,1] def report_to_df(report): report = re.sub(r" +", " ", report).replace("avg / total", "avg/total").replace("
 ", "
") report_df = pd.read_csv(StringIO("Classes" + report), sep=' ', index_col=0) return(report_df) #txt report to df report = metrics.classification_report(true_label, pred_label) report_df = report_to_df(report) #store, print, copy... print (report_df)

これにより、目的の出力が得られます。

Classes precision recall f1-score support 1 0.5 0.5 0.5 2 2 0.5 0.5 0.5 2 3 0.5 0.5 0.5 2 avg/total 0.5 0.5 0.5 6

Raul · Answer

ここの投稿の1つで述べたように、precision_recall_fscore_supportはclassification_reportに類似しています。

次に、python library pandasを使用して、classification_reportが行うことと同様に、列形式でデータを簡単にフォーマットするだけで十分です。以下に例を示します。

import numpy as np import pandas as pd from sklearn.metrics import classification_report from sklearn.metrics import precision_recall_fscore_support np.random.seed(0) y_true = np.array([0]*400 + [1]*600) y_pred = np.random.randint(2, size=1000) def pandas_classification_report(y_true, y_pred): metrics_summary = precision_recall_fscore_support( y_true=y_true, y_pred=y_pred) avg = list(precision_recall_fscore_support( y_true=y_true, y_pred=y_pred, average='weighted')) metrics_sum_index = ['precision', 'recall', 'f1-score', 'support'] class_report_df = pd.DataFrame( list(metrics_summary), index=metrics_sum_index) support = class_report_df.loc['support'] total = support.sum() avg[-1] = total class_report_df['avg / total'] = avg return class_report_df.T

classification_reportを使用すると、次のようになります。

print(classification_report(y_true=y_true, y_pred=y_pred, digits=6))

出力：

 precision recall f1-score support 0 0.379032 0.470000 0.419643 400 1 0.579365 0.486667 0.528986 600 avg / total 0.499232 0.480000 0.485248 1000

次に、カスタム機能pandas_classification_reportを使用します。

df_class_report = pandas_classification_report(y_true=y_true, y_pred=y_pred) print(df_class_report)

出力：

 precision recall f1-score support 0 0.379032 0.470000 0.419643 400.0 1 0.579365 0.486667 0.528986 600.0 avg / total 0.499232 0.480000 0.485248 1000.0

次に、それをcsv形式に保存します（sep = ';'のような他の区切り形式については、 here を参照してください）：

df_class_report.to_csv('my_csv_file.csv', sep=',')

LibreOffice Calcでmy_csv_file.csvを開きます（ただし、Excelなどの表形式/スプレッドシートエディターを使用できます）：

Samuel Nde · Answer

import pandas as pdだけで、output_dictの計算時に、デフォルトでFalseに設定されるclassification_reportパラメーターをTrueに設定していることを確認してください。これによりclassification_report dictionaryが生成され、pandas DataFrameメソッドに渡すことができます。結果のtransposeをDataFrameして、希望する出力形式に合わせることができます。結果のDataFrameは、必要に応じてcsvファイルに書き込まれます。

clsf_report = pd.DataFrame(classification_report(y_true = your_y_true, y_pred = your_y_preds5, output_dict=True)).transpose()
clsf_report.to_csv('Your Classification Report Name.csv', index= True)

これがお役に立てば幸いです。

clsf_report = pd.DataFrame(classification_report(y_true = your_y_true, y_pred = your_y_preds5, output_dict=True)).transpose() clsf_report.to_csv('Your Classification Report Name.csv', index= True)

これがお役に立てば幸いです。

Karel Macek · Answer

別のオプションは、基礎となるデータを計算し、独自にレポートを作成することです。あなたが得るすべての統計

precision_recall_fscore_support

elz · Answer

また、回答の一部は少し冗長です。他の人が示唆しているようにprecision_recall_fscore_supportを使用する私の3行のソリューションです。

import pandas as pd from sklearn.metrics import precision_recall_fscore_support report = pd.DataFrame(list(precision_recall_fscore_support(y_true, y_pred)), index=['Precision', 'Recall', 'F1-score', 'Support']).T # Now add the 'Avg/Total' row report.loc['Avg/Total', :] = precision_recall_fscore_support(y_true, y_test, average='weighted') report.loc['Avg/Total', 'Support'] = report['Support'].sum()

Yash Nag · Answer

分類レポートをdictとして出力することは明らかに良い考えです。

sklearn.metrics.classification_report(y_true, y_pred, output_dict=True)

しかし、ここですべてのクラス（クラスのみ）の結果をpandasデータフレームに変換するために作成した関数です。

def report_to_df(report): report = [x.split(' ') for x in report.split('
')] header = ['Class Name']+[x for x in report[0] if x!=''] values = [] for row in report[1:-5]: row = [value for value in row if value!=''] if row!=[]: values.append(row) df = pd.DataFrame(data = values, columns = header) return df

これがうまくいくことを願っています。

Surya · Answer

入出力の例とともに、他の関数はこちらmetrics_report_to_df（） Sklearnメトリックスからprecision_recall_fscore_supportを実装すると、次のようになります。

# Generates classification metrics using precision_recall_fscore_support: from sklearn import metrics import pandas as pd import numpy as np; from numpy import random # Simulating true and predicted labels as test dataset: np.random.seed(10) y_true = np.array([0]*300 + [1]*700) y_pred = np.random.randint(2, size=1000) # Here's the custom function returning classification report dataframe: def metrics_report_to_df(ytrue, ypred): precision, recall, fscore, support = metrics.precision_recall_fscore_support(ytrue, ypred) classification_report = pd.concat(map(pd.DataFrame, [precision, recall, fscore, support]), axis=1) classification_report.columns = ["precision", "recall", "f1-score", "support"] # Add row w "avg/total" classification_report.loc['avg/Total', :] = metrics.precision_recall_fscore_support(ytrue, ypred, average='weighted') classification_report.loc['avg/Total', 'support'] = classification_report['support'].sum() return(classification_report) # Provide input as true_label and predicted label (from classifier) classification_report = metrics_report_to_df(y_true, y_pred) # Here's the output (metrics report transformed to dataframe ) In [1047]: classification_report Out[1047]: precision recall f1-score support 0 0.300578 0.520000 0.380952 300.0 1 0.700624 0.481429 0.570703 700.0 avg/Total 0.580610 0.493000 0.513778 1000.0

Piotr Badura · Answer

これは2つのクラス（pos、neg）分類のための私のコードです

report = metrics.precision_recall_fscore_support(true_labels,predicted_labels,labels=classes) rowDicionary["precision_pos"] = report[0][0] rowDicionary["recall_pos"] = report[1][0] rowDicionary["f1-score_pos"] = report[2][0] rowDicionary["support_pos"] = report[3][0] rowDicionary["precision_neg"] = report[0][1] rowDicionary["recall_neg"] = report[1][1] rowDicionary["f1-score_neg"] = report[2][1] rowDicionary["support_neg"] = report[3][1] writer = csv.DictWriter(file, fieldnames=fieldnames) writer.writerow(rowDicionary)

Sipan17 · Answer

def to_table(report): report = report.splitlines() res = [] res.append(['']+report[0].split()) for row in report[2:-2]: res.append(row.split()) lr = report[-1].split() res.append([' '.join(lr[:3])]+lr[3:]) return np.array(res)

pandas dataframeに変換するか、csvファイルとして保存することができるnumpy配列を返します。

Kevin · Answer

@kindjacketの答えを修正しました。これを試して：

import collections def classification_report_df(report): report_data = [] lines = report.split('
') del lines[-5] del lines[-1] del lines[1] for line in lines[1:]: row = collections.OrderedDict() row_data = line.split() row_data = list(filter(None, row_data)) row['class'] = row_data[0] + " " + row_data[1] row['precision'] = float(row_data[2]) row['recall'] = float(row_data[3]) row['f1_score'] = float(row_data[4]) row['support'] = int(row_data[5]) report_data.append(row) df = pd.DataFrame.from_dict(report_data) df.set_index('class', inplace=True) return df

あなたはパンダを使用してそのdfをcsvにエクスポートすることができます

DeshDeep Singh · Answer

まだ解決策が必要かどうかはわかりませんが、完璧な形式に保ち、保存するためにこれが最善です。

def classifcation_report_processing(model_to_report): tmp = list() for row in model_to_report.split("
"): parsed_row = [x for x in row.split(" ") if len(x) > 0] if len(parsed_row) > 0: tmp.append(parsed_row) # Store in dictionary measures = tmp[0] D_class_data = defaultdict(dict) for row in tmp[1:]: class_label = row[0] for j, m in enumerate(measures): D_class_data[class_label][m.strip()] = float(row[j + 1].strip()) save_report = pd.DataFrame.from_dict(D_class_data).T path_to_save = os.getcwd() +'/Classification_report.xlsx' save_report.to_Excel(path_to_save, index=True) return save_report.head(5) saving_CL_report_naive_bayes = classifcation_report_processing(classification_report(y_val, prediction))

Nitin Kumar · Answer

私がやったのと同じ問題がありましたmetrics.classification_reportの文字列出力をGoogleシートまたはExcelに貼り付け、カスタム5空白でテキストを列に分割しました。