ROC曲線からしきい値を取得する

Question

予測されたクラスの割合のベクトルでROCRパッケージを使用するいくつかのモデルがあり、パフォーマンスオブジェクトがあります。仕様「tpr」、「fpr」でパフォーマンスオブジェクトをプロットすると、ROC曲線が得られます。

偽陽性率（x）の特定のしきい値でモデルを比較しています。パフォーマンスオブジェクトから真の正のレート（y）の値を取得することを望んでいます。さらに、そのポイントを生成するために使用されたクラスのパーセンテージのしきい値を取得したいと思います。

偽陽性率のインデックス番号（x-value）しきい値を超えずにしきい値に最も近い場合、適切な真陽性率のインデックス番号（y-value）。そのインデックス値を取得する方法が正確にはわかりません。

さらに重要なことに、そのポイントを作成するために使用されたクラス確率のしきい値を取得するにはどうすればよいですか？

Zach · Accepted Answer

これがstrが私のお気に入りのR関数である理由です：

library(ROCR) data(ROCR.simple) pred <- prediction( ROCR.simple$predictions, ROCR.simple$labels) perf <- performance(pred,"tpr","fpr") plot(perf) > str(perf) Formal class 'performance' [package "ROCR"] with 6 slots ..@ x.name : chr "False positive rate" ..@ y.name : chr "True positive rate" ..@ alpha.name : chr "Cutoff" ..@ x.values :List of 1 .. ..$ : num [1:201] 0 0 0 0 0.00935 ... ..@ y.values :List of 1 .. ..$ : num [1:201] 0 0.0108 0.0215 0.0323 0.0323 ... ..@ alpha.values:List of 1 .. ..$ : num [1:201] Inf 0.991 0.985 0.985 0.983 ...

ああああ！これは S4クラスなので、@を使用してスロットにアクセスできます。 data.frameの作成方法は次のとおりです。

cutoffs <- data.frame(cut=perf@alpha.values[[1]], fpr=perf@x.values[[1]], tpr=perf@y.values[[1]]) > head(cutoffs) cut fpr tpr 1 Inf 0.000000000 0.00000000 2 0.9910964 0.000000000 0.01075269 3 0.9846673 0.000000000 0.02150538 4 0.9845992 0.000000000 0.03225806 5 0.9834944 0.009345794 0.03225806 6 0.9706413 0.009345794 0.04301075

ヒットするfprしきい値がある場合、このdata.frameをサブセット化して、このfprしきい値未満の最大tprを見つけることができます。

cutoffs <- cutoffs[order(cutoffs$tpr, decreasing=TRUE),] > head(subset(cutoffs, fpr < 0.2)) cut fpr tpr 96 0.5014893 0.1495327 0.8494624 97 0.4997881 0.1588785 0.8494624 98 0.4965132 0.1682243 0.8494624 99 0.4925969 0.1775701 0.8494624 100 0.4917356 0.1869159 0.8494624 101 0.4901199 0.1962617 0.8494624

Enrique P&#233;rez Herrero · Answer

パッケージpROCには、最適なしきい値を計算するための関数coordsが含まれています。

library(pROC) my_roc <- roc(my_response, my_predictor) coords(my_roc, "best", ret = "threshold")

Artem Klevtsov · Answer

ROCRおよびpROCパッケージに基づく2つのソリューション：

threshold1 <- function(predict, response) { perf <- ROCR::performance(ROCR::prediction(predict, response), "sens", "spec") df <- data.frame(cut = perf@alpha.values[[1]], sens = perf@x.values[[1]], spec = perf@y.values[[1]]) df[which.max(df$sens + df$spec), "cut"] } threshold2 <- function(predict, response) { r <- pROC::roc(response, predict) r$thresholds[which.max(r$sensitivities + r$specificities)] } data(ROCR.simple, package = "ROCR") threshold1(ROCR.simple$predictions, ROCR.simple$labels) #> [1] 0.5014893 threshold2(ROCR.simple$predictions, ROCR.simple$labels) #> [1] 0.5006387

最適なしきい値を見つけるための多くのアルゴリズムを提供するOptimalCutpointsパッケージも参照してください。