Define and apply custom bins to a data frame

Question

Using Python, I created the following data frame containing similarity values:

```
  cosinFcolor cosinEdge cosinTexture histoFcolor histoEdge histoTexture    jaccard
1       0.770     0.489        0.388  0.57500000 0.5845137    0.3920000 0.00000000
2       0.067     0.496        0.912  0.13865546 0.6147309    0.6984127 0.00000000
3       0.514     0.426        0.692  0.36440678 0.4787535    0.5198413 0.05882353
4       0.102     0.430        0.739  0.11297071 0.5288008    0.5436508 0.00000000
5       0.560     0.735        0.554  0.48148148 0.8168083    0.4603175 0.00000000
6       0.029     0.302        0.558  0.08547009 0.3928234    0.4603175 0.00000000
```

I'm trying to write an R script that produces a second data frame holding the bins, where a value is only binned if it is greater than 0.5.

Pseudocode:

```
if (cosinFcolor > 0.5 & cosinFcolor <= 0.6) bin = 1
if (cosinFcolor > 0.6 & cosinFcolor <= 0.7) bin = 2
if (cosinFcolor > 0.7 & cosinFcolor <= 0.8) bin = 3
if (cosinFcolor > 0.8 & cosinFcolor <= 0.9) bin = 4
if (cosinFcolor > 0.9 & cosinFcolor <= 1.0) bin = 5
else bin = 0
```

Based on the logic above, I want to build a data frame like this:

```
  cosinFcolor cosinEdge cosinTexture histoFcolor histoEdge histoTexture jaccard
1           3         0            0           1         1            0       0
```
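For comparison, the pseudocode translates almost line for line into vectorized R with nested `ifelse()` calls. This is only a sketch: `pseudo_bin()` and `row1` are hypothetical names chosen for illustration, and `cut()` or `findInterval()` are usually more concise.

```r
# pseudo_bin() is a hypothetical helper, not part of any package.
# Nested ifelse() calls mirror the if/else chain in the pseudocode and
# work element-wise, so a whole column can be passed in at once.
pseudo_bin <- function(x) {
  ifelse(x > 0.5 & x <= 0.6, 1L,
  ifelse(x > 0.6 & x <= 0.7, 2L,
  ifelse(x > 0.7 & x <= 0.8, 3L,
  ifelse(x > 0.8 & x <= 0.9, 4L,
  ifelse(x > 0.9 & x <= 1.0, 5L,
         0L)))))  # everything else, including values <= 0.5
}

# First row of the question's data, one named element per column
row1 <- c(cosinFcolor = 0.770, cosinEdge = 0.489, cosinTexture = 0.388,
          histoFcolor = 0.575, histoEdge = 0.5845137, histoTexture = 0.392,
          jaccard = 0)

pseudo_bin(row1)  # 3 0 0 1 1 0 0, with the column names kept
```

To bin a whole data frame the same way, something like `as.data.frame(lapply(df, pseudo_bin))` should work, since each column is just a numeric vector.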

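How should I start writing this as a script, or should I do it in Python instead? Knowing R's strengths and how many machine learning packages it has, I'm trying to get comfortable with R. My goal is to build a classifier, but first I need to get familiar with R :)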

mnel · Answer

You could also use findInterval:

```r
findInterval(seq(0, 1, l = 20), seq(0.5, 1, by = 0.1))
##  [1] 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 4 4 5 6
```
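One detail worth checking with `findInterval()`: by default its intervals are closed on the left, [a, b), while the question's rule is (a, b]. A value of exactly 0.6 therefore lands in bin 2 instead of 1, and exactly 1.0 gets a sixth bin. The `left.open = TRUE` argument (available since R 3.3.0) flips this. A sketch under those assumptions, applied to every column of the question's data (typed in with `data.frame()` here; `breaks` and `binned` are just illustrative names):

```r
# The question's data, typed in directly instead of read.table()
df <- data.frame(
  cosinFcolor  = c(0.770, 0.067, 0.514, 0.102, 0.560, 0.029),
  cosinEdge    = c(0.489, 0.496, 0.426, 0.430, 0.735, 0.302),
  cosinTexture = c(0.388, 0.912, 0.692, 0.739, 0.554, 0.558),
  histoFcolor  = c(0.57500000, 0.13865546, 0.36440678, 0.11297071, 0.48148148, 0.08547009),
  histoEdge    = c(0.5845137, 0.6147309, 0.4787535, 0.5288008, 0.8168083, 0.3928234),
  histoTexture = c(0.3920000, 0.6984127, 0.5198413, 0.5436508, 0.4603175, 0.4603175),
  jaccard      = c(0, 0, 0.05882353, 0, 0, 0)
)

breaks <- seq(0.5, 1, by = 0.1)  # 0.5 0.6 0.7 0.8 0.9 1.0

# Default [a, b): a value equal to a break moves up a bin, and 1.0 gets 6
findInterval(c(0.5, 0.6, 1.0), breaks)                    # 1 2 6
# left.open = TRUE gives (a, b], matching the question's pseudocode
findInterval(c(0.5, 0.6, 1.0), breaks, left.open = TRUE)  # 0 1 5

# Apply to every column; lapply() passes each column as findInterval's x
binned <- as.data.frame(lapply(df, findInterval, vec = breaks, left.open = TRUE))
binned[1, ]  # 3 0 0 1 1 0 0, the target row from the question
```

Values above 1.0 would still get bin 6 with this setup; if the data can exceed 1, those would need separate handling.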

Luciano Selzer · Answer

It's easy as pie with cut:

```r
dtf <- read.table(textConnection(
"cosinFcolor cosinEdge cosinTexture histoFcolor histoEdge histoTexture jaccard
1 0.770 0.489 0.388 0.57500000 0.5845137 0.3920000 0.00000000
2 0.067 0.496 0.912 0.13865546 0.6147309 0.6984127 0.00000000
3 0.514 0.426 0.692 0.36440678 0.4787535 0.5198413 0.05882353
4 0.102 0.430 0.739 0.11297071 0.5288008 0.5436508 0.00000000
5 0.560 0.735 0.554 0.48148148 0.8168083 0.4603175 0.00000000
6 0.029 0.302 0.558 0.08547009 0.3928234 0.4603175 0.00000000"),
sep = " ", header = TRUE)

dtf$bin <- cut(dtf$cosinFcolor, breaks = c(0, seq(0.5, 1, by = .1)), labels = 0:5)
dtf
##   cosinFcolor cosinEdge cosinTexture histoFcolor histoEdge histoTexture    jaccard bin
## 1       0.770     0.489        0.388  0.57500000 0.5845137    0.3920000 0.00000000   3
## 2       0.067     0.496        0.912  0.13865546 0.6147309    0.6984127 0.00000000   0
## 3       0.514     0.426        0.692  0.36440678 0.4787535    0.5198413 0.05882353   1
## 4       0.102     0.430        0.739  0.11297071 0.5288008    0.5436508 0.00000000   0
## 5       0.560     0.735        0.554  0.48148148 0.8168083    0.4603175 0.00000000   1
## 6       0.029     0.302        0.558  0.08547009 0.3928234    0.4603175 0.00000000   0
```
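A caveat with `cut()`: its intervals are open on the left by default, so with breaks starting at 0, a value of exactly 0 belongs to no bin and comes back as `NA`. That doesn't affect `cosinFcolor`, but the `jaccard` column is mostly zeros. A minimal sketch of the fix, assuming integer bin numbers rather than a factor are wanted (`brks` and `f` are illustrative names):

```r
# jaccard column from the question: mostly exact zeros
jaccard <- c(0, 0, 0.05882353, 0, 0, 0)
brks <- c(0, seq(0.5, 1, by = 0.1))

# Default intervals are (a, b], so 0 is outside (0, 0.5] and becomes NA
cut(jaccard, breaks = brks, labels = 0:5)        # NA NA 0 NA NA NA

# include.lowest = TRUE turns the first interval into [0, 0.5]
f <- cut(jaccard, breaks = brks, labels = 0:5, include.lowest = TRUE)

# Factor codes run 1..6, so subtract 1 to get plain integers 0..5
as.integer(f) - 1L                               # 0 0 0 0 0 0
```

The same `cut(..., include.lowest = TRUE)` call can be wrapped in `lapply()` to bin every column of the data frame at once.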

Ben · Answer

Here's another solution using the bin_data() function from the mltools package.

Binning a single vector

```r
library(mltools)

cosinFcolor <- c(0.77, 0.067, 0.514, 0.102, 0.56, 0.029)
binned <- bin_data(cosinFcolor,
                   bins = c(0, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
                   boundaryType = "[lorc")
binned
## [1] (0.7, 0.8] [0, 0.5]   (0.5, 0.6] [0, 0.5]   (0.5, 0.6] [0, 0.5]
## Levels: [0, 0.5] < (0.5, 0.6] < (0.6, 0.7] < (0.7, 0.8] < (0.8, 0.9] < (0.9, 1]

# Convert to numbers 0, 1, ...
as.integer(binned) - 1L
```

Binning every column of a data.frame

```r
df <- read.table(textConnection(
"cosinFcolor cosinEdge cosinTexture histoFcolor histoEdge histoTexture jaccard
0.770 0.489 0.388 0.57500000 0.5845137 0.3920000 0.00000000
0.067 0.496 0.912 0.13865546 0.6147309 0.6984127 0.00000000
0.514 0.426 0.692 0.36440678 0.4787535 0.5198413 0.05882353
0.102 0.430 0.739 0.11297071 0.5288008 0.5436508 0.00000000
0.560 0.735 0.554 0.48148148 0.8168083 0.4603175 0.00000000
0.029 0.302 0.558 0.08547009 0.3928234 0.4603175 0.00000000"),
sep = " ", header = TRUE)

for (col in colnames(df)) {
  df[[col]] <- as.integer(bin_data(df[[col]],
                                   bins = c(0, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
                                   boundaryType = "[lorc")) - 1L
}
df
##   cosinFcolor cosinEdge cosinTexture histoFcolor histoEdge histoTexture jaccard
## 1           3         0            0           1         1            0       0
## 2           0         0            5           0         2            2       0
## 3           1         0            2           0         0            1       0
## 4           0         0            3           0         1            1       0
## 5           1         3            1           0         4            0       0
## 6           0         0            1           0         0            0       0
```