トレーニングと検証の損失がエポック1から逸脱する場合、それはどういう意味ですか？

Question

私は最近、Kerasでディープラーニングモデルに取り組んでいましたが、非常に複雑な結果が得られました。モデルは時間の経過とともにトレーニングデータを習得することができますが、検証データで一貫して悪い結果が得られます。

検証精度がしばらく上がってから低下し始めた場合、トレーニングデータに過剰適合していることはわかっていますが、この場合、検証精度は低下するだけです。なぜこれが起こるのか私は本当に混乱しています。何がこれを引き起こす可能性があるのかについて誰かが直感を持っていますか？または、潜在的にそれを修正するためにテストするものに関する提案はありますか？

編集して情報とコードを追加します

OK。だから私はいくつかの基本的な在庫予測をしようとしているモデルを作っています。モデルは、過去40日間の始値、高値、安値、終値、出来高を調べることにより、価格が1つの平均真の範囲を下げることなく2つの平均真の範囲を上げるかどうかを予測しようとします。入力として、ダウジョーンズ工業株30種平均のすべての株式について過去30年間のこの情報を含むCSVをYahooFinanceから取得しました。モデルは、株式の70％でトレーニングを行い、残りの20％で検証します。これにより、約150,000のトレーニングサンプルが得られます。現在、1d畳み込みニューラルネットワークを使用していますが、他の小さなモデル（ロジスティック回帰と小さなフィードフォワードNN）も試しましたが、モデルが単純すぎるため、発散するトレインと検証損失のどちらかが常に同じになるか、まったく学習されません。。

コードは次のとおりです。

import numpy as np from sklearn import preprocessing from sklearn.metrics import auc, roc_curve, roc_auc_score from keras.layers import Input, Dense, Flatten, Conv1D, Activation, MaxPooling1D, Dropout, Concatenate from keras.models import Model from keras.callbacks import ModelCheckpoint, EarlyStopping, Callback from keras import backend as K import matplotlib.pyplot as plt from random import seed, shuffle from os import listdir class roc_auc(Callback): def on_train_begin(self, logs={}): self.aucs = [] def on_train_end(self, logs={}): return def on_Epoch_begin(self, Epoch, logs={}): return def on_Epoch_end(self, Epoch, logs={}): y_pred = self.model.predict(self.validation_data[0]) self.aucs.append(roc_auc_score(self.validation_data[1], y_pred)) if max(self.aucs) == self.aucs[-1]: model.save_weights("weights.roc_auc.hdf5") print(" - auc: %0.4f" % self.aucs[-1]) return def on_batch_begin(self, batch, logs={}): return def on_batch_end(self, batch, logs={}): return rrr = 2 epochs = 200 batch_size = 64 days_input = 40 seed(42) X_train = [] X_test = [] y_train = [] y_test = [] files = listdir("Stocks") total_stocks = len(files) shuffle(files) for x, file in enumerate(files): test = False if (x+1.0)/total_stocks > 0.7: test = True if test: print("Test -> Stocks/%s" % file) else: print("Train -> Stocks/%s" % file) stock = np.loadtxt(open("Stocks/"+file, "r"), delimiter=",", skiprows=1, usecols = (1,2,3,5,6)) atr = [] last = None for day in stock: if last is None: tr = abs(day[1] - day[2]) atr.append(tr) else: tr = max(day[1] - day[2], abs(last[3] - day[1]), abs(last[3] - day[2])) atr.append((13*atr[-1]+tr)/14) last = day.copy() stock = np.insert(stock, 5, atr, axis=1) for i in range(days_input,stock.shape[0]-1): input = stock[i-days_input:i, 0:5].copy() for j, day in enumerate(input): input[j][1] = (day[1]-day[0])/day[0] input[j][2] = (day[2]-day[0])/day[0] input[j][3] = (day[3]-day[0])/day[0] input[:,0] = input[:,0] / np.linalg.norm(input[:,0]) input[:,1] = input[:,1] / np.linalg.norm(input[:,1]) input[:,2] = input[:,2] / np.linalg.norm(input[:,2]) input[:,3] = input[:,3] / np.linalg.norm(input[:,3]) input[:,4] = input[:,4] / np.linalg.norm(input[:,4]) preprocessing.scale(input, copy=False) output = -1 buy = stock[i][1] stoploss = buy - stock[i][5] target = buy + rrr*stock[i][5] for j in range(i+1, stock.shape[0]): if stock[j][0] < stoploss or stock[j][2] < stoploss: output = 0 break Elif stock[j][1] > target: output = 1 break if output != -1: if test: X_test.append(input) y_test.append(output) else: X_train.append(input) y_train.append(output) shape = list(X_train[0].shape) shape[:0] = [len(X_train)] X_train = np.concatenate(X_train).reshape(shape) y_train = np.array(y_train) shape = list(X_test[0].shape) shape[:0] = [len(X_test)] X_test = np.concatenate(X_test).reshape(shape) y_test = np.array(y_test) print("Train class split is %0.2f" % (100*np.average(y_train))) print("Test class split is %0.2f" % (100*np.average(y_test))) inputs = Input(shape=(days_input,5)) x = Conv1D(32, 5, padding='same')(inputs) x = Activation('relu')(x) x = MaxPooling1D()(x) x = Conv1D(64, 5, padding='same')(x) x = Activation('relu')(x) x = MaxPooling1D()(x) x = Conv1D(128, 5, padding='same')(x) x = Activation('relu')(x) x = MaxPooling1D()(x) x = Flatten()(x) x = Dense(128, activation="relu")(x) x = Dense(64, activation="relu")(x) output = Dense(1, activation="sigmoid")(x) model = Model(inputs=inputs,outputs=output) model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) filepath="weights.best.hdf5" checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=0, save_best_only=True, mode='max') auc_hist = roc_auc() callbacks_list = [checkpoint, auc_hist] history = model.fit(X_train, y_train, validation_data=(X_test,y_test) , epochs=epochs, callbacks=callbacks_list, batch_size=batch_size, class_weight ='balanced').history model_json = model.to_json() with open("model.json", "w") as json_file: json_file.write(model_json) model.save_weights("weights.latest.hdf5") model.load_weights("weights.roc_auc.hdf5") plt.plot(history['acc']) plt.plot(history['val_acc']) plt.title('model accuracy') plt.ylabel('accuracy') plt.xlabel('Epoch') plt.legend(['train', 'test'], loc='upper left') plt.show() plt.plot(history['loss']) plt.plot(history['val_loss']) plt.title('model loss') plt.ylabel('loss') plt.xlabel('Epoch') plt.legend(['train', 'test'], loc='upper left') plt.show() plt.plot(auc_hist.aucs) plt.title('model ROC AUC') plt.ylabel('AUC') plt.xlabel('Epoch') plt.show() y_pred = model.predict(X_train) fpr, tpr, _ = roc_curve(y_train, y_pred) roc_auc = auc(fpr, tpr) plt.subplot(1, 2, 1) plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc) plt.plot([0, 1], [0, 1], color='navy',linestyle='--') plt.xlim([0.0, 1.0]) plt.ylim([0.0, 1.0]) plt.xlabel('False Positive Rate') plt.ylabel('True Positive Rate') plt.title('Train ROC') plt.legend(loc="lower right") y_pred = model.predict(X_test) fpr, tpr, thresholds = roc_curve(y_test, y_pred) roc_auc = auc(fpr, tpr) plt.subplot(1, 2, 2) plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc) plt.plot([0, 1], [0, 1], color='navy',linestyle='--') plt.xlim([0.0, 1.0]) plt.ylim([0.0, 1.0]) plt.xlabel('False Positive Rate') plt.ylabel('True Positive Rate') plt.title('Test ROC') plt.legend(loc="lower right") plt.show() with open('roc.csv','w+') as file: for i in range(len(thresholds)): file.write("%f,%f,%f
" % (fpr[i], tpr[i], thresholds[i]))

エポックではなく100バッチによる結果

私は提案に耳を傾け、いくつかの更新を行いました。クラスのバランスが25％から75％ではなく50％から50％になりました。また、検証データは、特定の株式セットではなく、ランダムに選択されるようになりました。より細かい解像度（100バッチ対1エポック）で損失と精度をグラフ化することにより、過剰適合を明確に確認できます。モデルは、実際には、発散し始める前の最初から学習を開始します。オーバーフィットがすぐに始まることに驚いていますが、問題が確認できたので、デバッグできるといいのですが。

Martin Thoma · Accepted Answer

考えられる説明

コーディングエラー
トレーニング/検証データの違いによる過剰適合
歪んだクラス（およびトレーニング/検証データの違い）

私がしようとすること

トレーニングと検証セットを交換します。それでも問題は発生しますか？
最初の約10エポックの曲線をより詳細にプロットします（たとえば、初期化の直後、エポックごとだけでなく、数回のトレーニング反復ごと）。あなたはまだ> 75％から始めますか？次に、クラスが歪んでいる可能性があり、トレーニングと検証の分割が階層化されているかどうかを確認することもできます。

コード

これは役に立たない：np.concatenate(X_train)
ここに投稿するときは、コードをできるだけ読みやすくしてください。これには、コメントアウトされた行の削除が含まれます。

これは私にはコーディングエラーの疑いがあるように見えます：

if test: X_test.append(input) y_test.append(output) else: #if((output == 0 and np.average(y_train) > 0.5) or output == 1): X_train.append(input) y_train.append(output)

使用する - sklearn.model_selection.train_test_split 代わりに。前にデータに対してすべての変換を行ってから、この方法で分割します。