ケラスの完全な勾配降下

Question

私はケラスに完全な勾配降下法を実装しようとしています。つまり、各エポックについて、データセット全体でトレーニングを行っています。これが、バッチサイズがトレーニングセットの長さのサイズとして定義されている理由です。

from keras.models import Sequential from keras.layers import Dense from keras.optimizers import SGD,Adam from keras import regularizers import numpy as np import matplotlib.pyplot as plt %matplotlib inline import random from numpy.random import seed import random def xrange(start_point,end_point,N,base): temp = np.logspace(0.1, 1, N,base=base,endpoint=False) temp=temp-temp.min() temp=(0.0+temp)/(0.0+temp.max()) #this is between 0 and 1 return (end_point-start_point)*temp +start_point #this is the range def train_model(x_train,y_train,x_test): #seed(1) model=Sequential() num_units=100 act='relu' model.add(Dense(num_units,input_shape=(1,),activation=act)) model.add(Dense(num_units,activation=act)) model.add(Dense(num_units,activation=act)) model.add(Dense(num_units,activation=act)) model.add(Dense(1,activation='tanh')) #output layer 1 unit ; activation='tanh' model.compile(Adam(),'mean_squared_error',metrics=['mse']) history=model.fit(x_train,y_train,batch_size=len(x_train),epochs=500,verbose=0,validation_split = 0.2 ) #train on the noise (not moshe) fit=model.predict(x_test) loss = history.history['loss'] val_loss = history.history['val_loss'] return fit N = 1024 start_point=-5.25 end_point=5.25 base=500# the base of the log of the trainning train_step=0.0007 x_test=np.arange(start_point,end_point,train_step+0.05) x_train=xrange(start_point,end_point,N,base) #random.shuffle(x_train) function_y=np.sin(3*x_train)/2 noise=np.random.uniform(-0.2,0.2,len(function_y)) y_train=function_y+noise fit=train_model(x_train,y_train,x_test) plt.scatter(x_train,y_train, facecolors='none', edgecolors='g') #plt.plot(x_value,sample,'bo') plt.scatter(x_test, fit, facecolors='none', edgecolors='b') #plt.plot(x_value,sample,'bo')

ただし、＃random.shuffle（x_train）のコメントを外すと、トレーニングをシャッフルします。：

なぜ別のプロットが表示されるのかわかりません（緑の円はトレーニングで、青は現代の学習です）。どちらの場合でも、バッチはすべてのデータセットのものです。だから、シャッフルは何も変更しないでください。
ありがとうございました。

アリエル

rvinas · Accepted Answer

これは2つの理由で発生します。

まず、データがシャッフルされていない場合、トレーニング/検証の分割は不適切です。
次に、full勾配降下はエポックごとに1回の更新を実行するため、収束するためにより多くのトレーニングエポックmightが必要です。

モデルが波と一致しないのはなぜですか？

から model.fit ：

validation_split：0と1の間の浮動小数点数。検証データとして使用されるトレーニングデータの割合。モデルは、トレーニングデータのこの部分を区別し、それでトレーニングせず、各エポックの終わりにこのデータの損失とモデルメトリックを評価します。 検証データは、シャッフル前に提供されたxおよびyデータの最後のサンプルから選択されます。

つまり、検証セットは最後の20％のトレーニングサンプルで構成されています。独立変数（x_train）、あなたのトレイン/検証の分割は次のとおりです：

split_point = int(0.2*N) x_val = x_train[-split_point:] y_val = y_train[-split_point:] x_train_ = x_train[:-split_point] y_train_ = y_train[:-split_point] plt.scatter(x_train_, y_train_, c='g') plt.scatter(x_val, y_val, c='r') plt.show()

前のプロットでは、トレーニングデータと検証データはそれぞれ緑と赤の点で表されています。トレーニングデータセットは、母集団全体を表すではないことに注意してください。

それでもトレーニングデータセットと一致しないのはなぜですか？

不適切なtrain/test分割に加えて、full勾配降下mightは収束するためにより多くのトレーニングエポックを必要とします（勾配はノイズが少ないです）、ただしエポックごとに1回のグラデーション更新のみを実行します）。代わりに、モデルを〜1500エポックにトレーニングする場合（またはバッチサイズが32のミニバッチグラディエントディセントを使用する場合）、最終的には次のようになります。