TimeDistributed(Dense) vs Dense in Keras - Same number of parameters

Question

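I am building a model that converts one string into another using recurrent layers (GRUs). I have tried both a Dense layer and a TimeDistributed(Dense) layer as the last-but-one layer, but I don't understand the difference between the two when return_sequences=True is used.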

My simplified model is the following:

    import keras

    InputSize = 15
    MaxLen = 64
    HiddenSize = 16

    inputs = keras.layers.Input(shape=(MaxLen, InputSize))
    x = keras.layers.GRU(HiddenSize, return_sequences=True)(inputs)
    x = keras.layers.TimeDistributed(keras.layers.Dense(InputSize))(x)
    predictions = keras.layers.Activation('softmax')(x)

The network summary is:

    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    input_1 (InputLayer)         (None, 64, 15)            0
    _________________________________________________________________
    gru_1 (GRU)                  (None, 64, 16)            1536
    _________________________________________________________________
    time_distributed_1 (TimeDist (None, 64, 15)            255
    _________________________________________________________________
    activation_1 (Activation)    (None, 64, 15)            0
    =================================================================

This makes sense to me: as I understand it, TimeDistributed applies the same layer at every time step, so the Dense layer has 16 * 15 + 15 = 255 parameters (weights + biases).
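The arithmetic can be checked with a small sketch. The helper names `dense_param_count` and `gru_param_count` are hypothetical, and the GRU formula assumes one bias vector per gate, which is what reproduces the 1536 in the summary above; GRU variants with additional recurrent biases would give a different count.

```python
def dense_param_count(input_dim, units):
    """Kernel of shape (input_dim, units) plus one bias per unit."""
    return input_dim * units + units


def gru_param_count(input_dim, units):
    """Three gates, each with input weights, recurrent weights and one bias vector.

    Assumption: a single bias vector per gate.
    """
    return 3 * (units * (input_dim + units) + units)


InputSize = 15
HiddenSize = 16

# Note that MaxLen (the number of time steps) appears in neither formula.
dense_params = dense_param_count(HiddenSize, InputSize)
gru_params = gru_param_count(InputSize, HiddenSize)

print(dense_params)  # 255
print(gru_params)    # 1536
```

Because the sequence length never enters the count, sharing one Dense layer across all time steps costs the same 255 parameters regardless of how many steps there are.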

However, if I switch to a plain Dense layer:

    inputs = keras.layers.Input(shape=(MaxLen, InputSize))
    x = keras.layers.GRU(HiddenSize, return_sequences=True)(inputs)
    x = keras.layers.Dense(InputSize)(x)
    predictions = keras.layers.Activation('softmax')(x)

I still have only 255 parameters:

    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    input_1 (InputLayer)         (None, 64, 15)            0
    _________________________________________________________________
    gru_1 (GRU)                  (None, 64, 16)            1536
    _________________________________________________________________
    dense_1 (Dense)              (None, 64, 15)            255
    _________________________________________________________________
    activation_1 (Activation)    (None, 64, 15)            0
    =================================================================

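I wonder whether this is because Dense() uses only the last dimension of the shape and effectively treats everything else as a batch-like dimension. But in that case I no longer see what the difference between Dense and TimeDistributed(Dense) is.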

Update: Looking at https://github.com/fchollet/keras/blob/master/keras/layers/core.py, it does seem that Dense uses only the last dimension to size itself:

    def build(self, input_shape):
        assert len(input_shape) >= 2
        input_dim = input_shape[-1]

        self.kernel = self.add_weight(shape=(input_dim, self.units),

It also uses keras.dot to apply the weights:

    def call(self, inputs):
        output = K.dot(inputs, self.kernel)

The keras.dot documentation implies that it works fine on n-dimensional tensors. I wonder whether its exact behavior means that Dense() is in effect called at every time step. If so, the question remains what TimeDistributed() achieves in this case.
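The behavior asked about here can be illustrated without Keras. The following is a minimal NumPy sketch, used as a stand-in for the backend dot product rather than the Keras implementation itself; the array names and the random data are assumptions made for illustration. It compares one dot product over the whole 3-D tensor against an explicit loop that applies the same kernel and bias to each time step.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, hidden, units = 2, 64, 16, 15

h = rng.normal(size=(batch, timesteps, hidden))  # stand-in for the GRU output
kernel = rng.normal(size=(hidden, units))
bias = rng.normal(size=(units,))

# Whole tensor at once: the product contracts only the last axis of h,
# so the batch and time axes are both carried through unchanged.
all_at_once = np.dot(h, kernel) + bias

# Explicit loop: apply the same kernel and bias to each time step separately.
step_by_step = np.stack(
    [np.dot(h[:, t, :], kernel) + bias for t in range(timesteps)],
    axis=1,
)

print(all_at_once.shape)
print(np.allclose(all_at_once, step_by_step))
```

Under these assumptions the two results agree, which is consistent with the idea that a dot product over the last axis already amounts to applying the layer at every time step.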

mujjiga · Accepted Answer

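TimeDistributedDense applies the same Dense layer to every time step during GRU/LSTM cell unrolling. The error function is therefore computed between the predicted label sequence and the actual label sequence, which is normally what sequence-to-sequence labeling problems require.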

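With return_sequences=False, however, the Dense layer is applied only once, at the last cell. This is normally the case when RNNs are used for classification problems. With return_sequences=True, the Dense layer is applied to every time step, just like TimeDistributedDense.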

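So, as far as your models go, the two are the same. But if you change your second model to return_sequences=False, the Dense layer is applied only at the last cell. Try changing it and the model will throw an error, because Y must then have size [Batch_size, InputSize]: it is no longer a sequence-to-sequence problem but a full-sequence-to-label problem.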

    from keras.models import Sequential
    from keras.layers import Dense, Activation, TimeDistributed, GRU
    import numpy as np

    InputSize = 15
    MaxLen = 64
    HiddenSize = 16

    OutputSize = 8
    n_samples = 1000

    # model1: sequence to sequence, Dense wrapped in TimeDistributed
    model1 = Sequential()
    model1.add(GRU(HiddenSize, return_sequences=True, input_shape=(MaxLen, InputSize)))
    model1.add(TimeDistributed(Dense(OutputSize)))
    model1.add(Activation('softmax'))
    model1.compile(loss='categorical_crossentropy', optimizer='rmsprop')

    # model2: sequence to sequence, plain Dense
    model2 = Sequential()
    model2.add(GRU(HiddenSize, return_sequences=True, input_shape=(MaxLen, InputSize)))
    model2.add(Dense(OutputSize))
    model2.add(Activation('softmax'))
    model2.compile(loss='categorical_crossentropy', optimizer='rmsprop')

    # model3: full sequence to label, Dense sees only the last time step
    model3 = Sequential()
    model3.add(GRU(HiddenSize, return_sequences=False, input_shape=(MaxLen, InputSize)))
    model3.add(Dense(OutputSize))
    model3.add(Activation('softmax'))
    model3.compile(loss='categorical_crossentropy', optimizer='rmsprop')

    X = np.random.random([n_samples, MaxLen, InputSize])
    Y1 = np.random.random([n_samples, MaxLen, OutputSize])  # one target per time step
    Y2 = np.random.random([n_samples, OutputSize])          # one target per sequence

    model1.fit(X, Y1, batch_size=128, epochs=1)
    model2.fit(X, Y1, batch_size=128, epochs=1)
    model3.fit(X, Y2, batch_size=128, epochs=1)

    # summary() prints the table itself, so no print() wrapper is needed
    model1.summary()
    model2.summary()
    model3.summary()

In the example above, model1 and model2 have the same architecture (sequence-to-sequence models), while model3 is a full-sequence-to-label model.
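The difference in target shapes can be sketched in NumPy. This is only an illustration under stated assumptions: `hidden_states` is a random stand-in for the per-step GRU outputs, `n_samples` is reduced to 4 to keep it small, and selecting the last time step stands in for return_sequences=False.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, MaxLen, HiddenSize, OutputSize = 4, 64, 16, 8

hidden_states = rng.normal(size=(n_samples, MaxLen, HiddenSize))
kernel = rng.normal(size=(HiddenSize, OutputSize))
bias = np.zeros(OutputSize)

# return_sequences=True: every time step is kept, so the Dense output is 3-D
# and the targets need one row per time step (like Y1).
seq_output = hidden_states @ kernel + bias

# return_sequences=False: only the last time step reaches the Dense layer,
# so the output is 2-D and the targets need one row per sequence (like Y2).
last_output = hidden_states[:, -1, :] @ kernel + bias

print(seq_output.shape)   # (4, 64, 8)
print(last_output.shape)  # (4, 8)
```

In this sketch `last_output` is simply the final time-step slice of `seq_output`, which is why the two setups need differently shaped targets even though the Dense layer itself is unchanged.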

user263387 · Answer

Here is a piece of code that verifies that TimeDistributed(Dense(X)) is identical to Dense(X):

    import numpy as np
    from keras.layers import Dense, TimeDistributed
    import tensorflow as tf

    X = np.array([
        [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
        [[3, 1, 7], [8, 2, 5], [11, 10, 4], [9, 6, 12]],
    ]).astype(np.float32)

    print(X.shape)

    (2, 4, 3)

    dense_weights = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                              [0.2, 0.7, 0.9, 0.1, 0.2],
                              [0.1, 0.8, 0.6, 0.2, 0.4]])
    bias = np.array([0.1, 0.3, 0.7, 0.8, 0.4])

    print(dense_weights.shape)

    (3, 5)

    dense = Dense(input_dim=3, units=5, weights=[dense_weights, bias])
    input_tensor = tf.Variable(X, name='inputX')

    output_tensor1 = dense(input_tensor)
    output_tensor2 = TimeDistributed(dense)(input_tensor)

    print(output_tensor1.shape)
    print(output_tensor2.shape)

    (2, 4, 5)
    (2, ?, 5)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        output1 = sess.run(output_tensor1)
        output2 = sess.run(output_tensor2)

    print(output1 - output2)

And the difference is:

    [[[0. 0. 0. 0. 0.]
      [0. 0. 0. 0. 0.]
      [0. 0. 0. 0. 0.]
      [0. 0. 0. 0. 0.]]

     [[0. 0. 0. 0. 0.]
      [0. 0. 0. 0. 0.]
      [0. 0. 0. 0. 0.]
      [0. 0. 0. 0. 0.]]]
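The same check can be approximated without a TensorFlow session. The sketch below reuses the input, weights and bias from above in plain NumPy and compares two ways of picturing the computation: contracting the last axis of the 3-D input directly, and folding the batch and time axes into a single 2-D batch before applying the weights. The folding is offered only as a mental model of a time-distributed layer, not as a description of how Keras implements TimeDistributed; the names `direct`, `flat` and `folded` are made up for this example.

```python
import numpy as np

X = np.array([
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
    [[3, 1, 7], [8, 2, 5], [11, 10, 4], [9, 6, 12]],
], dtype=np.float32)

dense_weights = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                          [0.2, 0.7, 0.9, 0.1, 0.2],
                          [0.1, 0.8, 0.6, 0.2, 0.4]])
bias = np.array([0.1, 0.3, 0.7, 0.8, 0.4])

batch, timesteps, features = X.shape

# Direct application: contract the last axis of the 3-D input.
direct = X @ dense_weights + bias

# Folded view: treat every (sample, time step) pair as one row of a 2-D batch,
# apply the same weights, then restore the time axis.
flat = X.reshape(batch * timesteps, features)
folded = (flat @ dense_weights + bias).reshape(batch, timesteps, -1)

print(folded.shape)
print(np.allclose(direct, folded))
print(direct[0, 0])  # output for the first time step of the first sample
```

For the first time step `[1, 2, 3]`, working the products by hand gives approximately `[0.9, 4.3, 4.6, 2.0, 2.5]`, which is a convenient spot check against either layer's output.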