Tensorflow: How to replace or modify gradient?

Question

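I would like to replace or modify the gradient of an op or a portion of the graph in TensorFlow. Ideally, I could use the existing gradient in the calculation.

In some ways this is the opposite of what tf.stop_gradient() does: instead of adding a calculation that is ignored when computing gradients, I want a calculation that is used only when computing gradients.

A simple example would be something that scales the gradient by multiplying it by a constant (without multiplying the forward computation by that constant). Another example would be something that clips the gradient to a given range.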

BlueSun · Accepted Answer

For TensorFlow 1.7 and later, see the edit below.

First define your custom gradient:

```python
@tf.RegisterGradient("CustomGrad")
def _const_mul_grad(unused_op, grad):
    return 5.0 * grad
```

Since you want nothing to happen in the forward pass, override the gradient of an identity operation with your new gradient:

```python
g = tf.get_default_graph()
with g.gradient_override_map({"Identity": "CustomGrad"}):
    output = tf.identity(input, name="Identity")
```

Here is a working example of a layer that clips gradients in the backward pass and does nothing in the forward pass, using the same method:

```python
import tensorflow as tf

@tf.RegisterGradient("CustomClipGrad")
def _clip_grad(unused_op, grad):
    return tf.clip_by_value(grad, -0.1, 0.1)

input = tf.Variable([3.0], dtype=tf.float32)

g = tf.get_default_graph()
with g.gradient_override_map({"Identity": "CustomClipGrad"}):
    output_clip = tf.identity(input, name="Identity")
grad_clip = tf.gradients(output_clip, input)

# output without gradient clipping in the backwards pass for comparison:
output = tf.identity(input)
grad = tf.gradients(output, input)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print("with clipping:", sess.run(grad_clip)[0])
    print("without clipping:", sess.run(grad)[0])
```
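If you are on TensorFlow 2.x but want to keep the gradient_override_map approach, the graph-mode APIs are still reachable through tf.compat.v1. The sketch below is a port of the scaling example under that assumption; the gradient name "CustomScaleGrad" and the placeholder input are illustrative choices, not part of the original answer. It builds an explicit tf.Graph so no global eager-execution setting has to change. Gradient names passed to tf.RegisterGradient are registered process-wide, so registering the same name twice in one process raises an error.

```python
import tensorflow as tf

tf1 = tf.compat.v1

# Hypothetical gradient name; each name can be registered only once per process.
@tf.RegisterGradient("CustomScaleGrad")
def _scale_grad(unused_op, grad):
    return 5.0 * grad  # scale the incoming gradient; the forward pass is untouched

graph = tf.Graph()
with graph.as_default():
    x = tf1.placeholder(tf.float32, shape=[1])
    # Every "Identity" op created inside this block uses CustomScaleGrad.
    with graph.gradient_override_map({"Identity": "CustomScaleGrad"}):
        y = tf.identity(x)
    dy_dx = tf1.gradients(y, x)[0]

with tf1.Session(graph=graph) as sess:
    y_val, grad_val = sess.run([y, dy_dx], feed_dict={x: [3.0]})

print("forward:", y_val, "gradient:", grad_val)
```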

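Edit for TensorFlow 1.7

Since 1.7 there is a new way to redefine the gradient with a shorter syntax. (It also allows redefining the gradients of multiple operations at the same time, which is not needed for this question.) Here are the examples above, rewritten for TensorFlow 1.7: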

Layer that scales gradients in the backward pass:

```python
@tf.custom_gradient
def scale_grad_layer(x):
    def grad(dy):
        return 5.0 * dy
    return tf.identity(x), grad
```
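The same decorator also works eagerly in TensorFlow 2.x, where the result can be checked with tf.GradientTape. A minimal sketch, assuming TF 2.x with eager execution (the default); the input value 3.0 is arbitrary. The forward value passes through unchanged while the gradient is multiplied by 5:

```python
import tensorflow as tf

@tf.custom_gradient
def scale_grad_layer(x):
    def grad(dy):
        return 5.0 * dy  # scale the upstream gradient
    return tf.identity(x), grad  # forward pass: identity

x = tf.constant([3.0])
with tf.GradientTape() as tape:
    tape.watch(x)  # x is a constant, so watch it explicitly
    y = scale_grad_layer(x)

forward_value = y.numpy()
scaled_grad = tape.gradient(y, x).numpy()  # upstream dy is 1, so this is 5 * 1
print(forward_value, scaled_grad)
```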

Example of a layer that clips gradients in the backward pass:

```python
import tensorflow as tf

input = tf.Variable([3.0], dtype=tf.float32)

@tf.custom_gradient
def clip_grad_layer(x):
    def grad(dy):
        return tf.clip_by_value(dy, -0.1, 0.1)
    return tf.identity(x), grad

output_clip = clip_grad_layer(input)
grad_clip = tf.gradients(output_clip, input)

# output without gradient clipping in the backwards pass for comparison:
output = tf.identity(input)
grad = tf.gradients(output, input)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print("with clipping:", sess.run(grad_clip)[0])
    print("without clipping:", sess.run(grad)[0])
```
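Since tf.Session no longer exists at the top level in TensorFlow 2.x, here is a sketch of the same comparison in eager mode, assuming TF 2.x. The factor 2.0 is an illustrative choice that makes the upstream gradient exceed the clip range, so the clipping is visible:

```python
import tensorflow as tf

@tf.custom_gradient
def clip_grad_layer(x):
    def grad(dy):
        # Clip each component of the incoming gradient to [-0.1, 0.1].
        return tf.clip_by_value(dy, -0.1, 0.1)
    return tf.identity(x), grad

x = tf.constant([3.0])
with tf.GradientTape(persistent=True) as tape:  # persistent: two gradient calls
    tape.watch(x)
    y_clip = 2.0 * clip_grad_layer(x)  # upstream gradient 2.0 is clipped to 0.1
    y_plain = 2.0 * tf.identity(x)     # no clipping, for comparison

grad_clip = tape.gradient(y_clip, x)
grad_plain = tape.gradient(y_plain, x)
del tape  # release resources held by the persistent tape

print("with clipping:", grad_clip.numpy())
print("without clipping:", grad_plain.numpy())
```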

xxi · Answer

Use optimizer.compute_gradients or tf.gradients to get the original gradients.
Then do whatever you want with them.
Finally, use optimizer.apply_gradients.

I found an example on GitHub.
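In TensorFlow 2.x the same three steps map onto tf.GradientTape.gradient, an arbitrary transformation, and Optimizer.apply_gradients. A minimal sketch, assuming TF 2.x with the Keras SGD optimizer; the toy loss, the learning rate 0.1, and the clip range [-1, 1] are illustrative:

```python
import tensorflow as tf

w = tf.Variable([1.0, -2.0])
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

# 1. Get the original gradients.
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(10.0 * w ** 2)  # d(loss)/dw = 20 * w = [20, -40]
grads = tape.gradient(loss, [w])

# 2. Modify them however you like; here each component is clipped to [-1, 1].
modified = [tf.clip_by_value(g, -1.0, 1.0) for g in grads]

# 3. Apply the modified gradients: w <- w - 0.1 * [1, -1]
opt.apply_gradients(zip(modified, [w]))
print(w.numpy())
```

Note that this modifies only the gradients of the trainable variables at the very end; unlike the approaches above, it does not change what flows backward through intermediate ops.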

Bily · Answer

Suppose your forward computation is

y = f(x)

and you want it to backpropagate as if it were

y = b(x)

A simple hack is:

y = b(x) + tf.stop_gradient(f(x) - b(x))
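Why this works: in the forward pass the value is b(x) + f(x) - b(x) = f(x), while in the backward pass the stop_gradient term contributes nothing, so the derivative is b'(x). A sketch in TensorFlow 2.x eager mode, with the illustrative (hypothetical) choices f = rounding and b = identity, which gives the familiar straight-through estimator for rounding:

```python
import tensorflow as tf

def f(x):  # forward function: rounding (its gradient is zero almost everywhere)
    return tf.round(x)

def b(x):  # function whose gradient we want in the backward pass
    return x

x = tf.constant([0.3, 1.7, 2.4])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = b(x) + tf.stop_gradient(f(x) - b(x))

grad = tape.gradient(y, x)
print(y.numpy())     # values equal f(x): 0, 2, 2
print(grad.numpy())  # values equal b'(x): 1, 1, 1
```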

MaxB · Answer

The most general way to do that is to use https://www.tensorflow.org/api_docs/python/tf/RegisterGradient.

Below, I implemented backpropagated gradient clipping, which can be used with matmul, as shown here, or with any other op:

```python
import tensorflow as tf
import numpy as np

# from https://Gist.github.com/harpone/3453185b41d8d985356cbe5e57d67342
def py_func(func, inp, Tout, stateful=True, name=None, grad=None):
    # Need to generate a unique name to avoid duplicates:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, int(1E+8)))

    tf.RegisterGradient(rnd_name)(grad)
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)

def clip_grad(x, clip_value, name=None):
    """
    scales backpropagated gradient so that
    its L2 norm is no more than `clip_value`
    """
    with tf.name_scope(name, "ClipGrad", [x]) as name:
        return py_func(lambda x: x,
                       [x],
                       [tf.float32],
                       name=name,
                       grad=lambda op, g: tf.clip_by_norm(g, clip_value))[0]
```

Usage example:

```python
with tf.Session() as sess:
    x = tf.constant([[1., 2.], [3., 4.]])
    y = tf.constant([[1., 2.], [3., 4.]])

    print('without clipping')
    z = tf.matmul(x, y)
    print(tf.gradients(tf.reduce_sum(z), x)[0].eval())

    print('with clipping')
    z = tf.matmul(clip_grad(x, 1.0), clip_grad(y, 0.5))
    print(tf.gradients(tf.reduce_sum(z), x)[0].eval())

    print('with clipping between matmuls')
    z = tf.matmul(clip_grad(tf.matmul(x, y), 1.0), y)
    print(tf.gradients(tf.reduce_sum(z), x)[0].eval())
```

Output:

```
without clipping
[[ 3.  7.]
 [ 3.  7.]]
with clipping
[[ 0.278543   0.6499337]
 [ 0.278543   0.6499337]]
with clipping between matmuls
[[ 1.57841039  3.43536377]
 [ 1.57841039  3.43536377]]
```
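Under TensorFlow 2.x the py_func/RegisterGradient workaround is not needed: the same clip-by-norm behaviour can be expressed with tf.custom_gradient. A sketch, assuming TF 2.x eager mode; this clip_grad is a re-implementation for illustration, not the code above, and it drops the name argument. It reproduces the "with clipping" case: the raw gradient [[3, 7], [3, 7]] has L2 norm sqrt(116), so it is rescaled to norm 1.

```python
import tensorflow as tf

def clip_grad(x, clip_value):
    """Identity in the forward pass; rescales the backpropagated gradient
    so that its L2 norm is at most `clip_value`."""
    @tf.custom_gradient
    def _identity(t):
        def grad(dy):
            return tf.clip_by_norm(dy, clip_value)
        return tf.identity(t), grad
    return _identity(x)

x = tf.constant([[1., 2.], [3., 4.]])
y = tf.constant([[1., 2.], [3., 4.]])
with tf.GradientTape() as tape:
    tape.watch(x)
    z = tf.matmul(clip_grad(x, 1.0), clip_grad(y, 0.5))
    loss = tf.reduce_sum(z)

grad_x = tape.gradient(loss, x)
print(grad_x.numpy())  # approx. 0.2785 and 0.6499 in each row
```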

cheersmate · Answer

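For current TensorFlow r1.13, use tf.custom_gradient.

The decorated function (whose input arguments form a list x) should return

- the result of the forward pass, and
- a function that returns a list of gradients, one for each element of x.

Here is an example with one variable: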

```python
@tf.custom_gradient
def non_differentiable(x):
    f = tf.cast(x > 0, tf.float32)
    def grad(dy):
        # Surrogate gradient max(0, 1 - |x|), multiplied by the upstream gradient dy
        return dy * tf.math.maximum(0., 1 - tf.abs(x))
    return f, grad
```

And one with two:

```python
@tf.custom_gradient
def non_differentiable2(x0, x1):
    f = x0 * tf.cast(x1 > 0, tf.float32)
    def grad(dy):
        df_dx0 = tf.cast(x1 > 0, tf.float32)
        return dy*df_dx0, tf.zeros_like(dy)
    return f, grad
```
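A quick eager-mode check of both functions, assuming TF 2.x; the input values are arbitrary. In this sketch non_differentiable multiplies its surrogate gradient by dy so that it composes correctly with upstream gradients (chain rule), and non_differentiable2 returns one gradient per input, zero for x1:

```python
import tensorflow as tf

@tf.custom_gradient
def non_differentiable(x):
    f = tf.cast(x > 0, tf.float32)  # step function: true gradient is 0 almost everywhere
    def grad(dy):
        return dy * tf.math.maximum(0., 1 - tf.abs(x))  # triangular surrogate
    return f, grad

@tf.custom_gradient
def non_differentiable2(x0, x1):
    f = x0 * tf.cast(x1 > 0, tf.float32)
    def grad(dy):
        df_dx0 = tf.cast(x1 > 0, tf.float32)
        return dy * df_dx0, tf.zeros_like(dy)  # gradients w.r.t. x0 and x1
    return f, grad

x = tf.constant([-2.0, -0.5, 0.5, 2.0])
x0 = tf.constant([1.0, 2.0])
x1 = tf.constant([-1.0, 3.0])
with tf.GradientTape(persistent=True) as tape:
    tape.watch([x, x0, x1])
    step = non_differentiable(x)
    masked = non_differentiable2(x0, x1)

step_grad = tape.gradient(step, x)
g0, g1 = tape.gradient(masked, [x0, x1])
del tape

print(step.numpy(), step_grad.numpy())  # values 0,0,1,1 and 0,0.5,0.5,0
print(g0.numpy(), g1.numpy())           # values 0,1 and 0,0
```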