Quantize a Keras neural network model

Question

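I recently started creating neural networks with TensorFlow + Keras, and I would like to try the quantization feature available in TensorFlow. So far, experimenting with the examples from the TF tutorials has worked fine. I have this basic working example (from https://www.tensorflow.org/tutorials/keras/basic_classification):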

    import tensorflow as tf
    from tensorflow import keras

    fashion_mnist = keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

    # fashion mnist data labels (indexes related to their respective labelling in the data set)
    class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                   'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

    # preprocess the train and test images
    train_images = train_images / 255.0
    test_images = test_images / 255.0

    # settings variables
    input_shape = (train_images.shape[1], train_images.shape[2])

    # create the model layers
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=input_shape),
        keras.layers.Dense(128, activation=tf.nn.relu),
        keras.layers.Dense(10, activation=tf.nn.softmax)
    ])

    # compile the model with added settings
    model.compile(optimizer=tf.train.AdamOptimizer(),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # train the model
    epochs = 3
    model.fit(train_images, train_labels, epochs=epochs)

    # evaluate the accuracy of model on test data
    test_loss, test_acc = model.evaluate(test_images, test_labels)
    print('Test accuracy:', test_acc)

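Now I would like to use quantization in the learning and classification process. The quantization documentation (https://www.tensorflow.org/performance/quantization) (this page has been unavailable since circa September 15, 2018) suggests using this piece of code: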

    loss = tf.losses.get_total_loss()
    tf.contrib.quantize.create_training_graph(quant_delay=2000000)
    optimizer = tf.train.GradientDescentOptimizer(0.00001)
    optimizer.minimize(loss)

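However, it gives no information about where this code should be used or how it should be connected to TF code (it does not even mention a high-level model created with Keras). I have no idea how this quantization part relates to the previously created neural network model. Simply inserting it after the neural network code produces the following error: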

    Traceback (most recent call last):
      File "so.py", line 41, in <module>
        loss = tf.losses.get_total_loss()
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/util.py", line 112, in get_total_loss
        return math_ops.add_n(losses, name=name)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 2119, in add_n
        raise ValueError("inputs must be a list of at least one Tensor with the "
    ValueError: inputs must be a list of at least one Tensor with the same dtype and shape

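Is it possible to quantize a Keras NN model in this way, or am I missing something basic? One possible solution that crossed my mind is to use the low-level TF API instead of Keras (which would take quite a bit of work to construct the model), or to try to extract some of the lower-level methods from the Keras model.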

Baptiste Pouthier · Answer

Since your network looks quite simple, you could use TensorFlow Lite.

Mitiku · Answer

You can use TensorFlow Lite to quantize a Keras model.

The following code was written for TensorFlow 1.14. It might not work with earlier versions.

First, after training the model, you need to save it to an h5 file:

    model.fit(train_images, train_labels, epochs=epochs)

    # evaluate the accuracy of model on test data
    test_loss, test_acc = model.evaluate(test_images, test_labels)
    print('Test accuracy:', test_acc)

    # save the trained model to an HDF5 file
    model.save("model.h5")

To load the Keras model, use tf.lite.TFLiteConverter.from_keras_model_file:

    converter = tf.lite.TFLiteConverter.from_keras_model_file("model.h5")
    tflite_model = converter.convert()

    # Save the model to file
    with open("tflite_model.tflite", "wb") as output_file:
        output_file.write(tflite_model)
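Note that a plain conversion like the one above keeps the weights in float32. As a hedged sketch, assuming TensorFlow 1.14, where the converter exposes an `optimizations` attribute that accepts `tf.lite.Optimize` values, post-training quantization could be requested as follows. The helper name `convert_keras_h5_quantized` and the output file name in the usage note are hypothetical, not part of the answer.

```python
def convert_keras_h5_quantized(h5_path, tflite_path):
    """Convert a saved Keras .h5 model to a TFLite file with post-training quantization.

    Returns the size of the converted model in bytes.
    """
    # Imported lazily so this helper can be defined without TensorFlow installed.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_keras_model_file(h5_path)
    # Assumption: TF 1.14 API. Optimize.DEFAULT asks the converter to quantize
    # the weights; without this line the model stays in float32.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    with open(tflite_path, "wb") as output_file:
        output_file.write(tflite_model)
    return len(tflite_model)
```

Calling `convert_keras_h5_quantized("model.h5", "tflite_model_quant.tflite")` and comparing the returned size with the size of the unquantized file is a quick way to check that quantization was actually applied.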

The saved model can be loaded in a Python script or on other platforms and in other languages. To use the saved tflite model, tensorflow.lite provides Interpreter. The following example shows how to load a tflite model from a local file in a Python script.

    import numpy as np
    import tensorflow as tf

    # Load TFLite model and allocate tensors.
    interpreter = tf.lite.Interpreter(model_path="tflite_model.tflite")
    interpreter.allocate_tensors()

    # Get input and output tensors.
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Test model on random input data.
    input_shape = input_details[0]['shape']
    input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
    interpreter.set_tensor(input_details[0]['index'], input_data)

    interpreter.invoke()

    # The function `get_tensor()` returns a copy of the tensor data.
    # Use `tensor()` in order to get a pointer to the tensor.
    output_data = interpreter.get_tensor(output_details[0]['index'])
    print(output_data)

Jianyu · Answer

As mentioned in the other answers, TensorFlow Lite can help you with network quantization.

TensorFlow Lite provides several levels of support for quantization.

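TensorFlow Lite post-training quantization quantizes the weights and activations of an already trained model with little effort. Quantization-aware training makes it possible to train networks that can be quantized with a minimal drop in accuracy; it is only available for a subset of convolutional neural network architectures.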
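To make "quantizing the weights" concrete, here is a minimal pure-Python sketch of a generic affine 8-bit scheme: each float is stored as an integer code plus a shared scale and zero point. This illustrates the idea only; the exact scheme TensorFlow Lite applies may differ, and the function names and sample weights are made up for the example.

```python
def quantize_uint8(values):
    """Map floats to uint8 codes using one shared (scale, zero_point) pair."""
    # The range is widened to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # all-zero input: any positive scale works
    zero_point = int(round(-lo / scale))
    codes = [min(255, max(0, int(round(v / scale)) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_uint8(codes, scale, zero_point):
    """Recover approximate floats from the integer codes."""
    return [scale * (c - zero_point) for c in codes]


weights = [-0.42, 0.0, 0.13, 0.9, -1.2]  # hypothetical weight values
codes, scale, zero_point = quantize_uint8(weights)
restored = dequantize_uint8(codes, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(codes)
print(max_error, "<=", scale / 2)
```

Each weight now needs one byte instead of four, at the cost of a rounding error of at most about half a quantization step per value.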

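So first you need to decide whether you need post-training quantization or quantization-aware training. For example, if you have already saved the model as *.h5 files, you would probably want to follow @Mitiku's instructions and do post-training quantization.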

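If you would rather achieve higher performance by simulating the effect of quantization during training (using the method you quoted in the question), and your model is in the subset of CNN architectures supported by quantization-aware training, this example may help you with the interaction between Keras and TensorFlow. Basically, you only need to add this code between the model definition and its fitting: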

    sess = tf.keras.backend.get_session()
    tf.contrib.quantize.create_training_graph(sess.graph)
    sess.run(tf.global_variables_initializer())
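For intuition about what "simulating the effect of quantization in training" means, the sketch below applies fake quantization to the weights of a tiny dense layer in pure Python: values are snapped to an 8-bit grid but stay floats, so the forward pass sees the rounding error. This is a simplified illustration, not the code that `tf.contrib.quantize` inserts; it does not adjust the range so that zero is exactly representable, it does not model gradients, and all names and numbers are made up for the example.

```python
def fake_quant(values, lo, hi, num_bits=8):
    """Snap floats to the nearest of 2**num_bits levels in [lo, hi]; output stays float."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scale = (hi - lo) / (2 ** num_bits - 1)
    snapped = []
    for v in values:
        clipped = min(hi, max(lo, v))  # values outside the range saturate
        snapped.append(lo + round((clipped - lo) / scale) * scale)
    return snapped


def dense(inputs, weights, bias):
    """Dense layer without activation: one output per weight row."""
    return [sum(x * w for x, w in zip(inputs, row)) + b
            for row, b in zip(weights, bias)]


inputs = [0.5, -1.0, 0.25]
weights = [[0.2, -0.7, 0.05], [-0.31, 0.44, 0.9]]  # hypothetical weights
bias = [0.1, -0.2]

# Use the observed weight range as the quantization range.
flat = [w for row in weights for w in row]
lo, hi = min(flat), max(flat)
quant_weights = [fake_quant(row, lo, hi) for row in weights]

float_out = dense(inputs, weights, bias)
quant_out = dense(inputs, quant_weights, bias)
print(float_out)
print(quant_out)
```

Because the network is trained while seeing these slightly perturbed outputs, it can adapt its weights to the rounding, which is why quantization-aware training tends to lose less accuracy than quantizing after the fact.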