How do I read data from an example queue into TensorFlow batches?

Question

How do I get TensorFlow example queues into proper batches for training?

I have some images and labels:

    IMG_6642.JPG 1
    IMG_6643.JPG 2

(Feel free to suggest another label format; I think I may need another dense-to-sparse step...)
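On the label-format aside: `tf.decode_csv` splits on commas by default, while the listing above is whitespace-separated. A minimal pure-Python sketch of splitting such a listing into parallel filename and label lists follows. The function name `parse_label_lines` is hypothetical, and nothing here is TensorFlow API.

```python
def parse_label_lines(lines):
    """Split 'filename label' lines into parallel lists of names and int labels."""
    filenames = []
    labels = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so the label is always the final field.
        filename, label = line.rsplit(None, 1)
        filenames.append(filename)
        labels.append(int(label))
    return filenames, labels


filenames, labels = parse_label_lines(["IMG_6642.JPG 1", "IMG_6643.JPG 2"])
print(filenames, labels)
```

The two lists could then be written back out as comma-separated lines, or fed into whatever queue the pipeline uses.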

I've read quite a few tutorials, but I don't have it all together yet. Here is what I have, with comments indicating the steps required by TensorFlow's Reading Data page.

The list of filenames (optional steps removed for simplicity)
Filename queue
A reader for the file format
A decoder for a record read by the reader
Example queue

After the example queue, I need to get this queue into batches for training. That is where I'm stuck...

1. List of filenames

    files = tf.train.match_filenames_once('*.JPG')

4. Filename queue

    filename_queue = tf.train.string_input_producer(files, num_epochs=None, shuffle=True, seed=None, shared_name=None, name=None)

5. A reader

    reader = tf.TextLineReader()
    key, value = reader.read(filename_queue)

6. A decoder

    record_defaults = [[""], [1]]
    col1, col2 = tf.decode_csv(value, record_defaults=record_defaults)

(I don't think I need this step, since I already have my label in a tensor, but I include it anyway.)

    features = tf.pack([col2])

The documentation page has an example that runs a single image, rather than getting the images and labels into batches:

    for i in range(1200):
        # Retrieve a single instance:
        example, label = sess.run([features, col5])

Below that, it has a batching section:

    def read_my_file_format(filename_queue):
        reader = tf.SomeReader()
        key, record_string = reader.read(filename_queue)
        example, label = tf.some_decoder(record_string)
        processed_example = some_processing(example)
        return processed_example, label

    def input_pipeline(filenames, batch_size, num_epochs=None):
        filename_queue = tf.train.string_input_producer(
            filenames, num_epochs=num_epochs, shuffle=True)
        example, label = read_my_file_format(filename_queue)
        # min_after_dequeue defines how big a buffer we will randomly sample
        #   from -- bigger means better shuffling but slower start up and more
        #   memory used.
        # capacity must be larger than min_after_dequeue and the amount larger
        #   determines the maximum we will prefetch.  Recommendation:
        #   min_after_dequeue + (num_threads + a small safety margin) * batch_size
        min_after_dequeue = 10000
        capacity = min_after_dequeue + 3 * batch_size
        example_batch, label_batch = tf.train.shuffle_batch(
            [example, label], batch_size=batch_size, capacity=capacity,
            min_after_dequeue=min_after_dequeue)
        return example_batch, label_batch
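The capacity recommendation in the comments of that snippet can be written out as plain arithmetic. This is only a sketch: `recommended_capacity` and its `safety_margin` parameter are hypothetical names, and the defaults are chosen so that the result reproduces the `3 * batch_size` term used in the documentation code.

```python
def recommended_capacity(batch_size, min_after_dequeue, num_threads=1, safety_margin=2):
    """Return min_after_dequeue + (num_threads + safety_margin) * batch_size."""
    return min_after_dequeue + (num_threads + safety_margin) * batch_size


# One reader thread plus a margin of two batches gives the 3 * batch_size term.
capacity = recommended_capacity(batch_size=32, min_after_dequeue=10000)
print(capacity)  # 10096
```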

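My question is: how do I use the documentation's example code above with the code I already have? I need batches to work with, and most tutorials come with ready-made batches: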

    with tf.Session() as sess:
        sess.run(init)

        # Training cycle
        for epoch in range(training_epochs):
            total_batch = int(mnist.train.num_examples/batch_size)
            # Loop over all batches
            for i in range(total_batch):
                batch_xs, batch_ys = mnist.train.next_batch(batch_size)
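For comparison, here is roughly what such a ready-made batch loop amounts to when the data already sits in memory. This is a pure-Python sketch under that assumption; `iterate_batches` is a hypothetical helper, not the MNIST tutorial's implementation.

```python
def iterate_batches(examples, labels, batch_size):
    """Yield consecutive (examples, labels) slices of length batch_size."""
    total_batch = len(examples) // batch_size  # any incomplete final batch is skipped
    for i in range(total_batch):
        start = i * batch_size
        end = start + batch_size
        yield examples[start:end], labels[start:end]


images = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
targets = [1, 2, 1, 2, 1]
for batch_xs, batch_ys in iterate_batches(images, targets, batch_size=2):
    print(batch_xs, batch_ys)
```

A queue-based pipeline replaces this in-memory slicing with batches that are dequeued as files are read.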

user5869947 · Answer

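To make this input pipeline work, you need to add an asynchronous queueing mechanism that generates batches of examples. This is done by creating a tf.RandomShuffleQueue or a tf.FIFOQueue and inserting JPEG images that have been read, decoded, and preprocessed.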
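To illustrate the idea behind a shuffling queue with a `min_after_dequeue` threshold, here is a simplified, single-threaded pure-Python model. It is a sketch of the concept only and does not reflect how TensorFlow implements its queues; `shuffle_batches` is a hypothetical name.

```python
import random


def shuffle_batches(examples, batch_size, min_after_dequeue, seed=None):
    """Yield shuffled batches from an iterable using a bounded mixing buffer."""
    rng = random.Random(seed)
    buffer = []
    batch = []

    def take_random():
        # Swap a random element to the end and pop it (constant-time removal).
        index = rng.randrange(len(buffer))
        buffer[index], buffer[-1] = buffer[-1], buffer[index]
        return buffer.pop()

    for example in examples:
        buffer.append(example)
        # Dequeue only when more than min_after_dequeue elements are buffered,
        # so at least that many remain to mix with new arrivals.
        if len(buffer) > min_after_dequeue:
            batch.append(take_random())
            if len(batch) == batch_size:
                yield batch
                batch = []

    # Input exhausted: drain what is left in random order.
    while buffer:
        batch.append(take_random())
        if len(batch) == batch_size:
            yield batch
            batch = []
    # A trailing partial batch is dropped in this sketch.


for images in shuffle_batches(range(10), batch_size=3, min_after_dequeue=4, seed=0):
    print(images)
```

A larger `min_after_dequeue` mixes examples from further apart in the input, at the cost of more memory and a slower start.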

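You can use convenience constructs that generate the queue and the corresponding threads for running it, via tf.train.shuffle_batch_join or tf.train.batch_join. Here is a simplified example of what this would look like. Note that this code is untested.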

    # Let's assume there is a queue that maintains a list of all filenames,
    # called 'filename_queue'. The reader must return whole-file contents
    # here (e.g. tf.WholeFileReader).
    _, file_buffer = reader.read(filename_queue)

    # Decode the JPEG image.
    image = tf.image.decode_jpeg(file_buffer)

    # Generate batches of images of this size.
    batch_size = 32

    # Depends on the number of files and the training speed.
    min_queue_examples = batch_size * 100
    # shuffle_batch_join expects a list of tensor lists, one per reader.
    images_batch = tf.train.shuffle_batch_join(
        [[image]],
        batch_size=batch_size,
        capacity=min_queue_examples + 3 * batch_size,
        min_after_dequeue=min_queue_examples)

    # Run your network on this batch of images.
    predictions = my_inference(images_batch)

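Depending on how much you need to scale up your job, you may need to run multiple independent threads that read/decode/preprocess images and dump them into your example queue. A complete example of such a pipeline is provided in the Inception/ImageNet model. Take a look at batch_inputs: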
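The multi-threaded pattern described here can be sketched with the Python standard library alone. This is an illustration of the producer/consumer idea, not the Inception code: `preprocess`, `worker`, and `run_pipeline` are hypothetical names, and `preprocess` is a stand-in for the real read/decode/preprocess work.

```python
import queue
import threading


def preprocess(filename):
    # Hypothetical stand-in for reading, decoding and preprocessing one image.
    return filename.lower()


def worker(filenames, example_queue):
    # Each worker thread pushes its processed examples into the shared queue.
    for filename in filenames:
        example_queue.put(preprocess(filename))


def run_pipeline(filenames, num_threads, batch_size):
    """Process files on several threads and group the results into batches."""
    example_queue = queue.Queue(maxsize=4 * batch_size)
    # Give every thread its own slice of the file list.
    slices = [filenames[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(part, example_queue))
               for part in slices]
    for thread in threads:
        thread.start()

    batches = []
    batch = []
    # Exactly one example arrives per input file, so this loop never blocks forever.
    for _ in range(len(filenames)):
        batch.append(example_queue.get())
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    for thread in threads:
        thread.join()
    return batches  # a trailing partial batch is dropped in this sketch


print(run_pipeline(["A.JPG", "B.JPG", "C.JPG", "D.JPG", "E.JPG"], num_threads=2, batch_size=2))
```

Because the threads run independently, the order of examples within the batches can differ from run to run.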

https://github.com/tensorflow/models/blob/master/inception/inception/image_processing.py#L407

Finally, if you are working with > O(1000) JPEG images, keep in mind that reading thousands of small files individually is extremely inefficient. This will slow down your training quite a bit.

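A more robust and faster solution is to convert the image dataset to sharded TFRecord files of Example protos. Here is a fully worked script for converting the ImageNet dataset to such a format. And here is a set of instructions for running a generic version of this preprocessing script on an arbitrary directory containing JPEG images.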
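The sharding step itself, deciding which image goes into which output file, can be sketched in plain Python. This shows only the assignment of files to shards, not the TFRecord writing; `shard_filenames` and the `prefix-00000-of-00002` naming pattern are assumptions made for illustration.

```python
def shard_filenames(filenames, num_shards, prefix="train"):
    """Assign filenames to num_shards output shards in round-robin order."""
    shards = {}
    for shard_index in range(num_shards):
        # Hypothetical shard naming pattern, e.g. train-00000-of-00002.
        shard_name = "%s-%05d-of-%05d" % (prefix, shard_index, num_shards)
        shards[shard_name] = filenames[shard_index::num_shards]
    return shards


for name, members in shard_filenames(["a.jpg", "b.jpg", "c.jpg"], num_shards=2).items():
    print(name, members)
```

Each shard would then be written as one larger file, so training reads a few big files instead of thousands of small ones.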