Meaning of inter_op_parallelism_threads and intra_op_parallelism_threads

Question

Can somebody please explain the following TensorFlow terms:

inter_op_parallelism_threads
intra_op_parallelism_threads

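Alternatively, please provide links to the right source of explanation.

I have run a few tests by changing these parameters, but the results have not been consistent enough to reach a conclusion.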

mrry · Answer

The inter_op_parallelism_threads and intra_op_parallelism_threads options are documented in the source of the tf.ConfigProto protocol buffer. These options configure two thread pools that TensorFlow uses to parallelize execution, as the comments describe:

    // The execution of an individual op (for some op types) can be
    // parallelized on a pool of intra_op_parallelism_threads.
    // 0 means the system picks an appropriate number.
    int32 intra_op_parallelism_threads = 2;

    // Nodes that perform blocking operations are enqueued on a pool of
    // inter_op_parallelism_threads available in each process.
    //
    // 0 means the system picks an appropriate number.
    //
    // Note that the first Session created in the process sets the
    // number of threads for all future sessions unless use_per_session_threads is
    // true or session_inter_op_thread_pool is configured.
    int32 inter_op_parallelism_threads = 5;

There are several possible forms of parallelism when running a TensorFlow graph, and these options provide some control over multi-core CPU parallelism:

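If you have an operation that can be parallelized internally, such as matrix multiplication (tf.matmul()) or a reduction (e.g. tf.reduce_sum()), TensorFlow will execute it by scheduling tasks in a thread pool with intra_op_parallelism_threads threads. This configuration option therefore controls the maximum parallel speedup for a single operation. Note that if you run multiple operations in parallel, these operations will share this thread pool.

If you have many operations that are independent in your TensorFlow graph (because there is no directed path between them in the dataflow graph), TensorFlow will attempt to run them concurrently, using a thread pool with inter_op_parallelism_threads threads. If those operations have a multithreaded implementation, they will (in most cases) share the same thread pool for intra-op parallelism.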
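
The two-pool arrangement described above can be pictured with a small plain-Python analogy. This is only a sketch of the structure, not TensorFlow's implementation: the pool sizes and the helper names parallel_sum and run_independent_ops are hypothetical, and because of Python's global interpreter lock the threads here show how work is routed rather than giving any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sizes standing in for the two TensorFlow options.
INTRA_OP_THREADS = 4
INTER_OP_THREADS = 2

# One pool shared by every op for its internal work,
# and one pool used to run independent ops side by side.
intra_pool = ThreadPoolExecutor(max_workers=INTRA_OP_THREADS)
inter_pool = ThreadPoolExecutor(max_workers=INTER_OP_THREADS)


def parallel_sum(values, chunks=INTRA_OP_THREADS):
    """A single 'op' that splits its own work across the shared intra-op pool."""
    size = max(1, -(-len(values) // chunks))  # ceiling division
    parts = [values[i:i + size] for i in range(0, len(values), size)]
    return sum(intra_pool.map(sum, parts))


def run_independent_ops(inputs):
    """Independent 'ops' (no data dependency) are dispatched on the inter-op pool."""
    return list(inter_pool.map(parallel_sum, inputs))


results = run_independent_ops([list(range(100)), list(range(1000))])
print(results)
```

Both calls to parallel_sum draw on the same intra_pool, which mirrors the point that concurrently running operations share the intra-op thread pool.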

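Finally, both configuration options take a default value of 0, which means "the system picks an appropriate number." Currently, this means that each thread pool will have one thread per CPU core in your machine.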
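
The default rule just described can be written out as a tiny helper. This is a hedged sketch, not TensorFlow code: resolve_thread_count is a hypothetical name, and os.cpu_count() (which reports logical CPUs) is assumed here as a stand-in for "one thread per CPU core". Negative values are deliberately left out of the sketch.

```python
import os


def resolve_thread_count(requested):
    """Return the requested pool size, or one thread per CPU when 0 is given."""
    if requested < 0:
        # Negative settings are outside what this sketch models.
        raise ValueError("this sketch only models values >= 0")
    if requested == 0:
        # 0 means "the system picks": assumed here to be one thread per CPU.
        return os.cpu_count() or 1
    return requested


print(resolve_thread_count(0))  # depends on the machine
print(resolve_thread_count(3))
```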

mrk · Answer

To get the best performance from a machine, change the parallelism threads and OpenMP settings as below for the tensorflow backend (from here):

    # TensorFlow 1.x API; under TensorFlow 2.x these names live in tf.compat.v1.
    import tensorflow as tf

    # Assume that the number of cores per socket in the machine is denoted as
    # NUM_PARALLEL_EXEC_UNITS; when NUM_PARALLEL_EXEC_UNITS=0 the system
    # chooses appropriate settings.
    NUM_PARALLEL_EXEC_UNITS = 4  # hypothetical example value, not from the original answer

    config = tf.ConfigProto(
        intra_op_parallelism_threads=NUM_PARALLEL_EXEC_UNITS,
        inter_op_parallelism_threads=2,
        allow_soft_placement=True,
        device_count={'CPU': NUM_PARALLEL_EXEC_UNITS},
    )

    session = tf.Session(config=config)