Python 3: Does Pool keep the original order of the data passed to map?

Question

I wrote a small script that distributes a workload across 4 worker processes and tests whether the results stay ordered (with respect to the order of the input):

    from multiprocessing import Pool
    import numpy as np
    import time
    import random

    rows = 16
    columns = 1000000

    vals = np.arange(rows * columns, dtype=np.int32).reshape(rows, columns)

    def worker(arr):
        time.sleep(random.random())        # let the process sleep a random
        for idx in np.ndindex(arr.shape):  # amount of time to ensure that
            arr[idx] += 1                  # the processes finish at different
                                           # time steps
        return arr

    # create the process pool
    with Pool(4) as p:
        # schedule one map/worker for each row in the original data
        q = p.map(worker, [row for row in vals])

    for idx, row in enumerate(q):
        print("[{:0>2}]: {: >8} - {: >8}".format(idx, row[0], row[-1]))

For me, this always produces:

    [00]:        1 -  1000000
    [01]:  1000001 -  2000000
    [02]:  2000001 -  3000000
    [03]:  3000001 -  4000000
    [04]:  4000001 -  5000000
    [05]:  5000001 -  6000000
    [06]:  6000001 -  7000000
    [07]:  7000001 -  8000000
    [08]:  8000001 -  9000000
    [09]:  9000001 - 10000000
    [10]: 10000001 - 11000000
    [11]: 11000001 - 12000000
    [12]: 12000001 - 13000000
    [13]: 13000001 - 14000000
    [14]: 14000001 - 15000000
    [15]: 15000001 - 16000000

Question: So, does Pool really keep the order of the original input when it stores the result of each map call in q?

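Sidenote: I am asking because I need an easy way to parallelize work across several workers. In some cases the order does not matter. In others, the results (like q) have to come back in the original order, because I apply an additional reduce function that relies on ordered data.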

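Performance: On my machine this operation is about 4 times faster than a normal run in a single process (as expected, since I have 4 cores). In addition, all 4 cores are at 100% usage during the run.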

user2357112 · Answer

Pool.map results are ordered. If you need the order, great; if you don't, Pool.imap_unordered may be a useful optimization.

Note that while the order in which you receive the results from Pool.map is fixed, the order in which they are computed is arbitrary.
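
To make the difference concrete, here is a minimal sketch that runs the same tasks through Pool.map and Pool.imap_unordered. The worker slow_square and its random delay are made up for illustration and are not part of the question's script; the delay only serves to make tasks finish out of order.

```python
import random
import time
from multiprocessing import Pool


def slow_square(x):
    """Hypothetical worker: square x after a short random delay."""
    time.sleep(random.uniform(0.0, 0.05))
    return x * x


def main():
    inputs = list(range(8))
    with Pool(4) as pool:
        # map() blocks until every task is done and returns a list
        # whose i-th element is the result for inputs[i].
        ordered = pool.map(slow_square, inputs)

        # imap_unordered() yields each result as soon as it is ready,
        # so the sequence can differ from run to run.
        as_completed = list(pool.imap_unordered(slow_square, inputs))

    print("map:           ", ordered)
    print("imap_unordered:", as_completed)

    # Both calls produce the same values; only map() fixes their positions.
    assert ordered == [x * x for x in inputs]
    assert sorted(as_completed) == ordered


if __name__ == "__main__":
    main()
```

The map line should print the squares in input order on every run, while the imap_unordered line will typically be shuffled.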

mgilson · Answer

The documentation describes it as "a parallel equivalent of the map() built-in function". Since map is guaranteed to preserve order, multiprocessing.Pool.map makes that guarantee too.
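
As a rough check of that equivalence for the reduce use case from the question, the sketch below feeds both the built-in map and Pool.map into an order-sensitive reduction (string concatenation) and compares the results. The helpers tag and concat are hypothetical stand-ins, not anything from the original script.

```python
from functools import reduce
from multiprocessing import Pool


def tag(i):
    """Hypothetical worker: turn an index into a one-letter label."""
    return chr(ord("a") + i)


def concat(left, right):
    """Order-sensitive reduction step: string concatenation."""
    return left + right


def main():
    indices = range(10)

    # Serial baseline using the built-in map().
    serial = reduce(concat, map(tag, indices))

    # Parallel version: Pool.map() returns results in input order,
    # so the order-sensitive reduce gives the same answer.
    with Pool(4) as pool:
        parallel = reduce(concat, pool.map(tag, indices))

    print(serial, parallel)
    assert serial == parallel == "abcdefghij"


if __name__ == "__main__":
    main()
```

If Pool.map returned results in completion order instead, the concatenated string could come out scrambled and the assertion would fail intermittently.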