How to parallelize a simple Python loop

Question

This is probably a trivial question, but how do I parallelize the following loop in Python?

    # set up output lists
    output1 = list()
    output2 = list()
    output3 = list()

    for j in range(0, 10):
        # calc individual parameter value
        parameter = j * offset
        # call the calculation
        out1, out2, out3 = calc_stuff(parameter=parameter)

        # put results into correct output list
        output1.append(out1)
        output2.append(out2)
        output3.append(out3)

I know how to start single threads in Python, but I don't know how to "collect" the results.

Multiple processes would be fine too; whatever is easiest for this case. I'm currently using Linux, but the code should run on Windows and Mac as well.

What's the easiest way to parallelize this code?

Sven Marnach · Accepted Answer

Because of the global interpreter lock (GIL), using multiple threads in CPython will not give you better performance for pure-Python code. I suggest using the multiprocessing module instead:

    import multiprocessing

    pool = multiprocessing.Pool(4)
    out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))

Note that this won't work in the interactive interpreter.

To avoid the usual FUD around the GIL: there wouldn't be any advantage to using threads for this example anyway. You want to use processes here, not threads, because they avoid a whole class of problems.
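For completeness, here is a minimal sketch of the same idea as a standalone script. The body of calc_stuff, the default offset, and the helper name collect are assumptions made for illustration; only multiprocessing.Pool, pool.map and zip come from the answer above. The `if __name__ == "__main__":` guard is what lets this run from a file on platforms that start workers by spawning a fresh interpreter.

```python
import multiprocessing


def calc_stuff(parameter=None):
    # Hypothetical stand-in for the real calculation: returns three values.
    return parameter + 1, parameter * 2, parameter ** 2


def collect(mapper, offset=3, count=10):
    # `mapper` is anything with the signature of the built-in map(),
    # for example pool.map. Results come back in input order.
    parameters = range(0, count * offset, offset)
    results = mapper(calc_stuff, parameters)
    # Transpose the (out1, out2, out3) tuples into three separate lists.
    out1, out2, out3 = zip(*results)
    return list(out1), list(out2), list(out3)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block when
    # they import the module.
    with multiprocessing.Pool(4) as pool:
        output1, output2, output3 = collect(pool.map)
    print(output1)
    print(output2)
    print(output3)
```

Passing the built-in map instead of pool.map runs the same code serially, which is a convenient way to check the results before parallelizing.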

Gael Varoquaux · Answer

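To parallelize a simple for loop, joblib adds a lot of value over raw use of multiprocessing: not only a short syntax, but also transparent batching of iterations when they are very fast (to remove the overhead), and capture of the child process traceback for better error reporting.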

Disclaimer: I am the original author of joblib.

Aaron Hall · Answer

What's the easiest way to parallelize this code?

I really like concurrent.futures for this. It is available in Python 3 since version 3.2, and via a backport for 2.6 and 2.7 on PyPI.

You can use threads or processes through exactly the same interface.

Multiprocessing

Put this in a file, futuretest.py:

    import concurrent.futures
    import time, random               # add some random sleep time

    offset = 2                        # you don't supply these so

    def calc_stuff(parameter=None):   # these are examples.
        sleep_time = random.choice([0, 1, 2, 3, 4, 5])
        time.sleep(sleep_time)
        return parameter / 2, sleep_time, parameter * parameter

    def procedure(j):                 # just factoring out the
        parameter = j * offset        # procedure
        # call the calculation
        return calc_stuff(parameter=parameter)

    def main():
        output1 = list()
        output2 = list()
        output3 = list()
        start = time.time()           # let's see how long this takes

        # we can swap out ProcessPoolExecutor for ThreadPoolExecutor
        with concurrent.futures.ProcessPoolExecutor() as executor:
            for out1, out2, out3 in executor.map(procedure, range(0, 10)):
                # put results into correct output list
                output1.append(out1)
                output2.append(out2)
                output3.append(out3)
        finish = time.time()
        # these kinds of format strings are only available on Python 3.6+:
        # time to upgrade!
        print(f'original inputs: {repr(output1)}')
        print(f'total time to execute {sum(output2)} = sum({repr(output2)})')
        print(f'time saved by parallelizing: {sum(output2) - (finish-start)}')
        print(f'returned in order given: {repr(output3)}')

    if __name__ == '__main__':
        main()

And here's the output:

    $ python3 -m futuretest
    original inputs: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    total time to execute 33 = sum([0, 3, 3, 4, 3, 5, 1, 5, 5, 4])
    time saved by parallelizing: 27.68999981880188
    returned in order given: [0, 4, 16, 36, 64, 100, 144, 196, 256, 324]

Multithreading

Now change ProcessPoolExecutor to ThreadPoolExecutor and run the module again:

    $ python3 -m futuretest
    original inputs: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    total time to execute 19 = sum([0, 2, 3, 5, 2, 0, 0, 3, 3, 1])
    time saved by parallelizing: 13.992000102996826
    returned in order given: [0, 4, 16, 36, 64, 100, 144, 196, 256, 324]

Now you have done both multithreading and multiprocessing!
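Since the two executors share an interface, the choice can also be passed in as a parameter instead of edited in the source. The following is only a sketch under that assumption; square and run_with are hypothetical names, not part of the answer's futuretest.py.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def square(n):
    # Hypothetical stand-in for an expensive calculation.
    return n * n


def run_with(executor_class, values, max_workers=4):
    # Both executor classes accept max_workers and provide map(),
    # which yields results in the order of the inputs.
    with executor_class(max_workers=max_workers) as executor:
        return list(executor.map(square, values))


if __name__ == "__main__":
    print(run_with(ThreadPoolExecutor, range(5)))
    print(run_with(ProcessPoolExecutor, range(5)))
```

With ProcessPoolExecutor the function and its arguments must be picklable, which is why square is defined at module level here.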

Note on performance and using both together

The sample is far too small to compare the results.

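However, I suspect that multithreading will generally be faster than multiprocessing, especially on Windows: Windows doesn't support forking, so each new process takes time to launch. On Linux or Mac they'll probably be closer.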

You can nest multiple threads inside multiple processes, but it's recommended not to use multiple threads to spin off multiple processes.

tyrex · Answer

    from joblib import Parallel, delayed
    import multiprocessing

    inputs = range(10)

    def processInput(i):
        return i * i

    num_cores = multiprocessing.cpu_count()

    results = Parallel(n_jobs=num_cores)(delayed(processInput)(i) for i in inputs)
    print(results)

The above works well on my machine (Ubuntu; the joblib package was pre-installed, but it can be installed via pip install joblib).

Taken from https://blog.dominodatalab.com/simple-parallelization/

Robert Nishihara · Answer

There are a number of advantages to using Ray:

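You can parallelize over multiple machines in addition to multiple cores (with the same code).
Efficient handling of numerical data through shared memory (and zero-copy serialization).
High task throughput with distributed scheduling.
Fault tolerance.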

In your case, you could start Ray and define a remote function:

    import ray

    ray.init()

    # num_returns was called num_return_vals in older Ray releases
    @ray.remote(num_returns=3)
    def calc_stuff(parameter=None):
        # Do something.
        return 1, 2, 3

and then invoke it in parallel:

    output1, output2, output3 = [], [], []

    # Launch the tasks.
    for j in range(10):
        id1, id2, id3 = calc_stuff.remote(parameter=j)
        output1.append(id1)
        output2.append(id2)
        output3.append(id3)

    # Block until the results have finished and get the results.
    output1 = ray.get(output1)
    output2 = ray.get(output2)
    output3 = ray.get(output3)

To run the same example on a cluster, the only line that would change is the call to ray.init(). See the Ray documentation for details.

Note that I'm helping to develop Ray.

jackdoe · Answer

Why don't you use threads, and one mutex to protect one global list?

    import threading

    class thread_it(threading.Thread):
        def __init__(self, param):
            threading.Thread.__init__(self)
            self.param = param

        def run(self):
            # compute outside the lock so the threads can overlap,
            # then hold the mutex only while appending to the shared list
            result = calc_stuff(self.param)
            with mutex:
                output.append(result)

    threads = []
    output = []
    mutex = threading.Lock()

    for j in range(0, 10):
        current = thread_it(j * offset)
        threads.append(current)
        current.start()

    for t in threads:
        t.join()

    # here you have the output list filled with data
    # (in completion order, not necessarily input order)

Keep in mind that you will only be as fast as your slowest thread.
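The point about the slowest thread can be checked with a small timing sketch. It assumes tasks that only wait (time.sleep releases the GIL, so the threads overlap); fake_task and timed_run are hypothetical names used for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_task(seconds):
    # Hypothetical task that only waits; sleeping releases the GIL.
    time.sleep(seconds)
    return seconds


def timed_run(delays):
    # Run one thread per delay and measure the total wall-clock time.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        results = list(executor.map(fake_task, delays))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    delays = [0.1, 0.2, 0.5]
    results, elapsed = timed_run(delays)
    print(f"sum of delays: {sum(delays):.1f}s, wall time: {elapsed:.2f}s")
```

The wall time should land close to the longest delay rather than the sum, though the exact figure depends on the machine.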

miuxu · Answer

I find joblib very useful. See the following example:

    from joblib import Parallel, delayed

    def yourfunction(k):
        s = 3.14 * k * k
        print("Area of a circle with a radius", k, "is:", s)

    element_run = Parallel(n_jobs=-1)(delayed(yourfunction)(k) for k in range(1, 10))

n_jobs=-1: use all available cores

Adil Warsi · Answer

A very simple example of parallel processing is:

    from multiprocessing import Process

    output1 = list()
    output2 = list()
    output3 = list()

    def yourfunction():
        for j in range(0, 10):
            # calc individual parameter value
            parameter = j * offset
            # call the calculation
            out1, out2, out3 = calc_stuff(parameter=parameter)

            # put results into correct output list
            output1.append(out1)
            output2.append(out2)
            output3.append(out3)

    if __name__ == '__main__':
        # note: the lists are filled inside the child process, so the
        # parent's copies stay empty
        p = Process(target=yourfunction)
        p.start()
        p.join()
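In the snippet above the lists are appended to inside the child process, so the parent never sees the results. One way to get them back is multiprocessing.Queue. The following is a minimal sketch, not the answerer's method: calc_stuff is a hypothetical stand-in, and worker and sort_outputs are names invented for the example.

```python
from multiprocessing import Process, Queue


def calc_stuff(parameter=None):
    # Hypothetical stand-in for the real calculation.
    return parameter, parameter * 2, parameter * 3


def worker(j, offset, queue):
    # Runs in the child process; sends the index along with the result
    # so the parent can restore the original order.
    queue.put((j, calc_stuff(parameter=j * offset)))


def sort_outputs(items):
    # items: iterable of (index, (out1, out2, out3)) in arbitrary order.
    ordered = [outs for _, outs in sorted(items)]
    out1, out2, out3 = (list(column) for column in zip(*ordered))
    return out1, out2, out3


if __name__ == "__main__":
    offset = 2
    queue = Queue()
    processes = [Process(target=worker, args=(j, offset, queue)) for j in range(10)]
    for p in processes:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    items = [queue.get() for _ in processes]
    for p in processes:
        p.join()
    output1, output2, output3 = sort_outputs(items)
    print(output1, output2, output3)
```

Starting one process per iteration is fine for ten iterations; for larger loops a pool of workers is usually the better fit.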

Amit Teli · Answer

Let's say we have an async function:

    async def work_async(self, student_name: str, code: str, loop):
        """
        Some async function
        """
        # Do some async processing

It needs to be run over a large array. Some attributes are passed to the program, and some are taken from properties of the dictionary elements in the array.

    import asyncio
    import sys

    async def process_students(self, student_name: str, loop):
        market = sys.argv[2]
        subjects = [...]  # Some large array
        batchsize = 5
        for i in range(0, len(subjects), batchsize):
            batch = subjects[i:i+batchsize]
            await asyncio.gather(*(self.work_async(student_name, sub['Code'], loop)
                                   for sub in batch))
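A self-contained version of the batching pattern might look like the sketch below. The coroutine body, the sample data and the name process_in_batches are assumptions for illustration. Note that asyncio runs the coroutines concurrently on one thread, so this helps with I/O-bound waits rather than CPU-bound calculations.

```python
import asyncio


async def work_async(code):
    # Hypothetical coroutine standing in for real asynchronous work.
    await asyncio.sleep(0.01)
    return code.upper()


async def process_in_batches(subjects, batchsize=5):
    results = []
    for i in range(0, len(subjects), batchsize):
        batch = subjects[i:i + batchsize]
        # gather() runs the batch concurrently and returns the results
        # in the same order as the awaitables it was given.
        batch_results = await asyncio.gather(
            *(work_async(sub["Code"]) for sub in batch)
        )
        results.extend(batch_results)
    return results


if __name__ == "__main__":
    subjects = [{"Code": f"c{n}"} for n in range(12)]
    print(asyncio.run(process_in_batches(subjects)))
```

Collecting the return value of gather() is what answers the "how do I collect the results" part of the question for this approach.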

TEe · Answer

This can be useful when implementing multiprocessing and parallel/distributed computing in Python.

YouTube tutorial on using the techila package

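Techila is a distributed computing middleware that integrates directly with Python through the techila package. The Peach function in the package can be useful for parallelizing loop structures. (The following code snippet is from the Techila Community Forums.)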

    techila.Peach(funcname = 'theheavyalgorithm',  # Function that will be called on the compute nodes/Workers
                  files = 'theheavyalgorithm.py',  # Python file that will be sourced on Workers
                  jobs = jobcount                  # Number of Jobs in the Project
                  )

Felipe de Macêdo · Answer

Thanks @iuryxavier

    from multiprocessing import Pool
    from multiprocessing import cpu_count

    def add_1(x):
        return x + 1

    if __name__ == "__main__":
        pool = Pool(cpu_count())
        # 10**12 inputs is enormous; use a much smaller range to try this out
        results = pool.map(add_1, range(10**12))
        pool.close()  # 'TERM'
        pool.join()   # 'KILL'