Using Python's multiprocessing module to execute simultaneous and separate SEAWAT/MODFLOW model runs

Question

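I'm trying to complete 100 model runs on my 8-processor, 64-bit Windows 7 machine. I'd like to run 7 instances of the model concurrently to reduce my total run time (about 9.5 minutes per model run). I've looked at several threads about Python's multiprocessing module, but I'm still missing something.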

Using the multiprocessing module

How to spawn parallel child processes on a multi-processor system?

Python multiprocessing queue

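My process:

I have 100 different parameter sets that I'd like to run through SEAWAT/MODFLOW to compare the results. I have pre-built the model input files for each model run and stored them in their own directories. What I'd like to do is have 7 models running at a time until all realizations have been completed. There needn't be any communication between processes or display of results. So far I have only been able to spawn the models sequentially: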

    import os,subprocess
    import multiprocessing as mp

    ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
    files = []
    for f in os.listdir(ws + r'\fieldgen\reals'):
        if f.endswith('.npy'):
            files.append(f)

    ## def work(cmd):
    ##     return subprocess.call(cmd, shell=False)

    def run(f,def_param=ws):
        real = f.split('_')[2].split('.')[0]
        print 'Realization %s' % real

        mf2k = r'c:\modflow\mf2k.1_19\bin\mf2k.exe '
        mf2k5 = r'c:\modflow\MF2005_1_8\bin\mf2005.exe '
        seawatV4 = r'c:\modflow\swt_v4_00_04\exe\swt_v4.exe '
        seawatV4x64 = r'c:\modflow\swt_v4_00_04\exe\swt_v4x64.exe '

        exe = seawatV4x64
        swt_nam = ws + r'\reals\real%s\ss\ss.nam_swt' % real

        os.system( exe + swt_nam )

    if __name__ == '__main__':
        p = mp.Pool(processes=mp.cpu_count()-1) #-leave 1 processor available for system and other processes
        tasks = range(len(files))
        results = []
        for f in files:
            r = p.map_async(run(f), tasks, callback=results.append)

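I changed the final `if __name__ == '__main__':` block to the following, hoping it would fix the lack of parallelism that I feel the `for` loop imposes on the script above. However, the model now fails to even run (with no Python error):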

    if __name__ == '__main__':
        p = mp.Pool(processes=mp.cpu_count()-1) #-leave 1 processor available for system and other processes
        p.map_async(run,((files[f],) for f in range(len(files))))

Any and all help is greatly appreciated!

Edit, March 26, 2012, 13:31 EST

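Using the "manual pool" method from @J.F. Sebastian's answer below, I get parallel execution of my external .exe. Model realizations are launched in batches of 8 at a time, but the script doesn't wait for those 8 runs to complete before launching the next batch, and so on: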

    from __future__ import print_function
    import os,subprocess,sys
    import multiprocessing as mp
    from Queue import Queue
    from threading import Thread

    def run(f,ws):
        real = f.split('_')[-1].split('.')[0]
        print('Realization %s' % real)
        seawatV4x64 = r'c:\modflow\swt_v4_00_04\exe\swt_v4x64.exe '
        swt_nam = ws + r'\reals\real%s\ss\ss.nam_swt' % real
        subprocess.check_call([seawatV4x64, swt_nam])

    def worker(queue):
        """Process files from the queue."""
        for args in iter(queue.get, None):
            try:
                run(*args)
            except Exception as e: # catch exceptions to avoid exiting the
                                   # thread prematurely
                print('%r failed: %s' % (args, e,), file=sys.stderr)

    def main():
        # populate files
        ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
        wdir = os.path.join(ws, r'fieldgen\reals')
        q = Queue()
        for f in os.listdir(wdir):
            if f.endswith('.npy'):
                q.put_nowait((os.path.join(wdir, f), ws))

        # start threads
        threads = [Thread(target=worker, args=(q,)) for _ in range(8)]
        for t in threads:
            t.daemon = True # threads die if the program dies
            t.start()

        for _ in threads: q.put_nowait(None) # signal no more files
        for t in threads: t.join() # wait for completion

    if __name__ == '__main__':
        mp.freeze_support() # optional if the program is not frozen
        main()

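No error traceback is available. The `run()` function does its job when called on a single model realization file, just as it does with multiple files. The only difference is that with multiple files it is called `len(files)` times, yet each instance closes immediately and only one model run is allowed to finish, at which point the script exits gracefully (exit code 0).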

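Adding some print statements to `main()` reveals some information about the active thread count and the thread status. Note that this is a test on only 8 of the realization files, to keep the screenshot manageable. In theory all 8 files should run concurrently, but the behavior persists: the instances are spawned and all but one die immediately.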

    def main():
        # populate files
        ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
        wdir = os.path.join(ws, r'fieldgen\test')
        q = Queue()
        for f in os.listdir(wdir):
            if f.endswith('.npy'):
                q.put_nowait((os.path.join(wdir, f), ws))

        # start threads
        threads = [Thread(target=worker, args=(q,)) for _ in range(mp.cpu_count())]
        for t in threads:
            t.daemon = True # threads die if the program dies
            t.start()
        print('Active Count a',threading.activeCount())
        for _ in threads:
            print(_)
            q.put_nowait(None) # signal no more files
        for t in threads:
            print(t)
            t.join() # wait for completion
        print('Active Count b',threading.activeCount())

screenshot

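Note: the line in the screenshot that reads "D:\Data\Users..." is the error information thrown when I manually stop the model from running to completion. Once I stop the model run, the remaining thread status lines are reported and the script exits.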

Edit, March 26, 2012, 16:24 EST

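It seems that SEAWAT does allow concurrent execution: I've done this in the past by spawning instances manually with iPython and launching each one from its own model file folder. This time I'm launching all model runs from a single location, namely the directory where my script resides. The culprit appears to be the way SEAWAT saves some of its output. When SEAWAT runs, it immediately creates files pertaining to the model run. One of these files is saved not to the directory in which the model realization is located, but to the top-level directory where the script is located. This prevents any subsequent thread from saving a file of the same name in the same location (which they all want to do, since these file names are generic and not specific to each realization). The SEAWAT windows did not stay open long enough for me to read the error message, or even to see that there was one. I only realized this when I went back and ran the code from iPython, which displays the printout from SEAWAT directly instead of opening a new window to run the program.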

I am accepting @J.F. Sebastian's answer, since it is likely that once I resolve this model-executable issue, the threading code he has provided will get me where I need to be.

Final code

Added the `cwd` argument to `subprocess.check_call` so that each instance of SEAWAT starts in its own directory. This is the key change.

    from __future__ import print_function
    import os,subprocess,sys
    import multiprocessing as mp
    from Queue import Queue
    from threading import Thread
    import threading

    def run(f,ws):
        real = f.split('_')[-1].split('.')[0]
        print('Realization %s' % real)
        seawatV4x64 = r'c:\modflow\swt_v4_00_04\exe\swt_v4x64.exe '
        cwd = ws + r'\reals\real%s\ss' % real
        swt_nam = ws + r'\reals\real%s\ss\ss.nam_swt' % real
        subprocess.check_call([seawatV4x64, swt_nam],cwd=cwd)

    def worker(queue):
        """Process files from the queue."""
        for args in iter(queue.get, None):
            try:
                run(*args)
            except Exception as e: # catch exceptions to avoid exiting the
                                   # thread prematurely
                print('%r failed: %s' % (args, e,), file=sys.stderr)

    def main():
        # populate files
        ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
        wdir = os.path.join(ws, r'fieldgen\reals')
        q = Queue()
        for f in os.listdir(wdir):
            if f.endswith('.npy'):
                q.put_nowait((os.path.join(wdir, f), ws))

        # start threads
        threads = [Thread(target=worker, args=(q,)) for _ in range(mp.cpu_count()-1)]
        for t in threads:
            t.daemon = True # threads die if the program dies
            t.start()
        for _ in threads: q.put_nowait(None) # signal no more files
        for t in threads: t.join() # wait for completion

    if __name__ == '__main__':
        mp.freeze_support() # optional if the program is not frozen
        main()
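The effect of `cwd` can be checked in isolation. The sketch below is not SEAWAT-specific: it assumes Python 3 and uses a stand-in child process (the Python interpreter itself) that writes a file with a generic, relative name, which is how SEAWAT appears to behave. The names `run_in_own_dir`, `demo`, and `scratch.out` are made up for the illustration.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the model executable: it writes a file with a generic,
# relative name, so the file lands in whatever the working directory is.
CHILD_CODE = "open('scratch.out', 'w').write('done')"

def run_in_own_dir(run_dir):
    """Run the stand-in child with `run_dir` as its working directory."""
    subprocess.check_call([sys.executable, "-c", CHILD_CODE], cwd=run_dir)
    return os.path.join(run_dir, "scratch.out")

def demo():
    """Run two children in separate directories; return the files they wrote."""
    base = tempfile.mkdtemp()
    written = []
    for name in ("real1", "real2"):
        run_dir = os.path.join(base, name)
        os.makedirs(run_dir)
        written.append(run_in_own_dir(run_dir))
    return written

if __name__ == "__main__":
    for path in demo():
        print(path, os.path.exists(path))
```

Without `cwd`, both children would inherit the script's working directory and try to write the same `scratch.out`.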

jfs · Accepted Answer

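I don't see any computations in the Python code. If you just need to execute several external programs in parallel, it is sufficient to use `subprocess` to run the programs and the `threading` module to keep a constant number of processes running, but the simplest code uses `multiprocessing.Pool`: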

    #!/usr/bin/env python
    import os
    import multiprocessing as mp

    def run(filename_def_param):
        filename, def_param = filename_def_param # unpack arguments
        ... # call external program on `filename`

    def safe_run(*args, **kwargs):
        """Call run(), catch exceptions."""
        try: run(*args, **kwargs)
        except Exception as e:
            print("error: %s run(*%r, **%r)" % (e, args, kwargs))

    def main():
        # populate files
        ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
        workdir = os.path.join(ws, r'fieldgen\reals')
        files = ((os.path.join(workdir, f), ws)
                 for f in os.listdir(workdir) if f.endswith('.npy'))

        # start processes
        pool = mp.Pool() # use all available CPUs
        pool.map(safe_run, files)

    if __name__=="__main__":
        mp.freeze_support() # optional if the program is not frozen
        main()

If there are many files, then `pool.map()` could be replaced by `for _ in pool.imap_unordered(safe_run, files): pass`.

There is also `multiprocessing.dummy.Pool`, which provides the same interface as `multiprocessing.Pool` but uses threads instead of processes, which might be more appropriate in this case.
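A minimal sketch of the thread-based pool combined with `imap_unordered`, assuming Python 3. The child command is a placeholder (the Python interpreter echoing its argument), not SEAWAT, and `run_one` and `run_all` are illustrative names.

```python
import subprocess
import sys
from multiprocessing.dummy import Pool  # same API as multiprocessing.Pool, backed by threads

def run_one(arg):
    """Run one external command and return (arg, exit code)."""
    # Placeholder command: a real script would start the model executable here.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", arg]
    returncode = subprocess.call(cmd, stdout=subprocess.DEVNULL)
    return arg, returncode

def run_all(args, workers=4):
    """Keep up to `workers` commands running; collect results as they finish."""
    pool = Pool(workers)
    try:
        # imap_unordered yields each result as soon as its command completes
        return dict(pool.imap_unordered(run_one, args))
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    print(run_all(["real1", "real2", "real3"]))
```

Threads are enough here because each one spends its time waiting for a child process rather than computing in Python.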

You don't need to keep some CPUs free. Just use a command that starts your executables with a low priority (on Linux it is the `nice` program).
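One way this could look, as a sketch assuming Python 3.7 or later: on POSIX the command is prefixed with `nice`, and on Windows the `subprocess.BELOW_NORMAL_PRIORITY_CLASS` constant is passed as `creationflags`. The helper name `call_low_priority` and the niceness value 10 are arbitrary choices for the illustration.

```python
import os
import subprocess
import sys

def call_low_priority(cmd, **kwargs):
    """Run `cmd` (a list of arguments) at reduced priority; return its exit code."""
    if os.name == "nt":
        # Windows: request a lower priority class for the new process
        # (constant available in subprocess since Python 3.7).
        kwargs["creationflags"] = subprocess.BELOW_NORMAL_PRIORITY_CLASS
    else:
        # POSIX: let the `nice` program start the command with a higher niceness.
        cmd = ["nice", "-n", "10"] + list(cmd)
    return subprocess.call(cmd, **kwargs)

if __name__ == "__main__":
    # Placeholder command; a real script would pass the model executable instead.
    print(call_low_priority([sys.executable, "-c", "print('running')"]))
```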

`ThreadPoolExecutor` example

`concurrent.futures.ThreadPoolExecutor` would be both simple and sufficient, but it requires a third-party dependency on Python 2.x (it has been in the stdlib since Python 3.2).

    #!/usr/bin/env python
    import os
    import concurrent.futures

    def run(filename, def_param):
        ... # call external program on `filename`

    # populate files
    ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
    wdir = os.path.join(ws, r'fieldgen\reals')
    files = (os.path.join(wdir, f) for f in os.listdir(wdir) if f.endswith('.npy'))

    # start threads
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        future_to_file = dict((executor.submit(run, f, ws), f) for f in files)

        for future in concurrent.futures.as_completed(future_to_file):
            f = future_to_file[future]
            if future.exception() is not None:
               print('%r generated an exception: %s' % (f, future.exception()))
            # run() doesn't return anything so `future.result()` is always `None`

Or, if we ignore exceptions raised by `run()`:

    from itertools import repeat

    ... # the same

    # start threads
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
         executor.map(run, files, repeat(ws))
         # run() doesn't return anything so `map()` results can be ignored

`subprocess` + `threading` (manual pool) solution

    #!/usr/bin/env python
    from __future__ import print_function
    import os
    import subprocess
    import sys
    from Queue import Queue
    from threading import Thread

    def run(filename, def_param):
        ... # define exe, swt_nam
        subprocess.check_call([exe, swt_nam]) # run external program

    def worker(queue):
        """Process files from the queue."""
        for args in iter(queue.get, None):
            try:
                run(*args)
            except Exception as e: # catch exceptions to avoid exiting the
                                   # thread prematurely
                print('%r failed: %s' % (args, e,), file=sys.stderr)

    # start threads
    q = Queue()
    threads = [Thread(target=worker, args=(q,)) for _ in range(8)]
    for t in threads:
        t.daemon = True # threads die if the program dies
        t.start()

    # populate files
    ws = r'D:\Data\Users\jbellino\Project\stJohnsDeepening\model\xsec_a'
    wdir = os.path.join(ws, r'fieldgen\reals')
    for f in os.listdir(wdir):
        if f.endswith('.npy'):
            q.put_nowait((os.path.join(wdir, f), ws))

    for _ in threads: q.put_nowait(None) # signal no more files
    for t in threads: t.join() # wait for completion

Vikas Gautam · Answer

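Here is my way to maintain a minimum of x threads in memory. It is a combination of the threading and multiprocessing modules. It may be unusual compared with the techniques that respected fellow members have explained above, but it may be worth considering. For the sake of explanation, I am taking the scenario of crawling a minimum of 5 websites at a time.

So here it is: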

    #importing dependencies.
    from multiprocessing import Process
    from threading import Thread
    import threading

    # Crawler function
    def crawler(domain):
        # define crawler technique here.
        output.write(scrapeddata + "\n")
        pass

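Next is the `threadController` function. This function controls the flow of threads into main memory. It keeps starting threads to maintain the "minimum" limit `threadNum`, i.e. 5. It also won't exit until all active threads (`activeCount`) have finished.

It maintains a minimum of `threadNum` (5) `startProcess` threads (these threads eventually start the processes from `processList` and join them with a timeout of 60 seconds). After `threadController` starts, there are 2 threads that are not counted in the limit of 5 above: the main thread and the `threadController` thread itself. That's why `threading.activeCount() != 2` is used.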

    def threadController():
        print "Thread count before child thread starts is:-", threading.activeCount(), len(processList)
        # starting first thread. This will make the activeCount=3
        Thread(target = startProcess).start()
        # loop while the process list is not empty OR active threads have not finished up.
        while len(processList) != 0 or threading.activeCount() != 2:
            if (threading.activeCount() < (threadNum + 2) and # if the count of active threads is less than the minimum AND
                len(processList) != 0):                       # processList is not empty
                    Thread(target = startProcess).start()     # This line starts the startProcess function as a separate thread **

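The `startProcess` function, running as a separate thread, starts processes from `processList`. The purpose of this function (** started as a separate thread) is to become the parent thread for the processes. So when it joins them with a timeout of 60 seconds, only the `startProcess` thread is blocked from moving ahead; `threadController` keeps running. This way, `threadController` works as required.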

    def startProcess():
        pr = processList.pop(0)
        pr.start()
        pr.join(60.00) # joining the process with a timeout of 60 seconds as a float.

    if __name__ == '__main__':
        # a file holding a list of domains
        domains = open("Domains.txt", "r").read().split("\n")
        output = open("test.txt", "a")
        processList = [] # process list
        threadNum = 5 # number of thread-initiated processes to be run at one time

        # making process List
        for r in range(0, len(domains), 1):
            domain = domains[r].strip()
            p = Process(target = crawler, args = (domain,))
            processList.append(p) # making a list of performer processes.

        # starting the threadController as a separate thread.
        mt = Thread(target = threadController)
        mt.start()
        mt.join() # won't let go next until threadController thread finishes.

        output.close()
        print "Done"

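Besides maintaining a minimum number of threads in memory, my aim was also to have something that could avoid stuck threads or processes in memory. I did this using the timeout function. My apologies for any typing mistakes.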
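One caveat about the timeout: `join(60.00)` only stops waiting after 60 seconds. It does not stop the process, which may keep running. A sketch of one way to enforce the limit, assuming Python 3 and using `subprocess` with a placeholder child rather than the `Process` objects above (`run_with_limit` is an illustrative name):

```python
import subprocess
import sys

def run_with_limit(cmd, limit):
    """Run `cmd`; kill it if it is still running after `limit` seconds."""
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=limit)   # raises TimeoutExpired if still running
        return "finished"
    except subprocess.TimeoutExpired:
        proc.kill()                # waiting alone would leave it running
        proc.wait()                # reap the killed process
        return "killed"

if __name__ == "__main__":
    # Placeholder child that sleeps longer than the limit
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_limit(sleeper, limit=1))
```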

I hope this construction helps someone out there. Regards, Vikas Gautam
