How should I log while using multiprocessing in Python?

Question

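Right now I have a central module in a framework that spawns multiple processes using the Python 2.6 multiprocessing module. Because it uses multiprocessing, there is a module-level multiprocessing-aware log, LOG = multiprocessing.get_logger(). Per the docs, this logger has process-shared locks so that you don't garble things up in sys.stderr (or whatever file handle) by having multiple processes writing to it simultaneously.

The issue I have now is that the other modules in the framework are not multiprocessing-aware. The way I see it, I need to make every dependency of this central module use multiprocessing-aware logging. That is annoying within the framework, let alone for all clients of the framework. Are there alternatives I'm not thinking of?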

vladr · Accepted Answer

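The only way to deal with this non-intrusively is to:

Spawn each worker process such that its log goes to a different file descriptor (to disk or to a pipe). Ideally, all log entries should be timestamped.
Your controller process can then do one of the following:
- If using disk files: coalesce the log files at the end of the run, sorted by timestamp.
- If using pipes (recommended): coalesce log entries on the fly from all pipes into a central log file. (For example, periodically select from the pipes' file descriptors, perform a merge sort on the available log entries, and flush them to the centralized log. Repeat.)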
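The disk-file option can be sketched roughly as follows. This is only an illustration, not the answerer's code: it assumes each worker writes one entry per line, that every line starts with an ISO 8601 timestamp (so string order equals time order), and that each file is already in time order. The names merge_logs and merge_log_files are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List


def merge_logs(streams: List[Iterable[str]]) -> Iterator[str]:
    """Merge already-sorted per-worker log streams into one sorted stream.

    Assumption: each line starts with an ISO 8601 timestamp, so plain
    string comparison orders the lines chronologically.
    """
    # heapq.merge does a lazy k-way merge and never loads a whole file.
    return heapq.merge(*streams)


def merge_log_files(paths: List[str], out_path: str) -> None:
    """Combine the per-worker log files at `paths` into `out_path`."""
    files = [open(p, "r", encoding="utf-8") for p in paths]
    try:
        with open(out_path, "w", encoding="utf-8") as out:
            out.writelines(merge_logs(files))
    finally:
        for f in files:
            f.close()
```

Multi-line entries such as tracebacks would be split apart by this line-based merge, so a real implementation would need a record delimiter or a parser that groups continuation lines.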

zzzeek · Answer

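I just wrote a log handler of my own that simply feeds everything to the parent process via a pipe. I've only been testing it for ten minutes, but it seems to work pretty well.

(Note: this is hardcoded to RotatingFileHandler, which is my own use case.)

Update: @javier now maintains this approach as a package available on PyPI; see multiprocessing-logging on PyPI, with the source at https://github.com/jruere/multiprocessing-logging

Update: Implementation!

This now uses a queue for correct handling of concurrency, and it also recovers from errors correctly. I've been using this in production for several months, and the current version below works without issue.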

    from logging.handlers import RotatingFileHandler
    import multiprocessing, threading, logging, sys, traceback

    class MultiProcessingLog(logging.Handler):
        def __init__(self, name, mode, maxsize, rotate):
            logging.Handler.__init__(self)

            self._handler = RotatingFileHandler(name, mode, maxsize, rotate)
            self.queue = multiprocessing.Queue(-1)

            t = threading.Thread(target=self.receive)
            t.daemon = True
            t.start()

        def setFormatter(self, fmt):
            logging.Handler.setFormatter(self, fmt)
            self._handler.setFormatter(fmt)

        def receive(self):
            while True:
                try:
                    record = self.queue.get()
                    self._handler.emit(record)
                except (KeyboardInterrupt, SystemExit):
                    raise
                except EOFError:
                    break
                except:
                    traceback.print_exc(file=sys.stderr)

        def send(self, s):
            self.queue.put_nowait(s)

        def _format_record(self, record):
            # ensure that exc_info and args
            # have been stringified. Removes any chance of
            # unpickleable things inside and possibly reduces
            # message size sent over the pipe
            if record.args:
                record.msg = record.msg % record.args
                record.args = None
            if record.exc_info:
                dummy = self.format(record)
                record.exc_info = None

            return record

        def emit(self, record):
            try:
                s = self._format_record(record)
                self.send(s)
            except (KeyboardInterrupt, SystemExit):
                raise
            except:
                self.handleError(record)

        def close(self):
            self._handler.close()
            logging.Handler.close(self)

fantabolous · Answer

There are two complete examples in the Python logging cookbook: https://docs.python.org/3/howto/logging-cookbook.html#logging-to-a-single-file-from-multiple-processes

It uses QueueHandler, which is new in Python 3.2 but easy to copy into your own code (as I did myself in Python 2.7): https://gist.github.com/vsajip/591589

Each process puts its log records on the Queue, and then a listener thread or process (one example is provided for each) picks them up and writes them all to a file, with no risk of corruption or garbling.
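A rough sketch of the listener-process variant is shown here. It is written in the spirit of the cookbook but is not copied from it; the function names, the log format, and the default file name combined.log are assumptions made for illustration. It requires Python 3.2 or later for QueueHandler.

```python
import logging
import logging.handlers
import multiprocessing


def listener(queue, log_path):
    """Runs in its own process; the only code that ever touches the file."""
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(processName)s %(levelname)s %(message)s"))
    while True:
        record = queue.get()
        if record is None:  # sentinel: the controller asks us to stop
            break
        handler.handle(record)
    handler.close()


def worker(queue, n):
    """Runs in each worker process; every record goes to the queue only."""
    root = logging.getLogger()
    # Replace any handlers inherited from the parent (relevant with fork).
    root.handlers[:] = [logging.handlers.QueueHandler(queue)]
    root.setLevel(logging.INFO)
    root.info("worker %d reporting", n)


def main(log_path="combined.log"):
    queue = multiprocessing.Queue()
    lp = multiprocessing.Process(target=listener, args=(queue, log_path))
    lp.start()
    workers = [multiprocessing.Process(target=worker, args=(queue, n))
               for n in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    queue.put(None)  # all workers are done, so stop the listener
    lp.join()
```

Call main() from under an if __name__ == "__main__": guard; the guard is required on platforms where multiprocessing starts children with spawn, such as Windows.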

Ali Afshar · Answer

Yet another alternative might be the various non-file-based logging handlers in the logging package:

SocketHandler
DatagramHandler
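SysLogHandler

(and others)

This way, you could easily have a logging daemon somewhere that you can write to safely and that handles the results correctly. (For example, a simple socket server that just unpickles the message and emits it to its own rotating file handler.)

The SysLogHandler would take care of this for you, too. Of course, you could use your own instance of syslog rather than the system one.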
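The "simple socket server" idea might look roughly like the sketch below. It relies on the wire format that logging.handlers.SocketHandler produces (a 4-byte big-endian length followed by a pickled dict of record attributes); the names read_record, LogRecordStreamHandler, and serve are chosen here for illustration and are not part of the answer.

```python
import logging
import logging.handlers
import pickle
import socketserver
import struct


def read_record(stream):
    """Read one SocketHandler-framed record from a binary stream.

    Returns a LogRecord, or None when the stream ends.
    """
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack(">L", header)
    payload = stream.read(length)
    if len(payload) < length:
        return None
    # Unpickling can execute arbitrary code: accept trusted senders only.
    return logging.makeLogRecord(pickle.loads(payload))


class LogRecordStreamHandler(socketserver.StreamRequestHandler):
    """Handles one client connection and re-logs every record locally."""

    def handle(self):
        while True:
            record = read_record(self.rfile)
            if record is None:
                break
            logging.getLogger(record.name).handle(record)


def serve(host="localhost",
          port=logging.handlers.DEFAULT_TCP_LOGGING_PORT):
    """Serve forever; attach e.g. a RotatingFileHandler to the root logger first."""
    with socketserver.ThreadingTCPServer((host, port),
                                         LogRecordStreamHandler) as server:
        server.serve_forever()
```

Worker processes would then add logging.handlers.SocketHandler(host, port) to their loggers, and only the server process would own the rotating file.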

user2133814 · Answer

Below is another solution with a focus on simplicity, for anyone else (like me) who gets here from Google. Logging should be easy! Python 3.2 or later only.

    import multiprocessing
    import logging
    from logging.handlers import QueueHandler, QueueListener
    import time
    import random


    def f(i):
        time.sleep(random.uniform(.01, .05))
        logging.info('function called with {} in worker thread.'.format(i))
        time.sleep(random.uniform(.01, .05))
        return i


    def worker_init(q):
        # all records from worker processes go to qh and then into q
        qh = QueueHandler(q)
        logger = logging.getLogger()
        logger.setLevel(logging.DEBUG)
        logger.addHandler(qh)


    def logger_init():
        q = multiprocessing.Queue()
        # this is the handler for all log records
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s: %(asctime)s - %(process)s - %(message)s"))

        # ql gets records from the queue and sends them to the handler
        ql = QueueListener(q, handler)
        ql.start()

        logger = logging.getLogger()
        logger.setLevel(logging.DEBUG)
        # add the handler to the logger so records from this process are handled
        logger.addHandler(handler)

        return ql, q


    def main():
        q_listener, q = logger_init()

        logging.info('hello from main thread')
        pool = multiprocessing.Pool(4, worker_init, [q])
        for result in pool.map(f, range(10)):
            pass
        pool.close()
        pool.join()
        q_listener.stop()


    if __name__ == '__main__':
        main()

ironhacker · Answer

Yet another variant, which keeps the logging and queue thread separate.

    """sample code for logging in subprocesses using multiprocessing

    * Little handler magic - The main process uses loggers and handlers as normal.
    * Only a simple handler is needed in the subprocess that feeds the queue.
    * Original logger name from subprocess is preserved when logged in main
      process.
    * As in the other implementations, a thread reads the queue and calls the
      handlers. Except in this implementation, the thread is defined outside of a
      handler, which makes the logger definitions simpler.
    * Works with multiple handlers. If the logger in the main process defines
      multiple handlers, they will all be fed records generated by the
      subprocesses loggers.

    tested with Python 2.5 and 2.6 on Linux and Windows

    """

    import os
    import sys
    import time
    import traceback
    import multiprocessing, threading, logging

    DEFAULT_LEVEL = logging.DEBUG

    formatter = logging.Formatter("%(levelname)s: %(asctime)s - %(name)s - %(process)s - %(message)s")

    class SubProcessLogHandler(logging.Handler):
        """handler used by subprocesses

        It simply puts items on a Queue for the main process to log.

        """

        def __init__(self, queue):
            logging.Handler.__init__(self)
            self.queue = queue

        def emit(self, record):
            self.queue.put(record)

    class LogQueueReader(threading.Thread):
        """thread to write subprocesses log records to main process log

        This thread reads the records written by subprocesses and writes them to
        the handlers defined in the main process's handlers.

        """

        def __init__(self, queue):
            threading.Thread.__init__(self)
            self.queue = queue
            self.daemon = True

        def run(self):
            """read from the queue and write to the log handlers

            The logging documentation says logging is thread safe, so there
            shouldn't be contention between normal logging (from the main
            process) and this thread.

            Note that we're using the name of the original logger.

            """
            # Thanks Mike for the error checking code.
            while True:
                try:
                    record = self.queue.get()
                    # get the logger for this record
                    logger = logging.getLogger(record.name)
                    logger.callHandlers(record)
                except (KeyboardInterrupt, SystemExit):
                    raise
                except EOFError:
                    break
                except:
                    traceback.print_exc(file=sys.stderr)

    class LoggingProcess(multiprocessing.Process):

        def __init__(self, queue):
            multiprocessing.Process.__init__(self)
            self.queue = queue

        def _setupLogger(self):
            # create the logger to use.
            logger = logging.getLogger('test.subprocess')
            # The only handler desired is the SubProcessLogHandler. If any others
            # exist, remove them. In this case, on Unix and Linux the StreamHandler
            # will be inherited.

            # iterate over a copy, because removeHandler mutates logger.handlers
            for handler in list(logger.handlers):
                # just a check for my sanity
                assert not isinstance(handler, SubProcessLogHandler)
                logger.removeHandler(handler)
            # add the handler
            handler = SubProcessLogHandler(self.queue)
            handler.setFormatter(formatter)
            logger.addHandler(handler)

            # On Windows, the level will not be inherited. Also, we could just
            # set the level to log everything here and filter it in the main
            # process handlers. For now, just set it from the global default.
            logger.setLevel(DEFAULT_LEVEL)
            self.logger = logger

        def run(self):
            self._setupLogger()
            logger = self.logger
            # and here goes the logging
            p = multiprocessing.current_process()
            logger.info('hello from process %s with pid %s' % (p.name, p.pid))


    if __name__ == '__main__':
        # queue used by the subprocess loggers
        queue = multiprocessing.Queue()
        # Just a normal logger
        logger = logging.getLogger('test')
        handler = logging.StreamHandler()
        handler.setFormatter(formatter)
        logger.addHandler(handler)
        logger.setLevel(DEFAULT_LEVEL)
        logger.info('hello from the main process')
        # This thread will read from the subprocesses and write to the main log's
        # handlers.
        log_queue_reader = LogQueueReader(queue)
        log_queue_reader.start()
        # create the processes.
        for i in range(10):
            p = LoggingProcess(queue)
            p.start()
        # The way I read the multiprocessing warning about Queue, joining a
        # process before it has finished feeding the Queue can cause a deadlock.
        # Also, Queue.empty() is not reliable, so just make sure all processes
        # are finished.
        # active_children joins subprocesses when they're finished.
        while multiprocessing.active_children():
            time.sleep(.1)

schlamar · Answer

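All current solutions are too tightly coupled to the logging configuration because they use a handler. My solution has the following architecture and features:

You can use any logging configuration you want
Logging is done in a daemon thread
Safe shutdown of the daemon by using a context manager
Communication with the logging thread is done through a multiprocessing.Queue
In subprocesses, logging.Logger (and already defined instances) are patched to send all records to the queue
New: the traceback and message are formatted before being sent to the queue, to prevent pickling errors

Code with a usage example and output can be found in the following Gist: https://gist.github.com/schlamar/7003737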
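The last point, formatting the traceback before the record crosses a process boundary, can be illustrated without the Gist. The sketch below is not schlamar's code; it uses the standard library QueueHandler (Python 3.2+), whose prepare() step performs a similar formatting, and a plain queue.Queue as a stand-in for multiprocessing.Queue. The logger name pickling_demo is arbitrary.

```python
import logging
import logging.handlers
import pickle
import queue


def record_from_exception():
    """Log an exception through QueueHandler and return the queued record."""
    q = queue.Queue()  # stand-in for multiprocessing.Queue in this demo
    handler = logging.handlers.QueueHandler(q)
    logger = logging.getLogger("pickling_demo")  # hypothetical logger name
    logger.propagate = False  # keep the demo from printing to stderr
    logger.addHandler(handler)
    try:
        try:
            1 / 0
        except ZeroDivisionError:
            # At this point exc_info holds a traceback object,
            # which cannot be pickled.
            logger.exception("division failed")
    finally:
        logger.removeHandler(handler)
    return q.get_nowait()


record = record_from_exception()
# QueueHandler.prepare() has already rendered the message and cleared
# exc_info, so the record survives a pickle round trip.
restored = pickle.loads(pickle.dumps(record))
```

A hand-written handler has to do the same thing itself: render the exception text while the traceback is still available, then set exc_info to None before calling queue.put.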

Samuel · Answer

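Since multi-process logging can be modeled as many publishers and one subscriber (the listener), using ZeroMQ to implement PUB-SUB messaging is indeed an option.

Moreover, the PyZMQ module, the Python bindings for ZMQ, implements PUBHandler, an object for publishing logging messages over a zmq.PUB socket.

There is a solution on the web for centralized logging from a distributed application using PyZMQ and PUBHandler, and it can easily be adapted to work locally with multiple publishing processes.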

    import logging
    import os
    import socket
    from logging import handlers
    from multiprocessing import Process

    import zmq
    from zmq.log.handlers import PUBHandler

    # config, pub_logger, stop_event, processes and MAX_WORKERS_PER_CLIENT are
    # defined elsewhere in the author's project and are not shown in this answer.

    formatters = {
        logging.DEBUG: logging.Formatter("[%(name)s] %(message)s"),
        logging.INFO: logging.Formatter("[%(name)s] %(message)s"),
        logging.WARN: logging.Formatter("[%(name)s] %(message)s"),
        logging.ERROR: logging.Formatter("[%(name)s] %(message)s"),
        logging.CRITICAL: logging.Formatter("[%(name)s] %(message)s")
    }

    # This one will be used by publishing processes
    class PUBLogger:
        def __init__(self, host, port=config.PUBSUB_LOGGER_PORT):
            self._logger = logging.getLogger(__name__)
            self._logger.setLevel(logging.DEBUG)
            self.ctx = zmq.Context()
            self.pub = self.ctx.socket(zmq.PUB)
            self.pub.connect('tcp://{0}:{1}'.format(socket.gethostbyname(host), port))
            self._handler = PUBHandler(self.pub)
            self._handler.formatters = formatters
            self._logger.addHandler(self._handler)

        @property
        def logger(self):
            return self._logger

    # This one will be used by listener process
    class SUBLogger:
        def __init__(self, ip, output_dir="", port=config.PUBSUB_LOGGER_PORT):
            self.output_dir = output_dir
            self._logger = logging.getLogger()
            self._logger.setLevel(logging.DEBUG)

            self.ctx = zmq.Context()
            self._sub = self.ctx.socket(zmq.SUB)
            self._sub.bind('tcp://*:{1}'.format(ip, port))
            self._sub.setsockopt(zmq.SUBSCRIBE, "")

            handler = handlers.RotatingFileHandler(os.path.join(output_dir, "client_debug.log"), "w", 100 * 1024 * 1024, 10)
            handler.setLevel(logging.DEBUG)
            formatter = logging.Formatter("%(asctime)s;%(levelname)s - %(message)s")
            handler.setFormatter(formatter)
            self._logger.addHandler(handler)

        @property
        def sub(self):
            return self._sub

        @property
        def logger(self):
            return self._logger

    # And that's the way we actually run things:

    # Listener process will forever listen on SUB socket for incoming messages
    def run_sub_logger(ip, event):
        sub_logger = SUBLogger(ip)
        while not event.is_set():
            try:
                topic, message = sub_logger.sub.recv_multipart(flags=zmq.NOBLOCK)
                log_msg = getattr(logging, topic.lower())
                log_msg(message)
            except zmq.ZMQError as zmq_error:
                if zmq_error.errno == zmq.EAGAIN:
                    pass

    # Publisher processes loggers should be initialized as follows:

    class Publisher:
        def __init__(self, stop_event, proc_id):
            self.stop_event = stop_event
            self.proc_id = proc_id
            self._logger = pub_logger.PUBLogger('127.0.0.1').logger

        def run(self):
            self._logger.info("{0} - Sending message".format(self.proc_id))

    def run_worker(event, proc_id):
        worker = Publisher(event, proc_id)
        worker.run()

    # Starting subscriber process so we won't lose publisher's messages
    sub_logger_process = Process(target=run_sub_logger,
                                 args=('127.0.0.1', stop_event,))
    sub_logger_process.start()

    # Starting publisher processes
    for i in range(MAX_WORKERS_PER_CLIENT):
        processes.append(Process(target=run_worker,
                                 args=(stop_event, i,)))
    for p in processes:
        p.start()

Mike Miller · Answer

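I also like zzzeek's answer, but Andre is correct that a queue is required to prevent garbling. I had some luck with the pipe, but did see garbling, which is somewhat expected. Implementing it turned out to be harder than I thought, particularly because I run on Windows, where there are some additional restrictions on global variables and the like (see: How's Python Multiprocessing Implemented on Windows?)

But I finally got it working. This example probably isn't perfect, so comments and suggestions are welcome. It also does not support setting the formatter or configuring anything other than the root logger. Basically, you have to reinitialize the logger in each of the pool processes with the queue and set up the other attributes on the logger.

Again, any suggestions on how to make the code better are welcome. I certainly don't know all the Python tricks yet :-)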

    import multiprocessing, logging, sys, re, os, StringIO, threading, time, Queue

    class MultiProcessingLogHandler(logging.Handler):
        def __init__(self, handler, queue, child=False):
            logging.Handler.__init__(self)

            self._handler = handler
            self.queue = queue

            # we only want one of the loggers to be pulling from the queue.
            # If there is a way to do this without needing to be passed this
            # information, that would be great!
            if child == False:
                self.shutdown = False
                self.polltime = 1
                t = threading.Thread(target=self.receive)
                t.daemon = True
                t.start()

        def setFormatter(self, fmt):
            logging.Handler.setFormatter(self, fmt)
            self._handler.setFormatter(fmt)

        def receive(self):
            #print "receive on"
            while (self.shutdown == False) or (self.queue.empty() == False):
                # so we block for a short period of time so that we can
                # check for the shutdown cases.
                try:
                    record = self.queue.get(True, self.polltime)
                    self._handler.emit(record)
                except Queue.Empty, e:
                    pass

        def send(self, s):
            # send just puts it in the queue for the server to retrieve
            self.queue.put(s)

        def _format_record(self, record):
            ei = record.exc_info
            if ei:
                dummy = self.format(record) # just to get traceback text into record.exc_text
                record.exc_info = None  # to avoid Unpickleable error

            return record

        def emit(self, record):
            try:
                s = self._format_record(record)
                self.send(s)
            except (KeyboardInterrupt, SystemExit):
                raise
            except:
                self.handleError(record)

        def close(self):
            time.sleep(self.polltime+1) # give some time for messages to enter the queue.
            self.shutdown = True
            time.sleep(self.polltime+1) # give some time for the server to time out and see the shutdown

        def __del__(self):
            self.close() # hopefully this aids in orderly shutdown when things are going poorly.

    def f(x):
        # just a logging command...
        logging.critical('function number: ' + str(x))
        # to make some calls take longer than others, so the output is "jumbled" as real MP programs are.
        time.sleep(x % 3)

    def initPool(queue, level):
        """
        This causes the logging module to be initialized with the necessary info
        in pool threads to work correctly.
        """
        logging.getLogger('').addHandler(MultiProcessingLogHandler(logging.StreamHandler(), queue, child=True))
        logging.getLogger('').setLevel(level)

    if __name__ == '__main__':
        stream = StringIO.StringIO()
        logQueue = multiprocessing.Queue(100)
        handler = MultiProcessingLogHandler(logging.StreamHandler(stream), logQueue)
        logging.getLogger('').addHandler(handler)
        logging.getLogger('').setLevel(logging.DEBUG)

        logging.debug('starting main')

        # when building the pool on a Windows machine we also have to init the logger in all the instances with the queue and the level of logging.
        pool = multiprocessing.Pool(processes=10, initializer=initPool, initargs=[logQueue, logging.getLogger('').getEffectiveLevel()] ) # start worker processes
        pool.map(f, range(0,50))
        pool.close()

        logging.debug('done')
        logging.shutdown()
        print "stream output is:"
        print stream.getvalue()

Javier · Answer

Just publish your logger instance somewhere. That way, the other modules and clients can use your API to get the logger without having to import multiprocessing.
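One possible shape for such an accessor is sketched below. The function name get_logger and the idea of placing it in a small framework module are assumptions for illustration; the answer does not specify an API.

```python
import logging
import multiprocessing


def get_logger():
    """Return the framework's shared logger.

    Callers only depend on this function; the fact that the logger comes
    from multiprocessing stays an implementation detail of the framework.
    """
    return multiprocessing.get_logger()


# Client code needs no multiprocessing import of its own:
log = get_logger()
log.setLevel(logging.INFO)
```

Because multiprocessing.get_logger() always returns the same logger object, every module that calls get_logger() shares one logger and one set of handlers.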

André Cruz · Answer

I liked zzzeek's answer, but I would substitute a Queue for the Pipe: if multiple threads or processes use the same pipe end to generate log messages, the messages will get garbled.

Sawan · Answer

How about delegating all the logging to another process that reads every log entry from a Queue?

    import logging
    import multiprocessing

    LOG_QUEUE = multiprocessing.JoinableQueue()

    class CentralLogger(multiprocessing.Process):
        def __init__(self, queue):
            multiprocessing.Process.__init__(self)
            self.queue = queue
            self.log = logging.getLogger('some_config')
            self.log.info("Started Central Logging process")

        def run(self):
            while True:
                log_level, message = self.queue.get()
                if log_level is None:
                    self.log.info("Shutting down Central Logging process")
                    break
                else:
                    self.log.log(log_level, message)

    central_logger_process = CentralLogger(LOG_QUEUE)
    central_logger_process.start()

Simply share LOG_QUEUE via any of the multiprocessing mechanisms, or even inheritance, and it all works out fine!
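The producing side is not shown in the answer. A minimal sketch of it, assuming the (log_level, message) tuple shape that the run loop unpacks, could look like this; the helper names log, stop_central_logger, and worker are hypothetical.

```python
import logging


def log(queue, level, message):
    """Send one entry in the (level, message) shape the central logger unpacks."""
    queue.put((level, message))


def stop_central_logger(queue):
    """Ask the central logger to exit.

    The run loop unpacks two values and stops when the level is None, so the
    sentinel has to be a 2-tuple rather than a bare None.
    """
    queue.put((None, None))


def worker(queue, n):
    """Example worker body: all logging goes through the shared queue."""
    log(queue, logging.INFO, "worker %d started" % n)
```

A worker process would receive LOG_QUEUE as an argument, for example multiprocessing.Process(target=worker, args=(LOG_QUEUE, 1)), and the controller would call stop_central_logger(LOG_QUEUE) once every worker has been joined.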

nmz787 · Answer

Here's my simple hack/workaround. It's not the most comprehensive, but I think it is easier to modify, read, and understand than the other answers I found before writing this:

    import logging
    import multiprocessing

    class FakeLogger(object):
        def __init__(self, q):
            self.q = q
        def info(self, item):
            self.q.put('INFO - {}'.format(item))
        def debug(self, item):
            self.q.put('DEBUG - {}'.format(item))
        def critical(self, item):
            self.q.put('CRITICAL - {}'.format(item))
        def warning(self, item):
            self.q.put('WARNING - {}'.format(item))

    def some_other_func_that_gets_logger_and_logs(num):
        # notice the name gets discarded
        # of course you can easily add this to your FakeLogger class
        local_logger = logging.getLogger('local')
        local_logger.info('Hey I am logging this: {} and working on it to make this {}!'.format(num, num*2))
        local_logger.debug('hmm, something may need debugging here')
        return num*2

    def func_to_parallelize(data_chunk):
        # unpack our args
        the_num, logger_q = data_chunk
        # since we're now in a new process, let's monkeypatch the logging module
        logging.getLogger = lambda name=None: FakeLogger(logger_q)
        # now do the actual work that happens to log stuff too
        new_num = some_other_func_that_gets_logger_and_logs(the_num)
        return (the_num, new_num)

    if __name__ == '__main__':
        multiprocessing.freeze_support()
        m = multiprocessing.Manager()
        logger_q = m.Queue()
        # we have to pass our data to be parallel-processed
        # we also need to pass the Queue object so we can retrieve the logs
        parallelable_data = [(1, logger_q), (2, logger_q)]
        # set up a pool of processes so we can take advantage of multiple CPU cores
        pool_size = multiprocessing.cpu_count() * 2
        pool = multiprocessing.Pool(processes=pool_size, maxtasksperchild=4)
        worker_output = pool.map(func_to_parallelize, parallelable_data)
        pool.close() # no more tasks
        pool.join()  # wrap up current tasks
        # get the contents of our FakeLogger object
        while not logger_q.empty():
            print logger_q.get()
        print 'worker output contained: {}'.format(worker_output)

user6336812 · Answer

Below is a class that can be used in a Windows environment; it requires ActivePython. You can also inherit from other logging handlers (StreamHandler, etc.).

    class SyncronizedFileHandler(logging.FileHandler):
        MUTEX_NAME = 'logging_mutex'

        def __init__(self , *args , **kwargs):
            self.mutex = win32event.CreateMutex(None , False , self.MUTEX_NAME)
            return super(SyncronizedFileHandler , self ).__init__(*args , **kwargs)

        def emit(self, *args , **kwargs):
            try:
                win32event.WaitForSingleObject(self.mutex , win32event.INFINITE)
                ret = super(SyncronizedFileHandler , self ).emit(*args , **kwargs)
            finally:
                win32event.ReleaseMutex(self.mutex)
            return ret

And here is an example that demonstrates usage:

    import logging
    import random , time , os , sys , datetime
    from string import letters
    import win32api , win32event
    from multiprocessing import Pool

    def f(i):
        time.sleep(random.randint(0,10) * 0.1)
        ch = random.choice(letters)
        logging.info( ch * 30)


    def init_logging():
        '''
        initialize the loggers
        '''
        formatter = logging.Formatter("%(levelname)s - %(process)d - %(asctime)s - %(filename)s - %(lineno)d - %(message)s")
        logger = logging.getLogger()
        logger.setLevel(logging.INFO)

        file_handler = SyncronizedFileHandler(sys.argv[1])
        file_handler.setLevel(logging.INFO)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    #must be called in the parent and in every worker process
    init_logging()

    if __name__ == '__main__':
        #multiprocessing stuff
        pool = Pool(processes=10)
        imap_result = pool.imap(f , range(30))
        for i , _ in enumerate(imap_result):
            pass

Richard Jones · Answer

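I have a solution similar to ironhacker's, except that I use logging.exception in some of my code and found that I needed to format the exception before passing it back over the Queue, since tracebacks aren't pickleable: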

    import cStringIO
    import logging
    import traceback

    class QueueHandler(logging.Handler):
        def __init__(self, queue):
            logging.Handler.__init__(self)
            self.queue = queue

        def emit(self, record):
            if record.exc_info:
                # can't pass exc_info across processes so just format now
                record.exc_text = self.formatException(record.exc_info)
                record.exc_info = None
            self.queue.put(record)

        def formatException(self, ei):
            sio = cStringIO.StringIO()
            traceback.print_exception(ei[0], ei[1], ei[2], None, sio)
            s = sio.getvalue()
            sio.close()
            if s[-1] == "\n":
                s = s[:-1]
            return s

Albert · Answer

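If you have deadlocks occurring from a combination of locks, threads, and forks in the logging module, that is reported in bug report 6721 (see also the related SO question).

There is a small fixup solution posted here.

However, that only fixes potential deadlocks in logging. It does not fix the possibility that output gets garbled. See the other answers presented here.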

cdleary · Answer

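One alternative is to write the multiprocessing logging to a known file and register an atexit handler that joins those processes and reads the file back on stderr. However, you won't get a real-time flow of the output messages on stderr that way.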
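A minimal sketch of that idea follows. The function names replay_log and install_replay, and the choice to pass the worker processes in as a list, are assumptions for illustration rather than anything the answer specifies.

```python
import atexit
import sys


def replay_log(path, stream=None):
    """Copy the contents of the shared log file at `path` to `stream`."""
    if stream is None:
        stream = sys.stderr
    try:
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                stream.write(line)
    except FileNotFoundError:
        pass  # nothing was ever logged


def install_replay(path, processes):
    """At interpreter exit, wait for the workers and then replay their log."""
    def _join_and_replay():
        for p in processes:
            p.join()  # make sure every worker has finished writing
        replay_log(path)
    atexit.register(_join_and_replay)
```

The workers still share one file here, so their writes need to be kept from interleaving by some other means, for example one of the queue-based handlers from the other answers.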

juan Isaza · Answer

There is this great package:

Package: https://pypi.python.org/pypi/multiprocessing-logging/

Code: https://github.com/jruere/multiprocessing-logging

Install:

pip install multiprocessing-logging

Then add:

    import multiprocessing_logging

    # This enables logs inside process
    multiprocessing_logging.install_mp_handler()