Single-producer, single-consumer data structure with a double buffer in C++

Question

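I have an application at $work where I have to move data between two real-time threads that are scheduled at different frequencies. (The actual scheduling is beyond my control.) The application is hard real-time-ish (one of the threads has to drive a hardware interface), so the data transfer between the threads should be lock-free and wait-free to the extent possible.

It is important to note that only one block of data needs to be transferred. Because the two threads run at different rates, there will be times when two iterations of the faster thread complete between two wakeups of the slower thread. In that case it is fine to overwrite the data in the write buffer, so that the slower thread only ever gets the latest data.

In other words, instead of a queue, a double-buffered solution suffices. The two buffers are allocated during initialization, and the reader and writer threads call methods of the class to get a pointer to one of these buffers.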

C++ code:

    #include <mutex>

    template <typename T>
    class ProducerConsumerDoubleBuffer {
    public:
        ProducerConsumerDoubleBuffer() {
            m_write_busy = false;
            m_read_idx = m_write_idx = 0;
        }

        ~ProducerConsumerDoubleBuffer() { }

        // The writer thread using this class must call
        // start_writing() at the start of its iteration
        // before doing anything else to get the pointer
        // to the current write buffer.
        T * start_writing(void) {
            std::lock_guard<std::mutex> lock(m_mutex);

            m_write_busy = true;
            m_write_idx = 1 - m_read_idx;

            return &m_buf[m_write_idx];
        }

        // The writer thread must call end_writing()
        // as the last thing it does
        // to release the write busy flag.
        void end_writing(void) {
            std::lock_guard<std::mutex> lock(m_mutex);

            m_write_busy = false;
        }

        // The reader thread must call start_reading()
        // at the start of its iteration to get the pointer
        // to the current read buffer.
        // If the write thread is not active at this time,
        // the read buffer pointer will be set to the
        // (previous) write buffer - so the reader gets the latest data.
        // If the write buffer is busy, the read pointer is not changed.
        // In this case the read buffer may contain stale data,
        // it is up to the user to deal with this case.
        T * start_reading(void) {
            std::lock_guard<std::mutex> lock(m_mutex);

            if (!m_write_busy) {
                m_read_idx = m_write_idx;
            }

            return &m_buf[m_read_idx];
        }

        // The reader thread must call end_reading()
        // at the end of its iteration.
        void end_reading(void) {
            std::lock_guard<std::mutex> lock(m_mutex);

            m_read_idx = m_write_idx;
        }

    private:
        T m_buf[2];
        bool m_write_busy;
        unsigned int m_read_idx, m_write_idx;
        std::mutex m_mutex;
    };

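To avoid stale data in the reader thread, the payload structure is versioned. To facilitate bidirectional data transfer between the threads, two instances of the above monstrosity are used, one in each direction.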
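The question does not show the versioned payload itself, so the following is only a guess at what such a scheme could look like. `Payload`, `StaleDetector` and `is_fresh` are names invented for this sketch; the idea is simply that the writer bumps a counter on every publish and the reader ignores anything that is not newer than what it has already consumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical payload: the writer increments `version` on every publish.
struct Payload {
    std::uint64_t version = 0;  // 0 means "never written"
    double sample = 0.0;        // stand-in for the real data block
};

// Reader-side helper that remembers the newest version it has consumed.
class StaleDetector {
public:
    // Returns true only if `p` is newer than anything seen so far.
    bool is_fresh(const Payload& p) {
        if (p.version <= m_last_seen) {
            return false;  // same or older buffer: treat as stale
        }
        m_last_seen = p.version;
        return true;
    }

private:
    std::uint64_t m_last_seen = 0;
};

// Usage: check every buffer handed out by start_reading() before acting on it.
inline void stale_detector_example() {
    StaleDetector detector;
    Payload p;
    p.version = 1;
    assert(detector.is_fresh(p));   // first time this version is seen
    assert(!detector.is_fresh(p));  // same buffer again: stale
}
```

A counter like this only detects staleness; it does not by itself make concurrent access to the buffer safe.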

Questions:

Is this scheme thread-safe? If it is broken, where?
Can it be done without the mutex? Perhaps with just memory barriers or CAS instructions?
Can it be made better?

Cameron · Accepted Answer

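Very interesting problem! Way trickier than I first thought :-) I like lock-free solutions, so I've tried to work one out below.

There are many ways to think about this system. You can model it as a fixed-size circular buffer/queue (with two entries), but then you lose the ability to update the next available value for consumption, since you don't know whether the consumer has started to read the most recently published value or is still (potentially) reading the previous one. So extra state is needed beyond that of a standard ring buffer in order to reach a more optimal solution.

First, note that there is always a cell that the producer can safely write to at any given point in time; if one cell is being read by the consumer, the other can be written to. Let's call the cell that can be safely written to the "active" cell (the cell that can potentially be read from is whichever cell is not the active one). The active cell can only be switched if the other cell is not currently being read from.

Unlike the active cell, which can always be written to, the non-active cell can only be read from if it contains a value; once that value is consumed, it's gone. (This means that livelock is avoided in the case of an aggressive producer: at some point, the consumer will have emptied a cell and will stop touching the cells. Once that happens, the producer can definitely publish a value, whereas before that point it can only publish a value (change the active cell) if the consumer is not in the middle of a read.)

If there is a value that's ready to be consumed, only the consumer can change that fact (for the non-active cell, anyway). Subsequent productions may change which cell is active and which value is published, but a value will always be ready to be read until it's consumed.

Once the producer is done writing to the active cell, it can "publish" this value by changing which cell is the active one (swapping the index), provided the consumer is not in the middle of reading the other cell. If the consumer is in the middle of reading the other cell, the swap cannot occur, but in that case the consumer can swap after it's done reading the value, provided the producer is not in the middle of a write (and if it is, the producer will swap once it's done). In fact, the consumer can in general always swap after it's done reading (if it's the only one accessing the system), because spurious swaps by the consumer are benign: if there is something in the other cell, swapping will cause that to be read next, and if there isn't, swapping affects nothing.

So, we need a shared variable to track which cell is the active one, and we also need a way for both the producer and the consumer to indicate whether they're in the middle of an operation. We can store these three pieces of state in one atomic variable in order to be able to affect them all at once (atomically). We also need a way for the consumer to check whether there's anything in the non-active cell in the first place, and for both threads to modify that state as appropriate. I tried a few other approaches, but in the end the easiest was to put this information in the same atomic variable too. This makes things much simpler to reason about, since every state change in the system is atomic this way.

I've come up with a wait-free implementation (lock-free, and all operations complete in a bounded number of instructions).

Code time!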
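To make the packed state a little more concrete, here is a small decoding sketch. It assumes the bit positions described in the comments of the implementation in this answer (bit 0 for the active cell, bits 1-2 for the number of concurrent users, bits 3 and 4 for the "value available" flags of `m_buf[0]` and `m_buf[1]`). The helper names are made up for illustration and are not part of the original code.

```cpp
#include <cassert>
#include <cstdint>

// Bit 0: index of the active cell (the one the producer writes to).
constexpr std::uint32_t active_cell(std::uint32_t s) { return s & 0x1; }

// Bits 1-2: number of threads currently inside a read or write.
constexpr std::uint32_t user_count(std::uint32_t s) { return (s & 0x6) >> 1; }

// Bit 3 / bit 4: cell 0 / cell 1 holds a value that has not been consumed.
constexpr bool cell_full(std::uint32_t s, std::uint32_t cell) {
    return (s & (0x8u << cell)) != 0;
}

// The consumer may only read the cell that is *not* active.
constexpr bool readable(std::uint32_t s) {
    return cell_full(s, active_cell(s) ^ 1);
}

// Usage: 0x10 means "cell 0 active, nobody busy, cell 1 holds a value".
inline void layout_example() {
    assert(active_cell(0x10) == 0);
    assert(user_count(0x10) == 0);
    assert(readable(0x10));
}
```

Seen this way, a swap is just flipping bit 0, and it is only attempted while `user_count` is zero.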

    #include <atomic>
    #include <cstdint>

    template <typename T>
    class ProducerConsumerDoubleBuffer {
    public:
        ProducerConsumerDoubleBuffer() : m_state(0) { }
        ~ProducerConsumerDoubleBuffer() { }

        // Never returns nullptr
        T* start_writing() {
            // Increment active users; once we do this, no one
            // can swap the active cell on us until we're done
            auto state = m_state.fetch_add(0x2, std::memory_order_relaxed);
            return &m_buf[state & 1];
        }

        void end_writing() {
            // We want to swap the active cell, but only if we were the last
            // ones concurrently accessing the data (otherwise the consumer
            // will do it for us when *it's* done accessing the data)

            auto state = m_state.load(std::memory_order_relaxed);
            std::uint32_t flag = (8 << (state & 1)) ^ (state & (8 << (state & 1)));
            state = m_state.fetch_add(flag - 0x2, std::memory_order_release) + flag - 0x2;
            if ((state & 0x6) == 0) {
                // The consumer wasn't in the middle of a read, we should
                // swap (unless the consumer has since started a read or
                // already swapped or read a value and is about to swap).
                // If we swap, we also want to clear the full flag on what
                // will become the active cell, otherwise the consumer could
                // eventually read two values out of order (it reads a new
                // value, then swaps and reads the old value while the
                // producer is idle).
                m_state.compare_exchange_strong(
                    state, (state ^ 0x1) & ~(0x10 >> (state & 1)),
                    std::memory_order_release);
            }
        }

        // Returns nullptr if there appears to be no more data to read yet
        T* start_reading() {
            m_readState = m_state.load(std::memory_order_relaxed);
            if ((m_readState & (0x10 >> (m_readState & 1))) == 0) {
                // Nothing to read here!
                return nullptr;
            }

            // At this point, there is guaranteed to be something to
            // read, because the full flag is never turned off by the
            // producer thread once it's on; the only thing that could
            // happen is that the active cell changes, but that can
            // only happen after the producer wrote a value into it,
            // in which case there's still a value to read, just in a
            // different cell.

            m_readState = m_state.fetch_add(0x2, std::memory_order_acquire) + 0x2;

            // Now that we've incremented the user count, nobody can swap until
            // we decrement it
            return &m_buf[(m_readState & 1) ^ 1];
        }

        void end_reading() {
            if ((m_readState & (0x10 >> (m_readState & 1))) == 0) {
                // There was nothing to read; shame to repeat this
                // check, but if these functions are inlined it might
                // not matter. Otherwise the API could be changed.
                // Or just don't call this method if start_reading()
                // returns nullptr -- then you could also get rid
                // of m_readState.
                return;
            }

            // Alright, at this point the active cell cannot change on
            // us, but the active cell's flag could change and the user
            // count could change. We want to release our user count
            // and remove the flag on the value we read.

            auto state = m_state.load(std::memory_order_relaxed);
            std::uint32_t sub = (0x10 >> (state & 1)) | 0x2;
            state = m_state.fetch_sub(sub, std::memory_order_relaxed) - sub;
            // Note: the flag test is written as "!= 0" here. The masked value
            // can only be 0, 0x8 or 0x10, so a comparison against 1 could
            // never be true and the consumer-side swap would never run.
            if ((state & 0x6) == 0 && (state & (0x8 << (state & 1))) != 0) {
                // Oi, we were the last ones accessing the data when we released our cell.
                // That means we should swap, but only if the producer isn't in the middle
                // of producing something, and hasn't already swapped, and hasn't already
                // set the flag we just reset (which would mean they swapped an even number
                // of times). Note that we don't bother swapping if there's nothing to read
                // in the other cell.
                m_state.compare_exchange_strong(state, state ^ 0x1, std::memory_order_relaxed);
            }
        }

    private:
        T m_buf[2];

        // The bottom (lowest) bit will be the active cell (the one for writing).
        // The active cell can only be switched if there's at most one concurrent
        // user. The next two bits of state will be the number of concurrent users.
        // The fourth bit indicates if there's a value available for reading
        // in m_buf[0], and the fifth bit has the same meaning but for m_buf[1].
        std::atomic<std::uint32_t> m_state;

        std::uint32_t m_readState;
    };

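Note that the semantics are such that the consumer can never read a given value twice, and a value it does read is always newer than the last value it read. It's also fairly efficient in memory usage (two buffers, like your original solution). I avoided CAS loops because they're generally less efficient than a single atomic operation under contention.

If you decide to use the above code, I suggest you write some comprehensive (threaded) unit tests for it first. And proper benchmarks. I did test it, but only just barely. Let me know if you find any bugs :-)

My unit test: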
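As a rough illustration of the difference between a CAS loop and a single atomic operation, here are two ways of adding 0x2 to a packed state word. The function names are invented for this sketch. The first version is one read-modify-write that cannot fail; the second retries whenever another thread got in between, so the number of steps it takes is not bounded. (How `fetch_add` is actually lowered depends on the target; this assumes hardware with a native atomic add.)

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Single atomic read-modify-write: always succeeds on the first attempt.
inline std::uint32_t add_user_fetch_add(std::atomic<std::uint32_t>& state) {
    return state.fetch_add(0x2, std::memory_order_relaxed) + 0x2;
}

// CAS loop: retries if `state` changed between the load and the
// compare-exchange (or if the weak form fails spuriously).
inline std::uint32_t add_user_cas_loop(std::atomic<std::uint32_t>& state) {
    std::uint32_t expected = state.load(std::memory_order_relaxed);
    while (!state.compare_exchange_weak(expected, expected + 0x2,
                                        std::memory_order_relaxed)) {
        // On failure `expected` has been refreshed with the observed value.
    }
    return expected + 0x2;
}

// Usage: both produce the same result when there is no contention.
inline void add_user_example() {
    std::atomic<std::uint32_t> a{0};
    std::atomic<std::uint32_t> b{0};
    assert(add_user_fetch_add(a) == add_user_cas_loop(b));
}
```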

    #include <cassert>
    #include <thread>

    // Assumes the ProducerConsumerDoubleBuffer template above is in scope.
    int main() {
        ProducerConsumerDoubleBuffer<int> buf;
        std::thread producer([&]() {
            for (int i = 0; i != 500000; ++i) {
                int* item = buf.start_writing();
                if (item != nullptr) {      // Always true
                    *item = i;
                }
                buf.end_writing();
            }
        });
        std::thread consumer([&]() {
            int prev = -1;
            for (int i = 0; i != 500000; ++i) {
                int* item = buf.start_reading();
                if (item != nullptr) {
                    // Every value read must be newer than the previous one
                    assert(*item > prev);
                    prev = *item;
                }
                buf.end_reading();
            }
        });
        producer.join();
        consumer.join();
    }

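As for your original implementation, I only looked at it cursorily (it's much more fun to design new stuff, heh), but david.pfx's answer seems to address that part of your question.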

david.pfx · Answer

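Yes, I think it's broken.

If the reader does a start/end/start in succession, it will update its read index to the write index and potentially read data from the write index, even if the write is busy.

The problem, essentially, is that the writer doesn't know which buffer the reader will use, so the writer would have to ensure that both buffers are valid at all times. It can't do that if writing data into a buffer takes any time [unless I misunderstood some of the logic that isn't shown here].

Yes, I think it can be done without locks, using CAS or equivalent logic. I'm not going to try to express an algorithm in this space. I'm confident that one exists, but not that I can write it out correctly the first time. A bit of web searching turned up some plausible-looking candidates. Wait-free IPC using CAS appears to be quite an interesting topic and the subject of some research.

After some further thought, the algorithm is as follows. You need:

3 buffers: one for the writer to use, one for the reader to use, and one extra. The buffers are ordered: they form a ring (but see the note).
A status for each buffer: free, full, writing, reading.
A function that can inspect the status of a buffer and conditionally change it to a different value in a single atomic operation. I shall call it CSET.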
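The sequence can be replayed deterministically on a single thread, in the order two threads could interleave. The sketch below uses a condensed copy of the class from the question (comments stripped, behaviour unchanged); the function `reader_and_writer_collide` is just a name invented for this demonstration.

```cpp
#include <cassert>
#include <mutex>

// Condensed copy of the mutex-based class from the question.
template <typename T>
class ProducerConsumerDoubleBuffer {
public:
    ProducerConsumerDoubleBuffer()
        : m_write_busy(false), m_read_idx(0), m_write_idx(0) {}

    T* start_writing() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_write_busy = true;
        m_write_idx = 1 - m_read_idx;
        return &m_buf[m_write_idx];
    }
    void end_writing() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_write_busy = false;
    }
    T* start_reading() {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (!m_write_busy) {
            m_read_idx = m_write_idx;
        }
        return &m_buf[m_read_idx];
    }
    void end_reading() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_read_idx = m_write_idx;
    }

private:
    T m_buf[2];
    bool m_write_busy;
    unsigned int m_read_idx, m_write_idx;
    std::mutex m_mutex;
};

// Returns true if the reader ends up holding the buffer that the writer
// is still in the middle of filling.
inline bool reader_and_writer_collide() {
    ProducerConsumerDoubleBuffer<int> buf;
    int* w = buf.start_writing();    // writer takes m_buf[1]; write is busy
    int* r1 = buf.start_reading();   // reader gets m_buf[0]: fine so far
    assert(r1 != w);
    buf.end_reading();               // sets m_read_idx = m_write_idx (= 1)
    int* r2 = buf.start_reading();   // write still busy, so index stays at 1
    return r2 == w;                  // same buffer as the active writer
}
```

The mutex protects the index bookkeeping, but nothing stops `end_reading()` from moving the read index onto the buffer that is being written.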

Writer:

    Find the first buffer that is FREE or FULL
        Fail: assert (should never fail, reader can only use one buffer)
        CSET buffer to WRITING
    Write into the buffer
    CSET buffer to FULL

Reader:

    Find first buffer that is FULL
        Fail: wait (writer may be slow)
        CSET buffer to READING
    Read and consume buffer
    CSET buffer to FREE

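Note: this algorithm does not guarantee that buffers are processed strictly in order of arrival, and no simple change will make it do so. If that is important, the algorithm should be extended with a sequence number on each buffer, set by the writer, so that the reader can pick the most recent buffer.

I leave the code as an implementation detail.

The CSET function is non-trivial. It has to atomically test that a particular shared memory location is equal to an expected value and, if so, change it to a new value. It returns true if it successfully made the change and false otherwise. The implementation must avoid race conditions if two threads access the same location at the same time (possibly on different processors).

The C++ standard atomic operations library contains a set of atomic_compare_exchange functions which should serve the purpose, if available.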
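For anyone who wants a starting point, here is one possible reading of the writer and reader steps as C++. It is a sketch only: `TripleBuffer`, `Status` and the method names are invented here, the status change is done with `compare_exchange_strong`, and `begin_write()` returns -1 instead of asserting, on the assumption that a caller would simply retry if every claim in one pass happened to lose a race with the reader.

```cpp
#include <atomic>
#include <cassert>

enum class Status { Free, Full, Writing, Reading };

template <typename T>
class TripleBuffer {
public:
    // Writer: claim the first FREE or FULL slot; -1 if every claim failed.
    int begin_write() {
        for (int i = 0; i < 3; ++i) {
            if (cset(i, Status::Free, Status::Writing) ||
                cset(i, Status::Full, Status::Writing)) {
                return i;
            }
        }
        return -1;
    }
    void end_write(int i) {
        bool ok = cset(i, Status::Writing, Status::Full);
        assert(ok);  // only the writer owns a WRITING slot
        (void)ok;
    }

    // Reader: claim the first FULL slot; -1 if nothing is ready yet.
    int begin_read() {
        for (int i = 0; i < 3; ++i) {
            if (cset(i, Status::Full, Status::Reading)) {
                return i;
            }
        }
        return -1;
    }
    void end_read(int i) {
        bool ok = cset(i, Status::Reading, Status::Free);
        assert(ok);  // only the reader owns a READING slot
        (void)ok;
    }

    T& at(int i) { return m_slots[i].value; }

private:
    struct Slot {
        std::atomic<Status> status{Status::Free};
        T value{};
    };

    // Change the status only if it currently equals `expected`.
    bool cset(int i, Status expected, Status desired) {
        return m_slots[i].status.compare_exchange_strong(
            expected, desired,
            std::memory_order_acq_rel, std::memory_order_relaxed);
    }

    Slot m_slots[3];
};
```

A writer would call `begin_write()`, fill `at(i)`, then `end_write(i)`; the reader mirrors that with `begin_read()` and `end_read(i)`. The sequence number for picking the newest buffer is not included.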
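As a minimal sketch of how CSET could map onto that library: the free function `std::atomic_compare_exchange_strong` takes a pointer to the atomic object, a pointer to the expected value, and the desired value, and returns whether the exchange happened. The wrapper name `cset` is just the name used in this answer, not a standard function.

```cpp
#include <atomic>
#include <cassert>

// Stores `desired` and returns true only if `location` currently equals
// `expected`; otherwise leaves `location` untouched and returns false.
template <typename T>
bool cset(std::atomic<T>& location, T expected, T desired) {
    // `expected` is taken by value: on failure the library overwrites it
    // with the value it observed, which this wrapper does not report.
    return std::atomic_compare_exchange_strong(&location, &expected, desired);
}

// Usage: the second attempt fails because the value is no longer 0.
inline void cset_example() {
    std::atomic<int> status{0};
    assert(cset(status, 0, 1));
    assert(!cset(status, 0, 2));
    assert(status.load() == 1);
}
```

This overload uses sequentially consistent ordering; weaker orderings are available through the `_explicit` variants or the member functions.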

BitTickler · Answer

Here is a version using InterlockedExchangePointer() and SLISTs.

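This solution does not support re-reading the last buffer. If that is needed, it can be done on the reader side by keeping a copy and using something like if( NULL == doubleBuffer.beginRead() ) { use backup copy ... }.
Re-reading was left out not because it is hard to add, but because it is not very realistic. Imagine your last known value growing older and older: seconds, days, weeks. It is unlikely the application would still want to use it. So building re-read functionality into the double buffer code would take flexibility away from the application.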

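The double buffer has one read pointer member. Whenever beginRead() is called, this value is returned and atomically replaced with NULL. Think of it as "the reader TAKES the buffer."
With endRead(), the reader returns the buffer, and it is added to the SLIST that holds the buffers available for write operations.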

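Initially, both buffers are added to the SLIST and the read pointer is NULL.

beginWrite() pops the next available buffer from the SLIST. This value can never be NULL, because of the way endWrite() is implemented.

Last but not least, endWrite() atomically swaps the read pointer with the freshly written buffer it was handed, and if the old read pointer was not NULL, it pushes that buffer onto the SLIST.

So, even if the reader side never reads, the writer side never runs out of buffers. When the reader reads, it gets the latest known value (once!).

What this implementation is not safe against is multiple concurrent readers or writers. But that was not the objective in the first place.

On the ugly side, the buffers need to be structures with an SLIST_ENTRY member on top.

Here is the code, but be warned that it's not my fault if your Mars rover lands on Venus.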
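The "reader takes the buffer" hand-off can also be expressed with standard C++ atomics. The sketch below is a portable analogue of just the read-pointer part, using `std::atomic<T*>::exchange` where the Windows version uses InterlockedExchangePointer(); the free list (the SLIST) is left out, and `Item`, `ReadSlot`, `take` and `publish` are names invented for illustration.

```cpp
#include <atomic>
#include <cassert>

struct Item {
    int value = 0;
};

class ReadSlot {
public:
    // Reader side: take whatever is published (or nullptr) and leave the
    // slot empty, so the same item can never be handed out twice.
    Item* take() {
        return m_read.exchange(nullptr, std::memory_order_acq_rel);
    }

    // Writer side: publish a freshly written item and get back the item
    // that was never read (or nullptr), so it can be reused for writing.
    Item* publish(Item* fresh) {
        return m_read.exchange(fresh, std::memory_order_acq_rel);
    }

private:
    std::atomic<Item*> m_read{nullptr};
};

// Usage: two writes without a read, then one read of the latest value.
inline void read_slot_example() {
    Item a, b;
    ReadSlot slot;
    assert(slot.take() == nullptr);       // nothing published yet
    a.value = 1;
    assert(slot.publish(&a) == nullptr);  // first publish, nothing to recycle
    b.value = 2;
    assert(slot.publish(&b) == &a);       // unread item comes back for reuse
    Item* got = slot.take();
    assert(got == &b && got->value == 2); // reader sees the latest value
    assert(slot.take() == nullptr);       // and only once
}
```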

    #include <windows.h>  // SLIST_HEADER, SLIST_ENTRY, Interlocked* functions
    #include <stdint.h>   // uint8_t

    const size_t MAX_DATA_SIZE = 512;

    typedef
    //__declspec(align(MEMORY_ALLOCATION_ALIGNMENT))
    struct DataItem_tag
    {
        SLIST_ENTRY listNode;  // must be the first member
        uint8_t data[MAX_DATA_SIZE];
        size_t length;
    } DataItem_t;

    class CDoubleBuffer
    {
        SLIST_HEADER m_writePointers;   // buffers available for writing
        DataItem_t m_buffers[2];
        volatile DataItem_t *m_readPointer;  // latest published buffer, or NULL

    public:
        CDoubleBuffer()
            : m_writePointers()
            , m_buffers()
            , m_readPointer(NULL)
        {
            InitializeSListHead(&m_writePointers);
            InterlockedPushEntrySList(&m_writePointers, &m_buffers[0].listNode);
            InterlockedPushEntrySList(&m_writePointers, &m_buffers[1].listNode);
        }

        // Reader takes the published buffer; the read pointer becomes NULL.
        DataItem_t *beginRead()
        {
            DataItem_t *result = reinterpret_cast<DataItem_t*>(
                InterlockedExchangePointer((volatile PVOID*)&m_readPointer, NULL));
            return result;
        }

        // Reader hands the buffer back so it can be written to again.
        void endRead(DataItem_t *dataItem)
        {
            if (NULL != dataItem)
            {
                InterlockedPushEntrySList(&m_writePointers, &dataItem->listNode);
            }
        }

        // Writer pops the next available buffer.
        DataItem_t *beginWrite()
        {
            DataItem_t *result = reinterpret_cast<DataItem_t*>(
                InterlockedPopEntrySList(&m_writePointers));
            return result;
        }

        // Writer publishes the buffer and recycles the unread one, if any.
        void endWrite(DataItem_t *dataItem)
        {
            DataItem_t *oldReadPointer = reinterpret_cast<DataItem_t*>(
                InterlockedExchangePointer((volatile PVOID*)&m_readPointer, dataItem));
            if (NULL != oldReadPointer)
            {
                InterlockedPushEntrySList(&m_writePointers, &oldReadPointer->listNode);
            }
        }
    };

And here is the test code for it. (Both the code above and the test code need <windows.h> and <assert.h>.)

    CDoubleBuffer doubleBuffer;
    DataItem_t *readValue;
    DataItem_t *writeValue;

    // nothing to read yet. Make sure NULL is returned.
    assert(NULL == doubleBuffer.beginRead());
    doubleBuffer.endRead(NULL); // we got nothing, we return nothing.

    // First write without read
    writeValue = doubleBuffer.beginWrite();
    assert(NULL != writeValue); // if we get NULL here it is a bug.
    writeValue->length = 0;
    doubleBuffer.endWrite(writeValue);

    // Second write without read
    writeValue = doubleBuffer.beginWrite();
    assert(NULL != writeValue); // if we get NULL here it is a bug.
    writeValue->length = 1;
    doubleBuffer.endWrite(writeValue);

    // Third write without read - works because it reuses the old buffer for the new write.
    writeValue = doubleBuffer.beginWrite();
    assert(NULL != writeValue); // if we get NULL here it is a bug.
    writeValue->length = 2;
    doubleBuffer.endWrite(writeValue);

    readValue = doubleBuffer.beginRead();
    assert(NULL != readValue); // NULL would obviously be a terrible bug.
    assert(2 == readValue->length); // We got the latest and greatest?
    doubleBuffer.endRead(readValue);

    readValue = doubleBuffer.beginRead();
    assert(NULL == readValue); // We expect NULL here. Re-reading is not a feature of this implementation!
    doubleBuffer.endRead(readValue);