MongoDB replication: going into maintenance mode with 10333 other maintenance mode tasks in progress

Question

I have a MongoDB instance that needs to be resynced.

2016-11-07T11:59:23.330+0000 I REPL [ReplicationExecutor] syncing from: x.x.x.x:27017
2016-11-07T11:59:23.354+0000 W REPL [rsBackgroundSync] we are too stale to use x.x.x.x:27017 as a sync source
2016-11-07T11:59:23.354+0000 I REPL [ReplicationExecutor] could not find member to sync from
2016-11-07T11:59:23.354+0000 E REPL [rsBackgroundSync] too stale to catch up -- entering maintenance mode
2016-11-07T11:59:23.354+0000 I REPL [rsBackgroundSync] our last optime : (term: 20, timestamp: Oct 4 07:41:29:1)
2016-11-07T11:59:23.354+0000 I REPL [rsBackgroundSync] oldest available is (term: 20, timestamp: Oct 17 02:13:33:5)
2016-11-07T11:59:23.354+0000 I REPL [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
2016-11-07T11:59:23.355+0000 I REPL [ReplicationExecutor] going into maintenance mode with 10333 other maintenance mode tasks in progress

What does this line mean?

[ReplicationExecutor] going into maintenance mode with 10333 other maintenance mode tasks in progress

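There is no MongoDB documentation explaining what maintenance mode tasks are. Why are 10333 of them queued? Is there a way to see (list) them? Searching the web, I can only find examples of: with 0 other maintenance mode tasks in progress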

Stennie · Accepted Answer

What are maintenance mode tasks?

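The "maintenance mode tasks" message refers to a counter of successive calls to the replSetMaintenance command and (as of MongoDB 3.4) is not tied to any specific queued tasks. The replSetMaintenance command is used to keep a secondary in RECOVERING state while some maintenance work is done. A RECOVERING member stays online and may still be syncing, but it is excluded from normal read operations (for example, reads that use a secondary read preference in a driver). Each call to replSetMaintenance increments the task counter (if true) or decrements it (if false). When the counter reaches 0, the member transitions from RECOVERING back to SECONDARY, assuming it is healthy.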
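To see which members are currently held in RECOVERING, one option is to filter the document returned by rs.status(). The following is a minimal sketch, not an official recipe: the helper name recoveringMembers and the sample data are hypothetical, and the sample only mimics the name and stateStr fields of a real status document. In the mongo shell you would pass rs.status() instead of the sample.

```javascript
// Return the names of members whose state string is RECOVERING.
// `status` is expected to look like the result of rs.status().
function recoveringMembers(status) {
  return status.members
    .filter((member) => member.stateStr === "RECOVERING")
    .map((member) => member.name);
}

// Hypothetical sample shaped like rs.status() output; hosts are placeholders.
const sampleStatus = {
  set: "rs0",
  members: [
    { name: "a.example:27017", stateStr: "PRIMARY" },
    { name: "b.example:27017", stateStr: "SECONDARY" },
    { name: "c.example:27017", stateStr: "RECOVERING" },
  ],
};

console.log(recoveringMembers(sampleStatus));
```

This shows only the member state, not the task counter itself, which is not part of the sample above.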

As of MongoDB 3.4, changes in maintenance mode are only recorded in the MongoDB log. The command is normally only used internally by mongod, but it can also be invoked manually.
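Since the counter only shows up in the log, one way to track it is to pull the numbers out of the log text. This is a rough sketch that assumes the two message formats quoted in this thread; the function name otherTaskCounts is made up for illustration, and other MongoDB versions may word these messages differently.

```javascript
// Patterns for the two maintenance mode messages quoted in this thread.
const ENTER =
  /going into maintenance mode with (\d+) other maintenance mode tasks in progress/;
const LEAVE =
  /leaving maintenance mode \((\d+) other maintenance mode tasks ongoing\)/;

// Return the "other tasks" number from every matching log line, in order.
function otherTaskCounts(logText) {
  const counts = [];
  for (const line of logText.split("\n")) {
    const match = ENTER.exec(line) || LEAVE.exec(line);
    if (match) {
      counts.push(Number(match[1]));
    }
  }
  return counts;
}

// The line from the question, used as sample input.
const sampleLog =
  "2016-11-07T11:59:23.355+0000 I REPL [ReplicationExecutor] going into " +
  "maintenance mode with 10333 other maintenance mode tasks in progress";

console.log(otherTaskCounts(sampleLog));
```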

Below is an annotated set of log lines, with the associated mongo shell commands, showing how the task counter changes.

// db.adminCommand({replSetMaintenance: 1})
[ReplicationExecutor] going into maintenance mode with 0 other maintenance mode tasks in progress
[ReplicationExecutor] transition to RECOVERING

// db.adminCommand({replSetMaintenance: 1})
[ReplicationExecutor] going into maintenance mode with 1 other maintenance mode tasks in progress

// db.adminCommand({replSetMaintenance: 0})
[ReplicationExecutor] leaving maintenance mode (1 other maintenance mode tasks ongoing)

// db.adminCommand({replSetMaintenance: 0})
[ReplicationExecutor] leaving maintenance mode (0 other maintenance mode tasks ongoing)
[ReplicationExecutor] transition to SECONDARY

// db.adminCommand({replSetMaintenance: 0})
[ReplicationExecutor] Attempted to leave maintenance mode but it is not currently active
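The counter behavior in the log lines above can be mimicked with a toy model. This is only an illustration of the increment/decrement logic as described in this answer, not MongoDB's actual implementation; the class MaintenanceCounter and its fields are invented for the sketch, and it assumes the member is healthy when the counter returns to 0.

```javascript
// Toy model of the maintenance mode counter (not MongoDB source code).
class MaintenanceCounter {
  constructor() {
    this.tasks = 0;           // number of active maintenance "tasks"
    this.state = "SECONDARY"; // assumed starting state
    this.log = [];            // messages in the style of the server log
  }

  // Mirrors db.adminCommand({replSetMaintenance: on}) in this model.
  setMaintenance(on) {
    if (on) {
      this.log.push(
        `going into maintenance mode with ${this.tasks} other maintenance mode tasks in progress`
      );
      this.tasks += 1;
      if (this.state !== "RECOVERING") {
        this.state = "RECOVERING";
        this.log.push("transition to RECOVERING");
      }
      return;
    }
    if (this.tasks === 0) {
      this.log.push(
        "Attempted to leave maintenance mode but it is not currently active"
      );
      return;
    }
    this.tasks -= 1;
    this.log.push(
      `leaving maintenance mode (${this.tasks} other maintenance mode tasks ongoing)`
    );
    if (this.tasks === 0) {
      this.state = "SECONDARY";
      this.log.push("transition to SECONDARY");
    }
  }
}

// Replay the same sequence of calls as in the annotated log.
const member = new MaintenanceCounter();
for (const on of [true, true, false, false, false]) {
  member.setMaintenance(on);
}
console.log(member.log.join("\n"));
```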

Why are 10333 queued?

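In MongoDB 3.2, a replica set member that is "too stale" (that is, it has no oplog entry in common with another healthy member of the replica set) stays in RECOVERING mode and periodically checks whether a new valid sync source has become available. Currently each check increments the "maintenance tasks" counter, so once a member has gone stale the number does not reflect a meaningful count of tasks.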

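In theory, "too stale" is not a terminal state, because a member with a larger oplog may be temporarily offline and could serve as a sync source once it returns. In practice, a "too stale to catch up" error generally means a manual resync is required.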

2016-11-07T11:59:23.354+0000 I REPL [rsBackgroundSync] our last optime : (term: 20, timestamp: Oct 4 07:41:29:1)
2016-11-07T11:59:23.354+0000 I REPL [rsBackgroundSync] oldest available is (term: 20, timestamp: Oct 17 02:13:33:5)
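The gap between these two timestamps is what makes the member too stale: its last applied entry is older than the oldest entry still available on the sync source. As a rough sketch, the gap can be computed like this. The helpers parseOptime and stalenessGapDays are hypothetical, the year 2016 is assumed from the log line dates (the optime text omits it), the times are treated as UTC, and the trailing number after the seconds is ignored here.

```javascript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Parse text like "Oct 4 07:41:29:1" into a Date (UTC).
// The year is passed in because the log text does not include it.
function parseOptime(text, year) {
  const m = /^(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2}):(\d+)$/.exec(text.trim());
  if (!m) {
    throw new Error("unrecognized optime text: " + text);
  }
  const month = MONTHS.indexOf(m[1]);
  if (month < 0) {
    throw new Error("unrecognized month: " + m[1]);
  }
  return new Date(Date.UTC(year, month, Number(m[2]),
                           Number(m[3]), Number(m[4]), Number(m[5])));
}

// Days between our last optime and the oldest entry on the sync source.
// A positive value means the sync source no longer holds our position.
function stalenessGapDays(lastOptime, oldestAvailable, year) {
  const ms = parseOptime(oldestAvailable, year).getTime() -
             parseOptime(lastOptime, year).getTime();
  return ms / (24 * 60 * 60 * 1000);
}

const gapDays = stalenessGapDays("Oct 4 07:41:29:1", "Oct 17 02:13:33:5", 2016);
console.log(gapDays > 0 ? "too stale" : "can catch up", gapDays.toFixed(2));
```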

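In this case the replica set member in question became stale almost two weeks ago, so the maintenance mode counter has been steadily incrementing. There is a relevant issue in the MongoDB Jira that you can watch/upvote: SERVER-23899: Reset maintenance mode when transitioning from too-stale to valid sync source.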