I recently broke replication, and when I tried to skip the one bad transaction I got the following:
MariaDB [(none)]> STOP SLAVE;
Query OK, 0 rows affected (0.05 sec)
MariaDB [(none)]> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
ERROR 1966 (HY000): When using parallel replication and GTID with multiple replication domains, @@sql_slave_skip_counter cannot be used. Instead, setting @@gtid_slave_pos explicitly can be used to skip to after a given GTID position.
MariaDB [(none)]> select @@gtid_slave_pos;
+---------------------------------------------+
| @@gtid_slave_pos |
+---------------------------------------------+
| 0-1051-1391406,1-1050-1182069,57-1051-98897 |
+---------------------------------------------+
1 row in set (0.00 sec)
MariaDB [(none)]> show variables like '%_pos%';
+----------------------+---------------------------------------------------------+
| Variable_name | Value |
+----------------------+---------------------------------------------------------+
| gtid_binlog_pos | 0-1051-1391406,2-1051-4474,57-1051-98897 |
| gtid_current_pos | 0-1051-1391406,1-1050-1182069,2-1051-4474,57-1051-98897 |
| gtid_slave_pos | 0-1051-1391406,1-1050-1182069,57-1051-98897 |
| wsrep_start_position | 00000000-0000-0000-0000-000000000000:-1 |
+----------------------+---------------------------------------------------------+
How can I fix this?
Update 1
MariaDB [(none)]> show variables like '%gtid%';
+------------------------+------------------------------------------+
| Variable_name | Value |
+------------------------+------------------------------------------+
| gtid_binlog_pos | 1-1050-4820789,2-1051-379101,3-1010-3273 |
| gtid_binlog_state | 1-1050-4820789,2-1051-379101,3-1010-3273 |
| gtid_current_pos | 1-1050-4819948,2-1051-379101,3-1010-3273 |
| gtid_domain_id | 3 |
| gtid_ignore_duplicates | OFF |
| gtid_seq_no | 0 |
| gtid_slave_pos | 1-1050-4819948,2-1051-379101,3-1010-3273 |
| gtid_strict_mode | OFF |
| last_gtid | |
| wsrep_gtid_domain_id | 0 |
| wsrep_gtid_mode | OFF |
+------------------------+------------------------------------------+
Following the instruction to set @@gtid_slave_pos, I tried the following.
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: [redacted]
Master_User: [redacted]
Master_Port: 3306
Connect_Retry: 5
Master_Log_File: binary.000591
Read_Master_Log_Pos: 526511543
Relay_Log_File: tmsdb-relay-bin.001239
Relay_Log_Pos: 4
Relay_Master_Log_File: binary.000591
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Could not execute Write_rows_v1 event on table [redacted] Duplicate entry '1134890' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binary.000591, end_log_pos 60726493
Skip_Counter: 0
Exec_Master_Log_Pos: 60724897
Relay_Log_Space: 465787660
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1062
Last_SQL_Error: Could not execute Write_rows_v1 event on table [redacted] Duplicate entry '1134890' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binary.000591, end_log_pos 60726493
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1050
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Current_Pos
Gtid_IO_Pos: 1-1050-4827753,2-1051-379101,3-1010-3273
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: optimistic
1 row in set (0.00 sec)
Using the gtid_slave_pos variable:
MariaDB [(none)]> select @@gtid_slave_pos\G;
*************************** 1. row ***************************
@@gtid_slave_pos: 1-1050-4819948,2-1051-379101,3-1010-3273
MariaDB [(none)]> stop slave;
Query OK, 0 rows affected (0.21 sec)
MariaDB [(none)]> SET GLOBAL gtid_slave_pos='1-1050-4819948,2-1051-379101,3-1010-3274';
Query OK, 0 rows affected (0.10 sec)
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.21 sec)
Checking the status after running the above shows: Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 3-1010-3274, which is not in the master's binlog'
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 10.56.228.64
Master_User: maxscale
Master_Port: 3306
Connect_Retry: 5
Master_Log_File: binary.000591
Read_Master_Log_Pos: 60724897
Relay_Log_File: tmsdb-relay-bin.001239
Relay_Log_Pos: 4
Relay_Master_Log_File: binary.000591
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 60724897
Relay_Log_Space: 249
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 3-1010-3274, which is not in the master's binlog'
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1050
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Current_Pos
Gtid_IO_Pos: 1-1050-4819948,2-1051-379101,3-1010-3274
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: optimistic
1 row in set (0.00 sec)
I can put this back to the previous state:
MariaDB [(none)]> stop slave;
Query OK, 0 rows affected (0.01 sec)
MariaDB [(none)]> SET GLOBAL gtid_slave_pos='1-1050-4819948,2-1051-379101,3-1010-3273';
Query OK, 0 rows affected (0.09 sec)
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.06 sec)
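The attempt above incremented the entry for domain 3, but comparing Gtid_IO_Pos with gtid_slave_pos from the first status output suggests that only domain 1 (events from server 1050) had anything pending. A minimal Python sketch of that comparison follows; the helper names parse_gtid_pos and lagging_domains are hypothetical, and the difference in sequence numbers is only a rough measure of how far the SQL thread is behind.

```python
def parse_gtid_pos(pos):
    """Parse 'domain-server-seq,...' into {domain_id: (server_id, seq_no)}."""
    result = {}
    for gtid in pos.split(","):
        domain, server, seq = (int(part) for part in gtid.strip().split("-"))
        result[domain] = (server, seq)
    return result


def lagging_domains(slave_pos, io_pos):
    """Return {domain_id: seq_no gap} for domains where the I/O thread is ahead."""
    applied = parse_gtid_pos(slave_pos)
    fetched = parse_gtid_pos(io_pos)
    lag = {}
    for domain, (_, fetched_seq) in fetched.items():
        applied_seq = applied.get(domain, (None, 0))[1]
        if fetched_seq > applied_seq:
            lag[domain] = fetched_seq - applied_seq
    return lag


# Values copied from the status output in the question.
slave_pos = "1-1050-4819948,2-1051-379101,3-1010-3273"
io_pos = "1-1050-4827753,2-1051-379101,3-1010-3273"
print(lagging_domains(slave_pos, io_pos))
```

If only domain 1 shows a gap, the failing event most likely belongs to domain 1, so that is the entry that would need to move forward, not domain 3.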
In production it turned out that Parallel_Mode was most likely the cause of the problem. I recommend using a value other than optimistic.
MariaDB [(none)]> select @@slave_parallel_mode\G
*************************** 1. row ***************************
@@slave_parallel_mode: optimistic
If you get the following error from pt-slave-restart:
2018-02-09T10:39:19 tmsdb-relay-bin.000388 4 1032
DBD::mysql::st execute failed: When using parallel replication and GTID with multiple replication domains, @@sql_slave_skip_counter can not be used. Instead, setting @@gtid_slave_pos explicitly can be used to skip to after a given GTID position. [for Statement "SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1"] at /bin/pt-slave-restart line 5122.
The log shows the following:
tail /var/log/mariadb.log
2018-02-09 10:35:46 139919003784960 [ERROR] Slave SQL: Could not execute Update_rows_v1 event on table [tablename]; Can't find record in '[tablename]', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binary.000953, end_log_pos 264325215, Gtid 1-1050-13462991, Internal MariaDB error code: 1032
2018-02-09 10:35:46 139919003784960 [Warning] Slave: Can't find record in '[tablename]' Error_code: 1032
2018-02-09 10:35:46 139919003784960 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binary.000953' position 262879171; GTID position '1-1050-13462990,2-1051-379101,3-1010-3273'
2018-02-09 10:35:46 139918776985344 [Note] Slave SQL thread exiting, replication stopped in log 'binary.000953' at position 262879171; GTID position '1-1050-13462990,2-1051-379101,3-1010-3273'
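The log names both the failing GTID (1-1050-13462991) and the position the SQL thread stopped at. As the ERROR 1966 message suggests, one alternative to a skip counter is to set @@gtid_slave_pos to a position that already includes the failing GTID. The Python sketch below only builds the statements; skip_past and skip_statements are hypothetical helper names, and it assumes that replacing the entry for the failing domain is what you want. With Using_Gtid: Current_Pos the effective start position may also depend on gtid_binlog_pos, so treat this as an illustration, not a tested procedure.

```python
def skip_past(slave_pos, failed_gtid):
    """Return slave_pos with the failing domain's entry replaced by failed_gtid."""
    failed_domain = failed_gtid.split("-")[0]
    entries = [g.strip() for g in slave_pos.split(",") if g.strip()]
    updated = [failed_gtid if g.split("-")[0] == failed_domain else g
               for g in entries]
    if failed_gtid not in updated:
        # The domain was not present in the old position at all.
        updated.append(failed_gtid)
    return ",".join(updated)


def skip_statements(new_pos):
    """Build the SQL statements to run on the slave (not executed here)."""
    return [
        "STOP SLAVE;",
        f"SET GLOBAL gtid_slave_pos = '{new_pos}';",
        "START SLAVE;",
    ]


# Values copied from the log lines above.
stopped_at = "1-1050-13462990,2-1051-379101,3-1010-3273"
failed = "1-1050-13462991"
for statement in skip_statements(skip_past(stopped_at, failed)):
    print(statement)
```

Skipping an event this way leaves the slave's data different from the master's, exactly as a skip counter would.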
To restart the slave after it has failed, stop all slave_parallel_threads and disable slave_parallel_mode:
MariaDB [(none)]> stop slave;
Query OK, 0 rows affected (0.35 sec)
MariaDB [(none)]> set global slave_parallel_threads = 0;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> set global slave_parallel_mode = none;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> Start SLAVE;
Query OK, 0 rows affected (0.00 sec)
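Now pt-slave-restart can restart the slave, which is convenient when you just want the slave running again without having to think about sequence numbers and all the rest.
pt-slave-restart now runs without errors; press Ctrl-C to close it once you are satisfied that the slave has caught up.
It is not much different from the following, but it does it automatically: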
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;
If you need parallel threads, you can re-enable them once the slave has caught up with, or passed, the event that was causing the problem. I am going to try a different slave_parallel_mode, conservative:
MariaDB [(none)]> stop slave;
Query OK, 0 rows affected (0.01 sec)
MariaDB [(none)]> set global slave_parallel_threads = 4;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> set global slave_parallel_mode = conservative;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.09 sec)
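Disabling and re-enabling parallel replication use the same four statements with different values. A small Python sketch that generates them is shown here; parallel_settings_statements is a hypothetical helper, and the list of modes is the set of slave_parallel_mode values I am aware of, so check it against your MariaDB version.

```python
# Values accepted by slave_parallel_mode (assumed; verify for your version).
VALID_MODES = {"none", "minimal", "conservative", "optimistic", "aggressive"}


def parallel_settings_statements(threads, mode):
    """Build the statements that change parallel replication settings."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown slave_parallel_mode: {mode}")
    if threads < 0:
        raise ValueError("slave_parallel_threads must be >= 0")
    return [
        "STOP SLAVE;",
        f"SET GLOBAL slave_parallel_threads = {threads};",
        f"SET GLOBAL slave_parallel_mode = {mode};",
        "START SLAVE;",
    ]


# Disable parallel replication so that errors can be skipped.
print("\n".join(parallel_settings_statements(0, "none")))
# Re-enable it afterwards with a less aggressive mode.
print("\n".join(parallel_settings_statements(4, "conservative")))
```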
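I found that the following worked. It does not restore the slave to being an exact replica of the master; there will be differences in the data. I use pt-table-sync to fix those.
1. Restart replication without the GTID method
2. Stop the parallel slave threads
3. Enable GTID replication
4. Use percona-toolkit pt-slave-restart to skip all errors
1. Restart replication without the GTID method, using the master binlog position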
CHANGE MASTER TO MASTER_HOST='12.34.56.789', MASTER_USER='slave_user', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=107;
This is well documented; search online for the detailed steps.
2. Stop the parallel slave threads
This was part of the problem seen in the original question.
ERROR 1966 (HY000): When using parallel replication and GTID with multiple replication domains, @@sql_slave_skip_counter cannot be used. Instead, setting @@gtid_slave_pos explicitly can be used to skip to after a given GTID position.
I want to be able to skip events without having to work out or increment the GTID position each time.
MariaDB [(none)]> stop slave;
Query OK, 0 rows affected (0.35 sec)
MariaDB [(none)]> set global slave_parallel_threads = 0;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> set global slave_parallel_mode = none;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> Start SLAVE;
Query OK, 0 rows affected (0.00 sec)
Now when I check the parallel slave threads I see:
MariaDB [(none)]> show slave status \G
*************************** 1. row ***************************
..........
Parallel_Mode: none
Once done, I can reverse this process to re-enable the parallel slave threads, and I know GTID is working.
3. Enable GTID replication
Try to restart the slave with GTID enabled.
On the master:
MariaDB [(none)]> SHOW MASTER STATUS\G
*************************** 1. row ***************************
File: mariadb-bin.000001
Position: 510
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
SELECT BINLOG_GTID_POS('mariadb-bin.000001', 510);
+--------------------------------------------+
| BINLOG_GTID_POS('mariadb-bin.000001', 510) |
+--------------------------------------------+
| 1-101-1 |
+--------------------------------------------+
1 row in set (0.00 sec)
On the slave:
STOP SLAVE;
SET GLOBAL gtid_slave_pos = '1-101-1';
CHANGE MASTER TO master_use_gtid=slave_pos;
START SLAVE;
Checking the slave, there are some events to skip before it gets back to the same state as the master.
Last_Error: An attempt was made to binlog GTID 1-1050-5004291 which would create an out-of-order sequence number with existing GTID 1-1050-5004322, and gtid strict mode is enabled.
MariaDB [(none)]> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Log_File: binary.000599
Read_Master_Log_Pos: 364810491
Relay_Log_File: tmsdb-relay-bin.001240
Relay_Log_Pos: 716
Relay_Master_Log_File: binary.000599
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1950
Last_Error: An attempt was made to binlog GTID 1-1050-5004291 which would create an out-of-order sequence number with existing GTID 1-1050-5004322, and gtid strict mode is enabled.
Skip_Counter: 0
Exec_Master_Log_Pos: 286447058
Relay_Log_Space: 78364447
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1950
Last_SQL_Error: An attempt was made to binlog GTID 1-1050-5004291 which would create an out-of-order sequence number with existing GTID 1-1050-5004322, and gtid strict mode is enabled.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1050
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 1-1050-5005223,2-1051-379101,3-1010-3273
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: none
1 row in set (0.00 sec)
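Error 1950 reports two GTIDs: the one the slave tried to write to its binlog and one that is already there. As far as I understand it, strict mode rejects a GTID whose sequence number is not higher than the existing one in the same domain. The Python sketch below pulls both GTIDs out of the message and checks that condition; explain_1950 is a hypothetical helper, and the regular expression assumes the message format shown in the status output.

```python
import re

# A MariaDB GTID looks like domain-server-sequence, for example 1-1050-5004291.
GTID_RE = re.compile(r"\b(\d+)-(\d+)-(\d+)\b")


def explain_1950(message):
    """Extract the new and existing GTIDs from an error 1950 message."""
    gtids = [tuple(int(n) for n in match) for match in GTID_RE.findall(message)]
    if len(gtids) < 2:
        return None
    new, existing = gtids[0], gtids[1]
    same_domain = new[0] == existing[0]
    return {
        "new": new,
        "existing": existing,
        # Out of order when the new seq_no does not exceed the existing one.
        "out_of_order": same_domain and new[2] <= existing[2],
        "seq_gap": existing[2] - new[2],
    }


last_error = (
    "An attempt was made to binlog GTID 1-1050-5004291 which would create "
    "an out-of-order sequence number with existing GTID 1-1050-5004322, "
    "and gtid strict mode is enabled."
)
print(explain_1950(last_error))
```

Here the slave's binlog already contains sequence numbers beyond the events it is being asked to apply again, which is why those events have to be skipped.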
4. Use percona-toolkit pt-slave-restart to skip all errors
sudo yum install http://www.percona.com/downloads/percona-release/redhat/0.1-4/percona-release-0.1-4.noarch.rpm
sudo yum search percona-toolkit
pt-slave-restart skips all the events needed to get the slave into a working state.
# pt-slave-restart
2017-12-22T13:39:59 tmsdb-relay-bin.001240 716 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 69702 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 97912 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 98144 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 363903 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 364135 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 712776 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 713008 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 759737 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 827932 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 828164 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 934851 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 952088 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 952320 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1084249 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1084481 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1351188 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1351420 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1621561 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1693920 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1711677 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1711909 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1880931 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1881163 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 1916544 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 2124672 1950
2017-12-22T13:40:01 tmsdb-relay-bin.001240 2124904 1950
2017-12-22T13:40:01 tmsdb-relay-bin.001240 2125136 1950
2017-12-22T13:40:01 tmsdb-relay-bin.001240 2452030 1950
2017-12-22T13:40:01 tmsdb-relay-bin.001240 2452262 1950
2017-12-22T13:40:01 tmsdb-relay-bin.001240 2819749 1950
2017-12-22T13:40:01 tmsdb-relay-bin.001240 2819981 1950
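Every line of that output is one skipped event, so it is worth keeping a record of how many were skipped and with which error codes before running pt-table-sync. A rough Python sketch, assuming the whitespace-separated layout shown above (timestamp first, error number last); summarize_skips is a hypothetical helper and not part of percona-toolkit.

```python
from collections import Counter


def summarize_skips(output):
    """Count skipped events per error number in pt-slave-restart output."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        # Expect at least: timestamp, relay log file, position, error number.
        if len(fields) < 4 or not fields[-1].isdigit():
            continue
        counts[int(fields[-1])] += 1
    return counts


# Three lines copied from the run above.
sample = """\
2017-12-22T13:39:59 tmsdb-relay-bin.001240 716 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 69702 1950
2017-12-22T13:40:00 tmsdb-relay-bin.001240 97912 1950
"""
for errno, count in summarize_skips(sample).items():
    print(f"error {errno}: {count} events skipped")
```

A large count is a hint that the slave has drifted a long way from the master and needs a data comparison afterwards.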
Checking the slave status now shows:
MariaDB [(none)]> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: masterhost
Master_User: maxscale
Master_Port: 3306
Connect_Retry: 5
Master_Log_File: binary.000600
Read_Master_Log_Pos: 37801368
Relay_Log_File: tmsdb-relay-bin.001242
Relay_Log_Pos: 37801653
Relay_Master_Log_File: binary.000600
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 37801368
Relay_Log_Space: 37801991
Until_Condition: None
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Master_Server_Id: 1050
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 1-1050-5014401,2-1051-379101,3-1010-3273
Parallel_Mode: none
1 row in set (0.00 sec)
Finally, you should restart the server to confirm that replication survives a restart.