
Galera SST fails

I am using MariaDB 10.2.

Node2 (2.2.2.2) is running as a slave of an external database. I bootstrapped node2 as the first member of the Galera cluster "my_cluster".

When I try to join node1 (1.1.1.1) to the cluster, I get an error.

I can see that four rsync processes are running on node1:

mysql    20458  0.0  0.0   4504   788 ?        S    10:49   0:00 sh -c wsrep_sst_rsync --role 'joiner' --address '1.1.1.1:4444' --datadir '/var/mysql/datadir/'   --parent '20440' --binlog '/var/mysql/log/mysql-bin' 
mysql    20459  0.0  0.0   4504  1712 ?        S    10:49   0:00 /bin/sh -ue /usr//bin/wsrep_sst_rsync --role joiner --address 1.1.1.1:4444 --datadir /var/mysql/datadir/ --parent 20440 --binlog /var/mysql/log/mysql-bin
mysql    20500  0.0  0.0  12784  2636 ?        S    10:49   0:00 rsync --daemon --no-detach --port 4444 --config /var/mysql/datadir//rsync_sst.conf
mysql    20755  0.0  0.0  26528  2844 ?        S    10:49   0:00 rsync --daemon --no-detach --port 4444 --config /var/mysql/datadir//rsync_sst.conf
mysql    20779  9.8  0.0  26788  1460 ?        R    10:49   1:00 rsync --daemon --no-detach --port 4444 --config /var/mysql/datadir//rsync_sst.conf

And on node2:

mysql    25860  0.0  0.0   4504   748 ?        S    10:49   0:00 sh -c wsrep_sst_rsync --role 'donor' --address '1.1.1.1:4444/rsync_sst' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/mysql/datadir/'    --binlog '/var/mysql/log/mysql-bin' --gtid '09e3b6c8-343c-11e8-87cf-07a9813fdf95:0' --gtid-domain-id '0'
mysql    25861  0.0  0.0   4504  1704 ?        S    10:49   0:00 /bin/sh -ue /usr//bin/wsrep_sst_rsync --role donor --address 1.1.1.1:4444/rsync_sst --socket /var/run/mysqld/mysqld.sock --datadir /var/mysql/datadir/ --binlog /var/mysql/log/mysql-bin --gtid 09e3b6c8-343c-11e8-87cf-07a9813fdf95:0 --gtid-domain-id 0
mysql    25909  0.0  0.0   6468  1960 ?        S    10:49   0:00 xargs -I{} -0 -P 8 rsync --owner --group --perms --links --specials --ignore-times --inplace --recursive --delete --quiet --whole-file --exclude */ib_logfile* /var/mysql/datadir//{}/ rsync://1.1.1.1:4444/rsync_sst/{}
mysql    25910 11.8  0.0  22604  3244 ?        S    10:49   1:39 rsync --owner --group --perms --links --specials --ignore-times --inplace --recursive --delete --quiet --whole-file --exclude */ib_logfile* /var/mysql/datadir//./db1/ rsync://1.1.1.1:4444/rsync_sst/./db1

However, port 4444 is open only on node1, and the file rsync_sst.conf in the datadir is not growing.

I have AppArmor and ufw completely disabled. The firewall allows the following:

  • TCP from all nodes on 4444
  • TCP and UDP from all nodes on 4567
  • TCP from all nodes on 4568
  • TCP from all nodes on 3306
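
One way to confirm that these rules actually let traffic through is to try a TCP connection to each port from the other node. The following is only a minimal sketch: the function names and the port descriptions are illustrative assumptions, UDP on 4567 is not covered, and port 4444 will normally accept connections only while the joiner's rsync daemon is running during an SST.

```python
import socket

# TCP ports taken from the firewall list above; the descriptions are
# informal labels added for this sketch.
GALERA_TCP_PORTS = {
    4444: "SST (rsync)",
    4567: "group communication",
    4568: "IST",
    3306: "MySQL client",
}


def is_tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_node(host, ports=GALERA_TCP_PORTS):
    """Map each port to True/False depending on TCP reachability."""
    return {port: is_tcp_port_open(host, port) for port in ports}
```

Running `check_node("1.1.1.1")` on node2 (and `check_node("2.2.2.2")` on node1) would then show which ports are reachable in each direction. A `False` result only says the connection failed; it does not distinguish a firewall drop from a port with no listener.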

node2 my.cnf

binlog_format=row
default-storage-engine=InnoDB
innodb_autoinc_lock_mode=2
query_cache_size=0
query_cache_type=0
innodb_flush_log_at_trx_commit=0
wsrep_on=ON
wsrep_slave_threads=1
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="my_cluster"
# for bootstrapping
wsrep_cluster_address="gcomm://"
#wsrep_cluster_address="gcomm://2.2.2.2,1.1.1.1"
wsrep_sst_receive_address=2.2.2.2:4444
wsrep_provider_options='ist.recv_addr=2.2.2.2:4568;'
wsrep_sst_method=rsync
wsrep_node_address="2.2.2.2"
wsrep_node_name="node2"

node1 my.cnf

binlog_format=row
default-storage-engine=InnoDB
innodb_autoinc_lock_mode=2
query_cache_size=0
query_cache_type=0
innodb_flush_log_at_trx_commit=0
wsrep_on=ON
wsrep_slave_threads=1
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="my_cluster"
wsrep_cluster_address="gcomm://2.2.2.2"
# also tried below
#wsrep_cluster_address="gcomm://2.2.2.2,1.1.1.1"
wsrep_sst_receive_address=1.1.1.1:4444
wsrep_provider_options='ist.recv_addr=1.1.1.1:4568;'
wsrep_sst_method=rsync
wsrep_sst_donor="node2,"
wsrep_node_address="1.1.1.1"
wsrep_node_name="node1"

node1 error.log

Apr  3 10:49:23 MY_CLUSTER_NODE1 systemd[1]: Starting LSB: Start and stop the mysql database server daemon...
Apr  3 10:49:23 MY_CLUSTER_NODE1 mysql[19997]:  * Starting MariaDB database server mysqld
Apr  3 10:49:23 MY_CLUSTER_NODE1 mysqld_safe: Starting mysqld daemon with databases from /var/mysql/datadir
Apr  3 10:49:23 MY_CLUSTER_NODE1 mysqld_safe: WSREP: Running position recovery with --disable-log-error  --pid-file='/var/mysql/datadir/MY_CLUSTER_NODE1-recover.pid'
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld_safe: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] /usr/sbin/mysqld (mysqld 10.2.14-MariaDB-10.2.14+maria~xenial-log) starting as process 20440 ...
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: Read nil XID from storage engines, skipping position init
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: wsrep_load(): Galera 25.3.23(r3789) by Codership Oy <info@codership.com> loaded successfully.
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: CRC-32C: using hardware acceleration.
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: Passing config to GCS: base_dir = /var/mysql/datadir/; base_host = 1.1.1.1; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/mysql/datadir/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/mysql/datadir//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.seg
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: ment = 0; gmcast.version = 0; ist.recv_addr = 1.1.1.1:4568; pc.a
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 00000000-0000-0000-0000-000000000000:-1
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: wsrep_sst_grab()
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: Start replication
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: protonet asio version 0
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: Using CRC-32C for message checksums.
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: backend: asio
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: gcomm thread scheduling priority set to other:0
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Warning] WSREP: access file(/var/mysql/datadir//gvwstate.dat) failed(No such file or directory)
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: restore pc from disk failed
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: GMCast version 0
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: (8854b393, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: (8854b393, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: EVS version 0
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: gcomm: connecting to group 'my_cluster', peer '2.2.2.2:'
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: (8854b393, 'tcp://0.0.0.0:4567') connection established to d1198d28 tcp://2.2.2.2:4567
Apr  3 10:49:26 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:26 140084453714112 [Note] WSREP: (8854b393, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084453714112 [Note] WSREP: declaring d1198d28 at tcp://2.2.2.2:4567 stable
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084453714112 [Note] WSREP: Node d1198d28 state prim
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084453714112 [Note] WSREP: view(view_id(PRIM,8854b393,6) memb {
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: #0118854b393,0
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: #011d1198d28,0
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: } joined {
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: } left {
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: } partitioned {
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: })
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084453714112 [Note] WSREP: save pc into disk
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084453714112 [Note] WSREP: gcomm: connected
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084453714112 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084453714112 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084453714112 [Note] WSREP: Opened channel 'my_cluster'
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084015769344 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084453714112 [Note] WSREP: Waiting for SST to complete.
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084015769344 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 88ed85fa-3713-11e8-b4e1-cf323f71d772
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084015769344 [Note] WSREP: STATE EXCHANGE: sent state msg: 88ed85fa-3713-11e8-b4e1-cf323f71d772
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084015769344 [Note] WSREP: STATE EXCHANGE: got state msg: 88ed85fa-3713-11e8-b4e1-cf323f71d772 from 0 (node1)
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084015769344 [Note] WSREP: STATE EXCHANGE: got state msg: 88ed85fa-3713-11e8-b4e1-cf323f71d772 from 1 (node2)
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084015769344 [Note] WSREP: Quorum results:
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: #011version    = 4,
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: #011component  = PRIMARY,
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: #011conf_id    = 5,
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: #011members    = 1/2 (joined/total),
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: #011act_id     = 0,
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: #011last_appl. = -1,
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: #011protocols  = 0/8/3 (gcs/repl/appl),
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: #011group UUID = 09e3b6c8-343c-11e8-87cf-07a9813fdf95
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084015769344 [Note] WSREP: Flow-control interval: [23, 23]
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084015769344 [Note] WSREP: Trying to continue unpaused monitor
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084015769344 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084256552704 [Note] WSREP: State transfer required:
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: #011Group state: 09e3b6c8-343c-11e8-87cf-07a9813fdf95:0
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: #011Local state: 00000000-0000-0000-0000-000000000000:-1
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084256552704 [Note] WSREP: New cluster view: global state: 09e3b6c8-343c-11e8-87cf-07a9813fdf95:0, view# 6: Primary, number of nodes: 2, my index: 0, protocol version 3
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084256552704 [Warning] WSREP: Gap in state sequence. Need state transfer.
Apr  3 10:49:27 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:27 140084007376640 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '1.1.1.1:4444' --datadir '/var/mysql/datadir/'   --parent '20440' --binlog '/var/mysql/log/mysql-bin' '
Apr  3 10:49:27 MY_CLUSTER_NODE1 rsyncd[20500]: rsyncd version 3.1.1 starting, listening on port 4444
Apr  3 10:49:28 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:28 140084256552704 [Note] WSREP: Prepared SST request: rsync|1.1.1.1:4444/rsync_sst
Apr  3 10:49:28 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:28 140084256552704 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Apr  3 10:49:28 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:28 140084256552704 [Note] WSREP: REPL Protocols: 8 (3, 2)
Apr  3 10:49:28 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:28 140084256552704 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
Apr  3 10:49:28 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:28 140084265998080 [Note] WSREP: Service thread queue flushed.
Apr  3 10:49:28 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:28 140084256552704 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (09e3b6c8-343c-11e8-87cf-07a9813fdf95): 1 (Operation not permitted)
Apr  3 10:49:28 MY_CLUSTER_NODE1 mysqld: #011 at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
Apr  3 10:49:28 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:28 140084015769344 [Note] WSREP: Member 0.0 (node1) requested state transfer from 'node2,'. Selected 1.0 (node2)(SYNCED) as donor.
Apr  3 10:49:28 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:28 140084015769344 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
Apr  3 10:49:28 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:28 140084256552704 [Note] WSREP: Requesting state transfer: success, donor: 1
Apr  3 10:49:28 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:28 140084256552704 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 09e3b6c8-343c-11e8-87cf-07a9813fdf95:0
Apr  3 10:49:28 MY_CLUSTER_NODE1 rsyncd[20530]: connect from ec2-node2.amazonaws.com (2.2.2.2)
Apr  3 10:49:28 MY_CLUSTER_NODE1 rsyncd[20530]: rsync to rsync_sst/ from ec2-node2.amazonaws.com (2.2.2.2)
Apr  3 10:49:29 MY_CLUSTER_NODE1 rsyncd[20530]: receiving file list
Apr  3 10:49:30 MY_CLUSTER_NODE1 mysqld: 2018-04-03 10:49:30 140084024162048 [Note] WSREP: (8854b393, 'tcp://0.0.0.0:4567') turning message relay requesting off
Apr  3 10:49:33 MY_CLUSTER_NODE1 rsyncd[20530]: sent 75 bytes  received 79721974 bytes  total size 79702016
Apr  3 10:49:33 MY_CLUSTER_NODE1 rsyncd[20597]: connect from ec2-node2.amazonaws.com (2.2.2.2)
Apr  3 10:49:33 MY_CLUSTER_NODE1 rsyncd[20597]: rsync to rsync_sst-log_dir/ from ec2-node2.amazonaws.com (2.2.2.2)
Apr  3 10:49:33 MY_CLUSTER_NODE1 rsyncd[20597]: receiving file list
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20597]: sent 63 bytes  received 268501210 bytes  total size 268435456
Apr  3 10:49:46 MY_CLUSTER_NODE1 kernel: [85332.768269] TCP: request_sock_TCP: Possible SYN flooding on port 4444. Sending cookies.  Check SNMP counters.
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20755]: connect from ec2-node2.amazonaws.com (2.2.2.2)
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20756]: connect from ec2-node2.amazonaws.com (2.2.2.2)
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20757]: connect from ec2-node2.amazonaws.com (2.2.2.2)
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20758]: connect from ec2-node2.amazonaws.com (2.2.2.2)
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20759]: connect from ec2-node2.amazonaws.com (2.2.2.2)
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20760]: connect from ec2-node2.amazonaws.com (2.2.2.2)
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20757]: rsync to rsync_sst/./db2 from ec2-node2.amazonaws.com (2.2.2.2)
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20756]: rsync to rsync_sst/./db3 from ec2-node2.amazonaws.com (2.2.2.2)
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20755]: rsync to rsync_sst/./db1 from ec2-node2.amazonaws.com (2.2.2.2)
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20758]: rsync to rsync_sst/./mysql from ec2-node2.amazonaws.com (2.2.2.2)
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20760]: rsync to rsync_sst/./db4 from ec2-node2.amazonaws.com (2.2.2.2)
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20759]: rsync to rsync_sst/./performance_schema from ec2-node2.amazonaws.com (2.2.2.2)
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20757]: receiving file list
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20756]: receiving file list
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20755]: receiving file list
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20759]: receiving file list
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20760]: receiving file list
Apr  3 10:49:46 MY_CLUSTER_NODE1 rsyncd[20758]: receiving file list
Apr  3 10:49:47 MY_CLUSTER_NODE1 rsyncd[20759]: sent 48 bytes  received 214 bytes  total size 61
Apr  3 10:49:47 MY_CLUSTER_NODE1 rsyncd[20760]: sent 314 bytes  received 805861 bytes  total size 804574
Apr  3 10:49:47 MY_CLUSTER_NODE1 rsyncd[20758]: sent 1682 bytes  received 6771484 bytes  total size 6763799
Apr  3 10:49:48 MY_CLUSTER_NODE1 rsyncd[20757]: sent 1074 bytes  received 20618935 bytes  total size 20610082
Apr  3 10:49:55 MY_CLUSTER_NODE1 /etc/init.d/mysql[20905]: 0 processes alive and '/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf ping' resulted in
Apr  3 10:49:55 MY_CLUSTER_NODE1 /etc/init.d/mysql[20905]: #007/usr/bin/mysqladmin: connect to server at 'localhost' failed
Apr  3 10:49:55 MY_CLUSTER_NODE1 /etc/init.d/mysql[20905]: error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111)'
Apr  3 10:49:55 MY_CLUSTER_NODE1 mysql[19997]:    ...fail!
Apr  3 10:49:55 MY_CLUSTER_NODE1 /etc/init.d/mysql[20905]: Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
Apr  3 10:49:55 MY_CLUSTER_NODE1 systemd[1]: mysql.service: Control process exited, code=exited status=1
Apr  3 10:49:55 MY_CLUSTER_NODE1 /etc/init.d/mysql[20905]: 
Apr  3 10:49:55 MY_CLUSTER_NODE1 systemd[1]: Failed to start LSB: Start and stop the mysql database server daemon.
Apr  3 10:49:55 MY_CLUSTER_NODE1 systemd[1]: mysql.service: Unit entered failed state.
Apr  3 10:49:55 MY_CLUSTER_NODE1 systemd[1]: mysql.service: Failed with result 'exit-code'.
3
rwms
wsrep_sst_donor=2.2.2.2

wsrep_sst_donor expects a node name, not an IP address.

A similar case was reported in the MariaDB Jira as MDEV-13687.
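
To spot this kind of mistake before starting a node, the configured value can be checked for entries that parse as IP addresses. This is a rough sketch rather than anything Galera provides: the function name is hypothetical, and it assumes that any entry which is not a valid IP address is meant as a node name.

```python
import ipaddress


def donor_entries_that_are_ips(donor_value):
    """Return the entries of a wsrep_sst_donor value that parse as IP addresses."""
    flagged = []
    for entry in donor_value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # a trailing comma leaves an empty entry; skip it
        try:
            ipaddress.ip_address(entry)
        except ValueError:
            continue  # not an IP address, so presumably a node name
        flagged.append(entry)
    return flagged


# donor_entries_that_are_ips("2.2.2.2")  -> ['2.2.2.2']
# donor_entries_that_are_ips("node2,")   -> []
```

An empty result means every entry looks like a name such as the `wsrep_node_name` values "node1" and "node2" used in the configs above; it does not verify that a node with that name exists in the cluster.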

3
dbdemon