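I am setting up a Postgres HA lab by following this guide.
I followed the guide exactly (changing only the IP addresses for my environment), and in the end everything works on Postgres server 1.
The trouble starts with the patroni.yml setup on Postgres server 2. The guide uses the same patroni.yml settings on both Postgres servers, but after I restart the patroni service, this is what I get.
On server 1: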
quanlm@DB1:~$ sudo service patroni status
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
Active: active (running) since Tue 2019-11-12 07:35:33 UTC; 14min ago
Main PID: 411 (patroni)
Tasks: 12
Memory: 77.6M
CPU: 4.041s
CGroup: /system.slice/patroni.service
├─411 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni.yml
├─431 postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --listen_addresses=192.168.122.77 --max_prepared_tran
├─435 postgres: postgres: checkpointer process
├─436 postgres: postgres: writer process
├─439 postgres: postgres: stats collector process
├─447 postgres: postgres: postgres postgres 192.168.122.77(49984) idle
├─455 postgres: postgres: wal writer process
└─456 postgres: postgres: autovacuum launcher process
Nov 12 07:49:28 DB1 patroni[411]: 2019-11-12 07:49:28,533 INFO: no action. i am the leader with the lock
Nov 12 07:49:38 DB1 patroni[411]: 2019-11-12 07:49:38,459 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:49:38 DB1 patroni[411]: 2019-11-12 07:49:38,536 INFO: no action. i am the leader with the lock
Nov 12 07:49:48 DB1 patroni[411]: 2019-11-12 07:49:48,459 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:49:48 DB1 patroni[411]: 2019-11-12 07:49:48,544 INFO: no action. i am the leader with the lock
Nov 12 07:49:58 DB1 patroni[411]: 2019-11-12 07:49:58,458 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:49:58 DB1 patroni[411]: 2019-11-12 07:49:58,548 INFO: no action. i am the leader with the lock
Nov 12 07:50:08 DB1 patroni[411]: 2019-11-12 07:50:08,457 INFO: Lock owner: postgresql0; I am postgresql0
Nov 12 07:50:08 DB1 patroni[411]: 2019-11-12 07:50:08,539 INFO: no action. i am the leader with the lock
Nov 12 07:50:19 DB1 patroni[411]: 2019-11-12 07:50:19,949 INFO: acquired session lock as a leader
Server 1 is fine, but server 2 is not:
quanlm@DB2:~$ sudo service patroni status
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2019-11-12 07:50:02 UTC; 2s ago
Process: 9514 ExecStart=/usr/local/bin/patroni /etc/patroni.yml (code=exited, status=1/FAILURE)
Main PID: 9514 (code=exited, status=1/FAILURE)
Nov 12 07:50:02 DB2 patroni[9514]: File "/usr/lib/python3.5/socketserver.py", line 440, in __init__
Nov 12 07:50:02 DB2 patroni[9514]: self.server_bind()
Nov 12 07:50:02 DB2 patroni[9514]: File "/usr/lib/python3.5/http/server.py", line 138, in server_bind
Nov 12 07:50:02 DB2 patroni[9514]: socketserver.TCPServer.server_bind(self)
Nov 12 07:50:02 DB2 patroni[9514]: File "/usr/lib/python3.5/socketserver.py", line 454, in server_bind
Nov 12 07:50:02 DB2 patroni[9514]: self.socket.bind(self.server_address)
Nov 12 07:50:02 DB2 patroni[9514]: OSError: [Errno 99] Cannot assign requested address
Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Main process exited, code=exited, status=1/FAILURE
Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Unit entered failed state.
Nov 12 07:50:02 DB2 systemd[1]: patroni.service: Failed with result 'exit-code'.
So in the end it does not work.
I have allowed remote connections on both servers by setting
listen_addresses = '*'
in postgresql.conf and adding
host all all 0.0.0.0/0 md5
to pg_hba.conf.
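As a result, once HAProxy came into play, the second server did not take over when the first server went down.
The problem is clearly with patroni on server 2, but how do I fix it?
Failing that, is there a workaround for an HA PostgreSQL setup?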
P.S.: Firewall settings
quanlm@DB1:~$ sudo ufw status
Status: inactive
quanlm@DB1:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
quanlm@DB2:~$ sudo ufw status
Status: inactive
quanlm@DB2:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
P.P.S.: My patroni.yml
quanlm@DB1:~$ cat /etc/patroni.yml
scope: postgres
namespace: /db/
name: postgresql0

restapi:
    listen: 192.168.122.77:8008
    connect_address: 192.168.122.77:8008

etcd:
    host: 192.168.122.156:2379

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
            use_pg_rewind: true

    initdb:
    - encoding: UTF8
    - data-checksums

    pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.122.77/0 md5
    - host replication replicator 192.168.122.240/0 md5
    - host all all 0.0.0.0/0 md5

    users:
        admin:
            password: admin
            options:
                - createrole
                - createdb

postgresql:
    listen: 192.168.122.77:5432
    connect_address: 192.168.122.77:5432
    data_dir: /data/patroni
    pgpass: /tmp/pgpass
    authentication:
        replication:
            username: replicator
            password: password
        superuser:
            username: postgres
            password: password
    parameters:
        unix_socket_directories: '.'

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false
Yes, this file is the same on both servers.
Update #1
On server 2, I changed patroni.yml from name: postgresql0 to name: postgresql1.
The REST API is now set to the host IP 192.168.122.240.
But with
postgresql:
    listen: 192.168.122.77:5432
    connect_address: 192.168.122.77:5432
this problem occurred:
quanlm@DB2:~$ sudo service patroni status
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
Active: active (running) since Wed 2019-11-13 02:17:16 UTC; 25s ago
Main PID: 32363 (patroni)
Tasks: 6
Memory: 45.7M
CPU: 6.326s
CGroup: /system.slice/patroni.service
├─ 1014 postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --port=5432 --wal_level=hot_standby --max_wal_senders=10 --cluster
└─32363 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni.yml
Nov 13 02:17:39 DB2 patroni[32363]: 192.168.122.77:5432 - accepting connections
Nov 13 02:17:39 DB2 patroni[32363]: 192.168.122.77:5432 - accepting connections
Nov 13 02:17:39 DB2 patroni[32363]: 2019-11-13 02:17:39,940 INFO: Lock owner: postgresql0; I am postgresql1
Nov 13 02:17:39 DB2 patroni[32363]: 2019-11-13 02:17:39,940 INFO: does not have lock
Nov 13 02:17:39 DB2 patroni[32363]: 2019-11-13 02:17:39,940 INFO: establishing a new patroni connection to the postgres cluster
Nov 13 02:17:40 DB2 patroni[32363]: LOG: could not bind IPv4 socket: Cannot assign requested address
Nov 13 02:17:40 DB2 patroni[32363]: HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
Nov 13 02:17:40 DB2 patroni[32363]: WARNING: could not create listen socket for "192.168.122.77"
Nov 13 02:17:40 DB2 patroni[32363]: FATAL: could not create any TCP/IP sockets
Nov 13 02:17:40 DB2 patroni[32363]: 2019-11-13 02:17:40,042 INFO: demoting self because i do not have the lock and i was a leader
And if I change it to
postgresql:
    listen: 192.168.122.240:5432
    connect_address: 192.168.122.240:5432
this happens:
quanlm@DB2:~$ sudo service patroni status
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
Active: active (running) since Wed 2019-11-13 02:18:57 UTC; 40s ago
Main PID: 3785 (patroni)
Tasks: 11
Memory: 59.6M
CPU: 770ms
CGroup: /system.slice/patroni.service
├─3785 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni.yml
├─3818 postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --max_replication_slots=10 --port=5432 --max_connections=100 --max_
├─3853 postgres: postgres: startup process recovering 000000040000000000000006
├─3857 postgres: postgres: checkpointer process
├─3858 postgres: postgres: writer process
├─3859 postgres: postgres: stats collector process
└─3916 postgres: postgres: postgres postgres 192.168.122.240(39576) idle
Nov 13 02:19:19 DB2 patroni[3785]:
Nov 13 02:19:24 DB2 patroni[3785]: FATAL: could not start WAL streaming: ERROR: replication slot "postgresql1" does not exist
Nov 13 02:19:24 DB2 patroni[3785]:
Nov 13 02:19:27 DB2 patroni[3785]: 2019-11-13 02:19:27,938 INFO: Lock owner: postgresql0; I am postgresql1
Nov 13 02:19:27 DB2 patroni[3785]: 2019-11-13 02:19:27,938 INFO: does not have lock
Nov 13 02:19:27 DB2 patroni[3785]: 2019-11-13 02:19:27,966 INFO: no action. i am a secondary and i am following a leader
Nov 13 02:19:29 DB2 patroni[3785]: FATAL: could not start WAL streaming: ERROR: replication slot "postgresql1" does not exist
Nov 13 02:19:29 DB2 patroni[3785]:
Nov 13 02:19:34 DB2 patroni[3785]: FATAL: could not start WAL streaming: ERROR: replication slot "postgresql1" does not exist
Nov 13 02:19:34 DB2 patroni[3785]:
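Update #2
After setting patroni.yml on server 2 back to name: postgresql0, with
postgresql:
    listen: 192.168.122.240:5432
    connect_address: 192.168.122.240:5432
and restarting the service, both DBs come up. I don't think that is how it should look when setting up active-passive servers for HA purposes, and the two servers did not respond to each other.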
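On the first node, the name postgresql0 in patroni.yml is fine, but on the second node you need to change the name to (for example) postgresql1. You also need to take the IP of that other node (listed at the top of the tutorial) and update the YAML to use it; there are several occurrences under restapi and postgresql.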
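As a rough illustration of which settings have to differ per node, here is a minimal Python sketch. The helper node_overrides is hypothetical (it is not part of Patroni); the ports 8008 and 5432 and the two IPs are the ones from the question. Everything else in patroni.yml (scope, namespace, etcd, bootstrap) would stay the same on both nodes.

```python
# Hypothetical helper, not part of Patroni: lists the patroni.yml keys
# that must be unique per node. Ports are taken from the question.
REST_PORT = 8008
PG_PORT = 5432


def node_overrides(name: str, ip: str) -> dict:
    """Return the per-node values for a node called `name` with address `ip`."""
    return {
        "name": name,
        "restapi": {
            "listen": f"{ip}:{REST_PORT}",
            "connect_address": f"{ip}:{REST_PORT}",
        },
        "postgresql": {
            "listen": f"{ip}:{PG_PORT}",
            "connect_address": f"{ip}:{PG_PORT}",
        },
    }


# DB1 and DB2 as described in the question.
db1 = node_overrides("postgresql0", "192.168.122.77")
db2 = node_overrides("postgresql1", "192.168.122.240")

for node in (db1, db2):
    print(node["name"], node["restapi"]["listen"], node["postgresql"]["listen"])
```

The point is that each node gets its own name and its own IP in all four address fields; copying DB1's file to DB2 unchanged leaves DB2 trying to listen on an address it does not own.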
If in doubt, the sample files in the Patroni repository ( https://github.com/zalando/patroni/blob/master/postgres0.yml and the other two) always contain working values.
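The "OSError: [Errno 99] Cannot assign requested address" in the DB2 traceback is what a bind to an IP that is not configured on the local machine looks like on Linux. A quick way to check an address before putting it under listen is a sketch like the following; can_bind is a hypothetical helper name, and it binds to port 0 (any free port) so it does not collide with a running Patroni or Postgres.

```python
import errno
import socket


def can_bind(ip: str, port: int = 0) -> bool:
    """Return True if this host can bind a TCP socket to ip:port.

    Port 0 asks the OS for any free port, so only the address is tested.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, port))
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                # Same condition as "Cannot assign requested address".
                print(f"{ip} is not assigned to any interface on this host")
            else:
                print(f"cannot bind {ip}:{port}: {exc}")
            return False
    return True


if __name__ == "__main__":
    # Addresses from the question; run this on each node.
    for address in ("192.168.122.77", "192.168.122.240"):
        print(address, "usable here" if can_bind(address) else "not usable here")
```

Run on DB2, the check for 192.168.122.77 would be expected to fail, since that address belongs to DB1; only the node's own IP should be used for the listen and connect_address values on that node.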