
Postgresql 11 logical replication - stuck in "catchup" state

I'm running two PostgreSQL 11 servers, a master and a slave, set up with logical replication.


2019-09-16 07:39:44.332 CEST [30117] ERROR:  could not send data to WAL stream: server closed the connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.
2019-09-16 07:39:44.539 CEST [12932] LOG:  logical replication apply worker for subscription "logical_from_master" has started
2019-09-16 07:39:44.542 CEST [27972] LOG:  background worker "logical replication worker" (PID 30117) exited with exit code 1

I have seen this error message before, and back then my fix was to increase wal_sender_timeout on the master (more details here: postgresql logical replication - "server closed the connection unexpectedly").


master=# select * from pg_stat_replication;
  pid  | usesysid | usename | application_name  |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |  state  |   sent_lsn   |  write_lsn   |  flush_lsn   |  replay_lsn  |    write_lag    |    flush_lag    |   replay_lag    | sync_priority | sync_state
 86864 |    16680 | my_user    | logical_from_master | |                 |       46110 | 2019-09-16 12:45:56.491325+02 |              | catchup | D55/FA04D4B8 | D55/F9E74158 | D55/F9E44CD8 | D55/F9E74030 | 00:00:03.603104 | 00:00:03.603104 | 00:00:03.603104 |             0 | async
(1 row)

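I have tried restarting the slave several times, with various combinations of enabling and disabling the subscription, but nothing helps. The replication state stays at catchup. The sent_lsn and write_lsn values do change, so something is being sent...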
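The *_lsn columns in pg_stat_replication are WAL positions, so the gap between them is a byte count. A minimal Python sketch of that subtraction, assuming the usual `hi/lo` hexadecimal format of `pg_lsn`; the helpers `lsn_to_int` and `lag_bytes` are hypothetical names. On the server itself, `pg_wal_lsn_diff()` computes the same difference.

```python
# Minimal sketch: turn a pg_lsn string such as "D55/FA04D4B8" into a 64-bit
# WAL byte position so two LSNs from pg_stat_replication can be subtracted.
# (Server-side equivalent: SELECT pg_wal_lsn_diff(sent_lsn, replay_lsn) ...)


def lsn_to_int(lsn: str) -> int:
    """Return the WAL byte position encoded by an LSN like 'D55/FA04D4B8'."""
    hi, lo = lsn.split("/")
    # The part before the slash holds the high 32 bits, the part after the low 32 bits.
    return (int(hi, 16) << 32) | int(lo, 16)


def lag_bytes(ahead: str, behind: str) -> int:
    """Bytes of WAL between two LSNs (positive when `ahead` is further along)."""
    return lsn_to_int(ahead) - lsn_to_int(behind)


if __name__ == "__main__":
    # Values copied from the first pg_stat_replication snapshot in the question.
    row = {
        "sent_lsn": "D55/FA04D4B8",
        "write_lsn": "D55/F9E74158",
        "flush_lsn": "D55/F9E44CD8",
        "replay_lsn": "D55/F9E74030",
    }
    for col in ("write_lsn", "flush_lsn", "replay_lsn"):
        print(f"sent_lsn - {col}: {lag_bytes(row['sent_lsn'], row[col])} bytes")
```

For the first snapshot the sent_lsn - replay_lsn gap is 1938568 bytes (about 1.9 MB), and the same calculation on the later snapshot gives 1039000 bytes, so in both cases the subscriber trails by only a megabyte or two while the positions keep moving.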







# maximum wait time in milliseconds that the walsender process on the active master
# waits for a status message from the walreceiver process on the standby master.
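The comment above describes wal_sender_timeout. As a rough illustration of that rule, here is a simplified Python model; it is a sketch under stated assumptions, not PostgreSQL's walsender.c, and `WalSenderState`, `should_request_reply` and `timed_out` are hypothetical names. It assumes that a timeout of 0 disables the check and that the sender asks for an immediate reply once half the timeout has passed without one.

```python
# Simplified model (not PostgreSQL source) of the wal_sender_timeout rule:
# if no status message arrives from the receiving side for longer than the
# timeout, the walsender drops the connection. 0 disables the check.
from dataclasses import dataclass


@dataclass
class WalSenderState:
    wal_sender_timeout_ms: int  # setting in milliseconds; 0 disables the timeout
    last_reply_ms: int          # when the last status message was received


def should_request_reply(state: WalSenderState, now_ms: int) -> bool:
    """True once half the timeout has elapsed without a reply (keepalive request)."""
    if state.wal_sender_timeout_ms <= 0:
        return False
    return now_ms >= state.last_reply_ms + state.wal_sender_timeout_ms // 2


def timed_out(state: WalSenderState, now_ms: int) -> bool:
    """True when the walsender would terminate the replication connection."""
    if state.wal_sender_timeout_ms <= 0:
        return False
    return now_ms >= state.last_reply_ms + state.wal_sender_timeout_ms


if __name__ == "__main__":
    s = WalSenderState(wal_sender_timeout_ms=60_000, last_reply_ms=0)
    for t in (29_999, 30_000, 59_999, 60_000):
        print(t, should_request_reply(s, t), timed_out(s, t))
```

With the default of 60 s, a receiver that stays silent for a full minute gets disconnected; that is the condition reported as "terminating walsender process due to replication timeout" in the publisher log further down. Raising the value only widens the window.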





master=# select * from pg_stat_replication;
  pid  | usesysid | usename | application_name  |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |  state  |   sent_lsn   |  write_lsn   |  flush_lsn   |  replay_lsn  | write_lag | flush_lag | replay_lag | sync_priority | sync_state
 12965 |    16680 | my_user    | logical_from_master | |                 |       46630 | 2019-09-17 06:40:18.801262+02 |              | catchup | D56/248E13A0 | D56/247E3908 | D56/247E3908 | D56/247E3908 |           |           |            |             0 | async
(1 row)



2019-09-16 22:43:33.841 CEST [20260] ERROR:  could not receive data from WAL stream: server closed the connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.
2019-09-16 22:43:33.959 CEST [26087] LOG:  background worker "logical replication worker" (PID 20260) exited with exit code 1
2019-09-16 22:43:34.112 CEST [3510] LOG:  logical replication apply worker for subscription "logical_from_master" has started




2019-09-18 19:15:13.767 CEST [8611] LOG:  logical replication table synchronization worker for subscription "logical_replica_from_master", table "lasttable" has finished
2019-09-18 19:54:14.875 CEST [11469] ERROR:  could not send data to WAL stream: server closed the connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.
2019-09-18 19:54:14.969 CEST [10330] LOG:  logical replication apply worker for subscription "logical_replica_from_master" has started
2019-09-18 19:54:15.031 CEST [11217] LOG:  background worker "logical replication worker" (PID 11469) exited with exit code 1


2019-09-18 19:50:36.386 CEST,,,111051,,5d826e6a.1b1cb,1,,2019-09-18 19:50:34 CEST,138/28493452,0,LOG,00000,"automatic vacuum of table ""my_db.pg_toast.pg_toast_22314"": index scans: 0
pages: 0 removed, 8949 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 0 removed, 43798 remain, 43783 are dead but not yet removable, oldest xmin: 3141915780
buffer usage: 17925 hits, 0 misses, 0 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
system usage: CPU: user: 0.04 s, system: 0.05 s, elapsed: 1.88 s",,,,,,,,,""
2019-09-18 19:51:36.402 CEST,,,1714,,5d826ea6.6b2,1,,2019-09-18 19:51:34 CEST,316/16529009,0,LOG,00000,"automatic vacuum of table ""my_db.pg_toast.pg_toast_22314"": index scans: 0
pages: 0 removed, 8949 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 0 removed, 43798 remain, 43795 are dead but not yet removable, oldest xmin: 3141915780
buffer usage: 17925 hits, 0 misses, 0 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
system usage: CPU: user: 0.01 s, system: 0.07 s, elapsed: 1.87 s",,,,,,,,,""
2019-09-18 19:52:36.421 CEST,,,2649,,5d826ee2.a59,1,,2019-09-18 19:52:34 CEST,153/19807659,0,LOG,00000,"automatic vacuum of table ""my_db.pg_toast.pg_toast_22314"": index scans: 0
pages: 0 removed, 8949 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 0 removed, 43798 remain, 43795 are dead but not yet removable, oldest xmin: 3141915780
buffer usage: 17924 hits, 0 misses, 0 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
system usage: CPU: user: 0.03 s, system: 0.05 s, elapsed: 1.87 s",,,,,,,,,""
2019-09-18 19:53:36.424 CEST,,,2945,,5d826f1e.b81,1,,2019-09-18 19:53:34 CEST,317/15405278,0,LOG,00000,"automatic vacuum of table ""my_db.pg_toast.pg_toast_22314"": index scans: 0
pages: 0 removed, 8949 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 0 removed, 43798 remain, 43795 are dead but not yet removable, oldest xmin: 3141915780
buffer usage: 17924 hits, 0 misses, 0 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
system usage: CPU: user: 0.03 s, system: 0.05 s, elapsed: 1.88 s",,,,,,,,,""
2019-09-18 19:54:15.123 CEST,"core","my_db",3073,"",5d826f47.c01,1,"idle",2019-09-18 19:54:15 CEST,317/0,0,LOG,00000,"starting logical decoding for slot ""logical_replica_from_master""","Streaming transactions committing after D5B/7A4D40, reading WAL from D5B/7A4D40.",,,,,,,,"logical_replica_from_master"
2019-09-18 19:54:15.124 CEST,"core","my_db",3073,"",5d826f47.c01,2,"idle",2019-09-18 19:54:15 CEST,317/0,0,LOG,00000,"logical decoding found consistent point at D5B/7A4D40","There are no running transactions.",,,,,,,,"logical_replica_from_master"
2019-09-18 19:54:36.442 CEST,,,3152,,5d826f5a.c50,1,,2019-09-18 19:54:34 CEST,362/5175766,0,LOG,00000,"automatic vacuum of table ""my_db.pg_toast.pg_toast_22314"": index scans: 0
pages: 0 removed, 8949 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 0 removed, 43798 remain, 43795 are dead but not yet removable, oldest xmin: 3141915780
buffer usage: 17924 hits, 0 misses, 0 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
system usage: CPU: user: 0.02 s, system: 0.06 s, elapsed: 1.88 s",,,,,,,,,""


2019-09-19 00:16:48.167 CEST [10330] ERROR:  could not send data to WAL stream: server closed the connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.
2019-09-19 00:16:48.276 CEST [19530] LOG:  logical replication apply worker for subscription "logical_replica_from_master" has started
2019-09-19 00:16:48.324 CEST [11217] LOG:  background worker "logical replication worker" (PID 10330) exited with exit code 1


2019-09-19 00:15:41.104 CEST,,,74257,,5d82ac89.12211,1,,2019-09-19 00:15:37 CEST,78/34511468,0,LOG,00000,"automatic vacuum of table ""my_db.pg_toast.pg_toast_22314"": index scans: 0
pages: 0 removed, 13603 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 0 removed, 64816 remain, 64813 are dead but not yet removable, oldest xmin: 3141915780
buffer usage: 27234 hits, 0 misses, 1 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.003 MB/s
system usage: CPU: user: 0.03 s, system: 0.08 s, elapsed: 2.85 s",,,,,,,,,""
2019-09-19 00:16:13.688 CEST,,,35656,,5d382555.8b48,11190,,2019-07-24 11:31:01 CEST,,0,LOG,00000,"checkpoint complete: wrote 1748 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=174.932 s, sync=0.000 s, total=174.936 s; sync files=75, longest=0.000 s, average=0.000 s; distance=11366 kB, estimate=13499 kB",,,,,,,,,""
2019-09-19 00:16:41.121 CEST,,,75038,,5d82acc5.1251e,1,,2019-09-19 00:16:37 CEST,185/19338019,0,LOG,00000,"automatic vacuum of table ""my_db.pg_toast.pg_toast_22314"": index scans: 0
pages: 0 removed, 13603 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 0 removed, 64816 remain, 64813 are dead but not yet removable, oldest xmin: 3141915780
buffer usage: 27233 hits, 0 misses, 0 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
system usage: CPU: user: 0.04 s, system: 0.07 s, elapsed: 2.85 s",,,,,,,,,""
2019-09-19 00:16:48.335 CEST,"core","my_db",75294,"",5d82acd0.1261e,1,"idle",2019-09-19 00:16:48 CEST,315/0,0,LOG,00000,"starting logical decoding for slot ""logical_replica_from_master""","Streaming transactions committing after D5B/1D1F1C0, reading WAL from D5B/1CA07F8.",,,,,,,,"logical_replica_from_master"
2019-09-19 00:16:48.335 CEST,"core","my_db",75294,"",5d82acd0.1261e,2,"idle",2019-09-19 00:16:48 CEST,315/0,0,LOG,00000,"logical decoding found consistent point at D5B/1CA07F8","There are no running transactions.",,,,,,,,"logical_replica_from_master"
2019-09-19 00:17:41.141 CEST,,,75484,,5d82ad01.126dc,1,,2019-09-19 00:17:37 CEST,330/18178915,0,LOG,00000,"automatic vacuum of table ""my_db.pg_toast.pg_toast_22314"": index scans: 0
pages: 0 removed, 13613 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 0 removed, 64866 remain, 64863 are dead but not yet removable, oldest xmin: 3141915780
buffer usage: 27254 hits, 0 misses, 0 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
system usage: CPU: user: 0.04 s, system: 0.07 s, elapsed: 2.85 s",,,,,,,,,""



2019-09-19 13:33:58.015 CEST,"core","nzdb",112432,"",5d8362f5.1b730,5,"idle",2019-09-19 13:13:57 CEST,379/2076197,0,LOG,00000,"terminating walsender process due to replication timeout",,,,,"slot ""logical_replica_from_master"", output plugin ""pgoutput"", in the change callback, associated LSN D5B/6782CF0",,,"WalSndCheckTimeOut, walsender.c:2100","logical_replica_from_master"



  pid  | usesysid | usename |              application_name              |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |  state  | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state
 87820 |    16680 | core    | logical_replica_from_master_27004_sync_21691 | |                 |       55548 | 2019-09-19 15:31:40.032662+02 |   3142872730 | startup |          |           |           |            |           |           |            |             0 | async
(1 row)






Rafel Bennassar


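One of my colleagues pointed out that, because several processes write to that database at the same time, a single WAL sender might not be able to keep up with the volume of changes. That is very sound advice, and afterwards I was scratching my head over why I hadn't thought of it in the first place. @jjanes also touched on this in his first comment. I had put too much faith in postgres adapting on its own to such different workloads while running with default options.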

So what I am doing now is avoiding CREATE PUBLICATION .. FOR ALL TABLES, and instead creating several publications, each with a matching subscription on the replica side.
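A minimal sketch of that split, assuming the table list is known in advance: it distributes tables round-robin into N groups and generates one `CREATE PUBLICATION ... FOR TABLE` statement for the master and one `CREATE SUBSCRIPTION ... CONNECTION ... PUBLICATION` statement for the replica per group. Each subscription gets its own replication slot and walsender/apply worker pair. The table names, the connection string and the helpers `split_round_robin` / `build_ddl` are hypothetical placeholders.

```python
# Hypothetical sketch: replace one FOR ALL TABLES publication with N
# publication/subscription pairs. Names and conninfo are placeholders and are
# NOT quoted or escaped, so only use trusted, simple identifiers.
from typing import List, Tuple


def split_round_robin(tables: List[str], n_groups: int) -> List[List[str]]:
    """Distribute tables across n_groups lists in round-robin order."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    groups: List[List[str]] = [[] for _ in range(n_groups)]
    for i, table in enumerate(tables):
        groups[i % n_groups].append(table)
    return [g for g in groups if g]  # drop empty groups when there are few tables


def build_ddl(tables: List[str], n_groups: int, conninfo: str,
              prefix: str = "logical_from_master") -> Tuple[List[str], List[str]]:
    """Return (statements for the master, statements for the replica)."""
    pub_sql: List[str] = []
    sub_sql: List[str] = []
    for i, group in enumerate(split_round_robin(tables, n_groups), start=1):
        name = f"{prefix}_{i}"
        pub_sql.append(f"CREATE PUBLICATION {name} FOR TABLE {', '.join(group)};")
        sub_sql.append(
            f"CREATE SUBSCRIPTION {name} CONNECTION '{conninfo}' PUBLICATION {name};"
        )
    return pub_sql, sub_sql


if __name__ == "__main__":
    pubs, subs = build_ddl(["orders", "order_items", "customers", "events"], 2,
                           "host=master dbname=my_db user=my_user")
    print("\n".join(pubs + subs))
```

Round-robin only balances table counts, not write volume; putting the busiest tables in their own publication is probably closer to the goal of spreading load across several WAL senders. If the replica already holds the data, adding WITH (copy_data = false) to each CREATE SUBSCRIPTION avoids copying it again.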
