转自:PostgreSQL 之 pg_rewind 详解

一、pg_rewind简介

pg_rewind 是postgresql主丛数据库之同步数据目录的工具。

pg_rewind只复制表数据文件中更改的块;所有其他文件都被完整复制,包括配置文件。pg_rewind相对于使用pg_basebackup备份或rsync等工具的优势在于,pg_rewind不需要读取数据库中未更改的块。这使得在数据库很大且之间只有一小部分块不同的情况下,速度会快得多。

二、使用前提

需要目标服务器在postgresql.conf 中允许walloghints,或者,在 initdb初始化集群时允许 checksums ,full_page_writes也必须为on

三、命令详解:

snippet.bash
[atlasdb@vm1 ~]$ /usr/local/atlasdb/bin/pg_rewind --help
pg_rewind resynchronizes a PostgreSQL cluster with another copy of the cluster.
 
Usage:
  pg_rewind [OPTION]...
 
Options:
  -D, --target-pgdata=DIRECTORY  existing data directory to modify
      --source-pgdata=DIRECTORY  source data directory to synchronize with
      --source-server=CONNSTR    source server to synchronize with
  -n, --dry-run                  stop before modifying anything
  -P, --progress                 write progress messages
      --debug                    write a lot of debug messages
  -V, --version                  output version information, then exit
  -?, --help                     show this help, then exit
 
Report bugs to <pgsql-bugs@postgresql.org>.

-D directory –target-pgdata=directory

此选项指定与源同步的目标数据目录。在运行pg_rewind之前,必须干净关闭目标服务器

–source-pgdata=directory

指定要与之同步的源服务器的数据目录的文件系统路径。此选项要求干净关闭源服务器

–source-server=connstr

指定要连接到源PostgreSQL服务器的libpq连接字符串。连接必须是具有超级用户访问权限的正常(非复制)连接。此选项要求源服务器正在运行,而不是处于恢复模式

-n –dry-run

除了实际修改目标目录之外,执行所有操作。

-P –progress

输出进展报告。

–debug

输出很多Debug的信息。如果失败时,可以用此选项定位错误原因。

例如:

snippet.bash
pg_rewind --target-pgdata=/data/atlasdb --source-server='host=192.168.0.201 port=5432 user=atlasdb password=atlasdb dbname=atlasdb' -P --debug

四、测试

环境准备

准备两台主机,安装postgresql v11.5。并配置好流复制

vm1、vm2环境变量设置:

snippet.bash
export PGDATABASE=atlasdb
export PGHOST=localhost
export PGPORT=5432
export PGUSER=atlasdb
export PGHOME=/home/atlasdb/atlasdb
export PGDATA=/data/atlasdb
export PATH=$PGHOME/bin:$HOME/bin:$PATH
export LD_LIBRARY_PATH=$PGHOME/lib/:$LD_LIBRARY_PATH

vm1查询流复制情况

snippet.bash
[atlasdb@vm1 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.
 
atlasdb=# select * from pg_stat_replication ;
 pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority |
 sync_state 
------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+
------------
 3421 |       10 | atlasdb | walreceiver      | 192.168.0.202 |                 |       33462 | 2019-12-25 18:38:34.168977+08 |              | streaming | 0/3000060 | 0/3000060 | 0/3000060 | 0/3000060  |           |           |            |             0 |
 async
(1 row)
 
atlasdb=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 f
(1 row)
 
atlasdb=# 

测试过程

主库vm1:创建测试表和测试数据

snippet.bash
atlasdb=# create table test_tab(id int,name varchar(20),e_mail varchar(20),d_id int);
CREATE TABLE
atlasdb=# \dt+
                      List of relations
 Schema |   Name   | Type  |  Owner  |  Size   | Description 
--------+----------+-------+---------+---------+-------------
 public | test_tab | table | atlasdb | 0 bytes | 
(1 row)
 
atlasdb=# insert into test_tab values(1,'hemny','hemny@qq.com',10);
INSERT 0 1
atlasdb=# select * from test_tab ;
 id | name  |    e_mail    | d_id 
----+-------+--------------+------
  1 | hemny | hemny@qq.com |   10
(1 row)
 
atlasdb=# 

从库vm2:查询数据

snippet.bash
[atlasdb@vm2 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.
 
atlasdb=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 t
(1 row)
 
atlasdb=# \dt+
                       List of relations
 Schema |   Name   | Type  |  Owner  |    Size    | Description 
--------+----------+-------+---------+------------+-------------
 public | test_tab | table | atlasdb | 8192 bytes | 
(1 row)
 
atlasdb=#  select * from test_tab ;
 id | name  |    e_mail    | d_id 
----+-------+--------------+------
  1 | hemny | hemny@qq.com |   10
(1 row)
 
atlasdb=#

提升从库vm2为新主库

snippet.bash
[atlasdb@vm2 ~]$ pg_ctl promote
waiting for server to promote.... done
server promoted
[atlasdb@vm2 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.
 
atlasdb=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 f
(1 row)
 
atlasdb=# 

旧主库vm1插入一条记录,模拟旧主库上的数据没有复制到新主库上

snippet.bash
atlasdb=# insert into test_tab values(2,'hemny_pg1','hemny_pg1@qq.com',10);
INSERT 0 1
atlasdb=# select * from test_tab ; 
 id |   name    |      e_mail      | d_id 
----+-----------+------------------+------
  1 | hemny     | hemny@qq.com     |   10
  2 | hemny_pg1 | hemny_pg1@qq.com |   10
(2 rows)
 
atlasdb=# 

新主库vm2(原从库)插入一条记录

snippet.bash
atlasdb=#  insert into test_tab values(2,'hemny_pg2','hemny_pg2@qq.com',10);
INSERT 0 1
atlasdb=# select * from test_tab ;
 id |   name    |      e_mail      | d_id 
----+-----------+------------------+------
  1 | hemny     | hemny@qq.com     |   10
  2 | hemny_pg2 | hemny_pg2@qq.com |   10
(2 rows)
 
atlasdb=# 

将原主库变成新主库的从库

snippet.bash
--停止原主库
[atlasdb@vm1 ~]$  pg_ctl -D /data/atlasdb -l logfile stop
waiting for server to shut down.... done
server stopped
 
--配置流复制文件和参数
[atlasdb@vm1 ~]$ vi $PGDATA/recovery.conf
[atlasdb@vm1 ~]$ cat $PGDATA/recovery.conf 
standby_mode = 'on'
primary_conninfo = 'user=atlasdb password=atlasdb host=192.168.0.202 port=5432 sslmode=prefer sslcompression=0 target_session_attrs=any'
[atlasdb@vm1 ~]$ 
 
-- 启动数据库
[atlasdb@vm1 ~]$  pg_ctl -D /data/atlasdb -l logfile start
waiting for server to start.... done
server started
[atlasdb@vm1 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.
 
atlasdb=# select pg_is_in_recovery();                                                
 pg_is_in_recovery 
-------------------
 t
(1 row)
 
atlasdb=# select * from test_tab ; 
 id |   name    |      e_mail      | d_id 
----+-----------+------------------+------
  1 | hemny     | hemny@qq.com     |   10
  2 | hemny_pg1 | hemny_pg1@qq.com |   10
(2 rows)

这里可以看到,新主库(vm2)上的记录没有复制过来

vm1的日志信息

snippet.bash
[atlasdb@vm1 ~]$ cat logfile 
2019-12-25 18:51:16.434 CST [3441] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2019-12-25 18:51:16.434 CST [3441] LOG:  listening on IPv6 address "::", port 5432
2019-12-25 18:51:16.438 CST [3441] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2019-12-25 18:51:16.453 CST [3442] LOG:  database system was shut down at 2019-12-25 18:46:51 CST
2019-12-25 18:51:16.453 CST [3442] LOG:  entering standby mode
2019-12-25 18:51:16.457 CST [3442] LOG:  consistent recovery state reached at 0/301BA58
2019-12-25 18:51:16.457 CST [3442] LOG:  invalid record length at 0/301BA58: wanted 24, got 0
2019-12-25 18:51:16.457 CST [3441] LOG:  database system is ready to accept read only connections
    Customer:'debug user'
   Begins On:'1970-01-01 08:00:01'
  Expires On:'273795839-07-29 17:46:41'
2019-12-25 18:51:16.492 CST [3446] LOG:  fetching timeline history file for timeline 2 from primary server
2019-12-25 18:51:16.499 CST [3446] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
2019-12-25 18:51:16.500 CST [3446] LOG:  replication terminated by primary server
2019-12-25 18:51:16.500 CST [3446] DETAIL:  End of WAL reached on timeline 1 at 0/301B8B8.
2019-12-25 18:51:16.502 CST [3446] LOG:  restarted WAL streaming at 0/3000000 on timeline 1
2019-12-25 18:51:16.503 CST [3446] LOG:  replication terminated by primary server
2019-12-25 18:51:16.503 CST [3446] DETAIL:  End of WAL reached on timeline 1 at 0/301B8B8.
2019-12-25 18:51:21.509 CST [3446] LOG:  restarted WAL streaming at 0/3000000 on timeline 1
2019-12-25 18:51:21.511 CST [3446] LOG:  replication terminated by primary server
2019-12-25 18:51:21.511 CST [3446] DETAIL:  End of WAL reached on timeline 1 at 0/301B8B8.
2019-12-25 18:51:26.515 CST [3446] LOG:  restarted WAL streaming at 0/3000000 on timeline 1
2019-12-25 18:51:26.516 CST [3446] LOG:  replication terminated by primary server
2019-12-25 18:51:26.516 CST [3446] DETAIL:  End of WAL reached on timeline 1 at 0/301B8B8.

vm1上停止新从库

snippet.bash
[atlasdb@vm1 ~]$  pg_ctl -D /data/atlasdb -l logfile stop
waiting for server to shut down.... done
server stopped
[atlasdb@vm1 ~]$ 

在vm1上使得pg_rewind 同步数据库时间线

snippet.bash
[atlasdb@vm1 ~]$ pg_rewind --target-pgdata=$PGDATA --source-server='host=192.168.0.202 port=5432 user=atlasdb password=atlasdb dbname=atlasdb' -P
connected to server
servers diverged at WAL location 0/301B8B8 on timeline 1
rewinding from last common checkpoint at 0/301B810 on timeline 1
reading source file list
reading target file list
reading WAL in target
need to copy 52 MB (total source directory size is 76 MB)
53300/53300 kB (100%) copied
creating backup label and updating control file
syncing target data directory
Done!
[atlasdb@vm1 ~]$ 

注意,pg_rewind之后,需要重新配置流复制文件

snippet.bash
[atlasdb@vm1 ~]$ mv $PGDATA/recovery.done $PGDATA/recovery.conf
[atlasdb@vm1 ~]$ vi $PGDATA/recovery.conf
[atlasdb@vm1 ~]$ cat $PGDATA/recovery.conf                     
standby_mode = 'on'
primary_conninfo = 'user=atlasdb password=atlasdb host=192.168.0.202 port=5432 sslmode=prefer sslcompression=0 target_session_attrs=any'
[atlasdb@vm1 ~]$ 

注意要修改连接信息

在vm1上启动新的从库

snippet.bash
[atlasdb@vm1 ~]$  pg_ctl -D /data/atlasdb -l logfile start
waiting for server to start.... done
server started
[atlasdb@vm1 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.
 
atlasdb=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 t
(1 row)
 
atlasdb=# select * from test_tab ; 
 id |   name    |      e_mail      | d_id 
----+-----------+------------------+------
  1 | hemny     | hemny@qq.com     |   10
  2 | hemny_pg2 | hemny_pg2@qq.com |   10
(2 rows)
 
atlasdb=# 

在新主库上插入的记录已同步,原主库没有复制到丛库的记录消失,完成

五、验证

新主库vm2中

snippet.bash
[atlasdb@vm2 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.
 
atlasdb=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 f
(1 row)
 
atlasdb=# select * from pg_stat_replication ;
 pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn |    write_lag    |   flush_lag    |   replay_lag    |
 sync_priority | sync_state 
------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------------+----------------+-----------------+
---------------+------------
 3399 |       10 | atlasdb | walreceiver      | 192.168.0.201 |                 |       38144 | 2019-12-25 18:59:59.275035+08 |              | streaming | 0/30475D8 | 0/30475D8 | 0/30475D8 | 0/30475D8  | 00:00:00.006679 | 00:00:00.00858 | 00:00:00.008581 |
             0 | async
(1 row)
 
atlasdb=# select * from test_tab ;
 id |   name    |      e_mail      | d_id 
----+-----------+------------------+------
  1 | hemny     | hemny@qq.com     |   10
  2 | hemny_pg2 | hemny_pg2@qq.com |   10
(2 rows)
 
atlasdb=# insert into  test_tab values(3,'hemny_pg2_new','hemny_pg2_new@qq.com',10);
INSERT 0 1
atlasdb=# select * from test_tab ;
 id |     name      |        e_mail        | d_id 
----+---------------+----------------------+------
  1 | hemny         | hemny@qq.com         |   10
  2 | hemny_pg2     | hemny_pg2@qq.com     |   10
  3 | hemny_pg2_new | hemny_pg2_new@qq.com |   10
(3 rows)
 
atlasdb=# 

新从库vm1中

snippet.bash
atlasdb=# select * from test_tab ; 
 id |     name      |        e_mail        | d_id 
----+---------------+----------------------+------
  1 | hemny         | hemny@qq.com         |   10
  2 | hemny_pg2     | hemny_pg2@qq.com     |   10
  3 | hemny_pg2_new | hemny_pg2_new@qq.com |   10
(3 rows)
 
atlasdb=#