pg_auto_failover Quick Start¶
In this guide we’ll create a primary and secondary Postgres node and set up pg_auto_failover to replicate data between them. We’ll simulate failure in the primary node and see how the system smoothly switches (fails over) to the secondary.
For simplicity we’ll run all the pieces on a single machine, but in production there would be three independent machines involved for each managed PostgreSQL service. One machine for the primary, another for the secondary, and the last as a monitor which watches the other nodes’ health, manages global state, and assigns nodes their roles.
Install the “pg_autoctl” executable¶
This guide uses Red Hat Linux, but similar steps will work on other distributions. All that differs are the packages and paths. See Installing pg_auto_failover.
pg_auto_failover is distributed as a single binary with subcommands to initialize and manage a replicated PostgreSQL service. We’ll install the binary with the operating system package manager.
curl https://install.citusdata.com/community/rpm.sh | sudo bash sudo yum install -y pg-auto-failover10_11
Run a monitor¶
The pg_auto_failover monitor is the first component to run. It periodically attempts to contact the other nodes and watches their health. It also maintains global state that “keepers” on each node consult to determine their own roles in the system.
Because we’re running everything on a single machine, we give each PostgreSQL instance a separate TCP port. We’ll give the monitor a distinctive port (6000):
sudo su - postgres export PATH="/usr/pgsql-11/bin:$PATH" pg_autoctl create monitor \ --auth trust \ --ssl-self-signed \ --pgdata ./monitor \ --pgport 6000
This command initializes a PostgreSQL cluster at the location pointed
--pgdata option. When
--pgdata is omitted,
attempts to use the
PGDATA environment variable. If a PostgreSQL
instance had already existing in the destination directory, this command
would have configured it to serve as a monitor.
In our case,
pg_autoctl create monitor creates a database called
pg_auto_failover, installs the
pgautofailover Postgres extension,
and grants access to a new
In the Quick Start we use
--auth trust to avoid complex security settings.
The Postgres trust authentication method is not considered a reasonable
choice for production environments. Consider either using the
--auth scram-sha-256 and then setting up passwords yourself.
Bring up the nodes¶
We’ll create the primary database using the
pg_autoctl create subcommand.
However in order to simulate what happens if a node runs out of disk space,
we’ll store the primary node’s data files in a small temporary filesystem.
# create intentionally small disk for node A sudo mkdir /mnt/node_a sudo mount -t tmpfs -o size=400m tmpfs /mnt/node_a sudo mkdir /mnt/node_a/data sudo chown postgres -R /mnt/node_a # initialize on that disk pg_autoctl create postgres \ --auth trust \ --ssl-self-signed \ --pgdata /mnt/node_a/data \ --pgport 6010 \ --pgctl /usr/pgsql-11/bin/pg_ctl \ --monitor postgres://email@example.com:6000/pg_auto_failover?sslmode=require
It creates the database in “/mnt/node_a/data” using the port and node specified. Notice the user and database name in the monitor connection string – these are what monitor init created. We also give it the path to pg_ctl so that the keeper will use the correct version of pg_ctl in future even if other versions of postgres are installed on the system.
In the example above, the keeper creates a primary database. It chooses to set up node A as primary because the monitor reports there are no other nodes in the system yet. This is one example of how the keeper is state-based: it makes observations and then adjusts its state, in this case from “init” to “single.”
At this point the monitor and primary nodes are created and running. Next we need to run the keeper. It’s an independent process so that it can continue operating even if the Postgres primary goes down:
pg_autoctl run --pgdata /mnt/node_a/data
This will remain running in the terminal, outputting logs. We can open another terminal and start a secondary database similarly to how we created the primary:
pg_autoctl create postgres \ --auth trust \ --ssl-self-signed \ --pgdata ./node_b \ --pgport 6011 \ --pgctl /usr/pgsql-11/bin/pg_ctl \ --monitor postgres://firstname.lastname@example.org:6000/pg_auto_failover?sslmode=require pg_autoctl run --pgdata ./node_b
All that differs here is we’re running it on another port and pointing at another data directory. It discovers from the monitor that a primary exists, and then switches its own state to be a hot standby and begins streaming WAL contents from the primary.
Watch the replication¶
First let’s verify that the monitor knows about our nodes, and see what states it has assigned them:
pg_autoctl show state --pgdata ./monitor Name | Port | Group | Node | Current State | Assigned State ----------+--------+-------+-------+-------------------+------------------ 127.0.0.1 | 6010 | 0 | 1 | primary | primary 127.0.0.1 | 6011 | 0 | 2 | secondary | secondary
This looks good. We can add data to the primary, and watch it get reflected in the secondary.
# add data to primary psql -p 6010 \ -c 'create table foo as select generate_series(1,1000000) bar;' # query secondary psql -p 6011 -c 'select count(*) from foo;' count --------- 1000000
Cause a failover¶
This plot is too boring, time to introduce a problem. We’ll run the primary out of disk space and watch the secondary get promoted.
In one terminal let’s keep an eye on events:
watch pg_autoctl show events --pgdata ./monitor
In another terminal we’ll consume node A’s disk space and try to restart the database. It will refuse to start up.
# postgres went to sleep one night and didn’t wake up… pg_ctl -D /mnt/node_a/data stop && dd if=/dev/zero of=/mnt/node_a/bigfile bs=300MB count=1 # it will refuse to start back up with no free disk space df /mnt/node_a Filesystem 1K-blocks Used Available Use% Mounted on tmpfs 409600 409600 0 100% /mnt/node_a
After a number of failed attempts to restart node A, its keeper signals that the node is unhealthy and the node is put into the “demoted” state. The monitor promotes node B to be the new primary.
pg_autoctl show state --pgdata ./monitor Name | Port | Group | Node | Current State | Assigned State ----------+--------+-------+-------+-------------------+------------------ 127.0.0.1 | 6010 | 0 | 1 | demoted | catchingup 127.0.0.1 | 6011 | 0 | 2 | wait_primary | wait_primary
Node B cannot be considered in full “primary” state since there is no secondary present. It is marked as “wait_primary” until a secondary appears.
A client, whether a web server or just psql, can list multiple
hosts in its PostgreSQL connection string, and use the parameter
target_session_attrs to add rules about which server to choose.
To discover the url to use in our case, the following command can be used:
pg_autoctl show uri --pgdata ./monitor Type | Name | Connection String -----------+---------+------------------------------- monitor | monitor | port=6000 dbname=pg_auto_failover host=/tmp user=autoctl_node formation | default | postgres://127.0.0.1:6010,127.0.0.1:6011/?target_session_attrs=read-write
Here we ask to connect to either node A or B – whichever supports reads and writes:
psql \ 'postgres://localhost:6010,localhost:6011/postgres?target_session_attrs=read-write&sslmode=require'
When nodes A and B were both running, psql would connect to node A because B would be read-only. However now that A is offline and B is writeable, psql will connect to B. We can insert more data:
-- on the prompt from the psql command above: insert into foo select generate_series(1000001, 2000000);
Resurrect node A¶
Let’s increase the disk space for node A, so it’s able to run again.
Now the next time the keeper retries, it brings the node back. Node A goes through the state “catchingup” while it updates its data to match B. Once that’s done, A becomes a secondary, and B is now a full primary.
pg_autoctl show state --pgdata ./monitor Name | Port | Group | Node | Current State | Assigned State ----------+--------+-------+-------+-------------------+------------------ 127.0.0.1 | 6010 | 0 | 1 | secondary | secondary 127.0.0.1 | 6011 | 0 | 2 | primary | primary
What’s more, if we connect directly to node A and run a query we can see it contains the rows we inserted while it was down.
psql -p 6010 -c 'select count(*) from foo;' count --------- 2000000