Citus Support

The usual pg_autoctl commands work both with Postgres standalone nodes and with Citus nodes.

pg_auto_failover architecture with a Citus formation

When using pg_auto_failover with Citus, a pg_auto_failover formation is composed of a coordinator and a set of worker nodes.

When High-Availability is enabled at the formation level, which is the default, a minimum of two coordinator nodes is required: a primary and a secondary coordinator, so that a failover can be orchestrated when needed.

The same applies to the worker nodes: when using pg_auto_failover for Citus HA, each Citus worker is a pg_auto_failover group in the formation, and each worker group is set up with at least two nodes (a primary and a secondary).

Setting up your first Citus formation

Have a look at our Citus Cluster Quick Start documentation for more details, including a full tutorial that sets everything up on a single VM for testing and QA.

Citus specific commands and operations

When setting up Citus with pg_auto_failover, the following Citus-specific commands are provided. Other pg_autoctl commands work as usual when deploying a Citus formation, so you can use the rest of this documentation to operate your Citus deployments.

pg_autoctl create coordinator

This creates a Citus coordinator, that is to say a Postgres node with the Citus extension loaded and ready to act as a coordinator. The coordinator is always placed in the pg_auto_failover group zero of a given formation.

See pg_autoctl create coordinator for details.

Important

The default --dbname is the same as the current system user name, which in many cases is going to be postgres. Please make sure to use the --dbname option with the actual database that you’re going to use with your application.

Citus does not support multiple databases: you have to use the database where the Citus extension is created. When using Citus, this is essential for worker failover to behave correctly.
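
For example, a coordinator node could be created along these lines. This is a minimal sketch: the data directory, database name, hostname, and monitor URI are placeholders for your own setup.

$ pg_autoctl create coordinator \
    --pgdata /var/lib/postgres/coordinator \
    --dbname app \
    --hostname coord0a.example.com \
    --monitor 'postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover'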

pg_autoctl create worker

This command creates a new Citus worker node, that is to say a Postgres node with the Citus extension loaded, registered to the Citus coordinator created with the previous command. Because the Citus coordinator is always assigned group zero, the pg_auto_failover monitor knows how to reach it and can automate worker registration.

The default worker creation policy is to assign the primary role to the first worker registered, then the secondary role to the next worker in the same group, then the primary role in a new group, and so on.

If you want to extend an existing group to have a third worker node in the same group, enabling multiple-standby capabilities in your setup, then make sure to use the --group option to the pg_autoctl create worker command.
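
As a sketch, adding a third node to an existing worker group could look like the following, where the data directory, database name, hostname, monitor URI, and group number are placeholders for your own setup:

$ pg_autoctl create worker \
    --pgdata /var/lib/postgres/worker1c \
    --dbname app \
    --hostname worker1c.example.com \
    --group 1 \
    --monitor 'postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover'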

See pg_autoctl create worker for details.

pg_autoctl activate

This command calls the Citus “activation” API so that a node can be used to host shards for your reference and distributed tables.

When creating a Citus worker, pg_autoctl create worker automatically activates the worker node at the coordinator. You only need this command when something unexpected has happened and you want to manually make sure the worker node has been activated at the Citus coordinator level.

Starting with Citus 10, it is also possible to activate the coordinator itself as a node with shard placement. Use pg_autoctl activate on your Citus coordinator node manually to use that feature.

When the Citus coordinator is activated, an extra step is then needed for it to host shards of distributed tables. If you want your coordinator to have shards, then have a look at the Citus API citus_set_node_property to set the shouldhaveshards property to true.
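
As a sketch, activating the coordinator and then allowing it to host shards could look like the following, where the data directory, database name, hostname, and port are placeholders for your own setup:

$ pg_autoctl activate --pgdata /var/lib/postgres/coordinator
$ psql -d app -c \
    "SELECT citus_set_node_property('coord0a.example.com', 5432, 'shouldhaveshards', true);"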

See pg_autoctl activate for details.

Citus worker failover

When a failover is orchestrated by pg_auto_failover for a Citus worker node, Citus offers the opportunity to make the failover close to transparent to the application. Because the application connects to the coordinator, which in turn connects to the worker nodes, it is possible with Citus to pause the SQL write traffic on the coordinator for the shards hosted on the failed worker node. The Postgres failover then happens while the write traffic is held on the coordinator, and traffic resumes as soon as a secondary worker node is ready to accept read-write queries.

This is implemented thanks to the smart locking strategy of the Citus citus_update_node API, and pg_auto_failover takes full advantage of it with a purpose-built set of FSM transitions for Citus workers.
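
For illustration only, the coordinator-side effect of such a transition is roughly equivalent to updating the Citus metadata entry of the failed worker so that it points to the new primary, as in the following sketch. pg_auto_failover drives this automatically; the database name, hostnames, and ports below are hypothetical:

$ psql -d app -c \
    "SELECT citus_update_node(nodeid, 'worker1b.example.com', 5432)
       FROM pg_dist_node
      WHERE nodename = 'worker1a.example.com' AND nodeport = 5432;"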

Citus Secondaries and read-replica

It is possible to set up Citus read-only replicas. This Citus feature allows using a set of dedicated nodes (both coordinator and workers) to serve read-only traffic, such as reporting, analytics, or other read-only parts of your workload.

Citus read-replica nodes are a load-balancing solution. Those nodes can’t be used as HA failover targets, and thus have their candidate-priority set to zero. This setting of a read-replica node cannot be changed later.

This setup is done by setting the Citus property citus.use_secondary_nodes to always (it defaults to never), and the Citus property citus.cluster_name to your read-only cluster name.

Both of those settings are entirely supported and managed by pg_autoctl when using the --citus-secondary --cluster-name cluster_d options to the pg_autoctl create coordinator|worker commands.
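
As a sketch, the dedicated read-replica nodes in the example below could be created with commands along these lines, where the data directories, hostnames, and monitor URI are placeholders (other options such as --dbname are omitted for brevity):

$ pg_autoctl create coordinator \
    --pgdata /var/lib/postgres/coord0d \
    --hostname coord0d.example.com \
    --citus-secondary --cluster-name cluster_d \
    --monitor 'postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover'

$ pg_autoctl create worker \
    --pgdata /var/lib/postgres/reader1d \
    --hostname reader1d.example.com \
    --group 1 \
    --citus-secondary --cluster-name cluster_d \
    --monitor 'postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover'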

Here is an example where we have created a formation with three nodes for coordinator HA (one primary and two secondary nodes), then a single worker group with the same three-node HA setup, and then one read-replica environment on top of that, for a total of 8 nodes:

$  pg_autoctl show state
    Name |  Node |      Host:Port |       LSN | Reachable |       Current State |      Assigned State
---------+-------+----------------+-----------+-----------+---------------------+--------------------
 coord0a |   0/1 | localhost:5501 | 0/5003298 |       yes |             primary |             primary
 coord0b |   0/3 | localhost:5502 | 0/5003298 |       yes |           secondary |           secondary
 coord0c |   0/6 | localhost:5503 | 0/5003298 |       yes |           secondary |           secondary
 coord0d |   0/7 | localhost:5504 | 0/5003298 |       yes |           secondary |           secondary
worker1a |   1/2 | localhost:5505 | 0/4000170 |       yes |             primary |             primary
worker1b |   1/4 | localhost:5506 | 0/4000170 |       yes |           secondary |           secondary
worker1c |   1/5 | localhost:5507 | 0/4000170 |       yes |           secondary |           secondary
reader1d |   1/8 | localhost:5508 | 0/4000170 |       yes |           secondary |           secondary

Nodes named coord0d and reader1d are members of the read-replica cluster cluster_d. We can see that a read-replica cluster needs a dedicated coordinator and then one dedicated worker node per group.

Tip

It is possible to name the nodes in a pg_auto_failover formation either at creation time or later, using one of these commands:

$ pg_autoctl create worker --name ...
$ pg_autoctl set node metadata --name ...

Here coord0d is the node name for the dedicated coordinator for cluster_d, and reader1d is the node name for the dedicated worker for cluster_d in the worker group 1 (the only worker group in that setup).

Now the usual command to show the connection strings for your application also lists the read-replica setup:

$ pg_autoctl show uri
        Type |      Name | Connection String
-------------+-----------+-------------------------------
     monitor |   monitor | postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer
   formation |   default | postgres://localhost:5503,localhost:5501,localhost:5502/postgres?target_session_attrs=read-write&sslmode=prefer
read-replica | cluster_d | postgres://localhost:5504/postgres?sslmode=prefer

Given that setup, your application can now use the default formation Postgres URI to connect to the highly available read-write service, or the cluster_d read-replica URI to connect to the read-only replicas and offload some of your SQL workload.

When connecting with the cluster_d connection string, the Citus properties citus.use_secondary_nodes and citus.cluster_name are automatically set to their expected values, of course.
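
For instance, connecting with the cluster_d URI from the example above, you can check those settings with psql; the port matches the coord0d node in the example, and the exact values may differ in your setup:

$ psql 'postgres://localhost:5504/postgres?sslmode=prefer' \
       -c 'SHOW citus.use_secondary_nodes' \
       -c 'SHOW citus.cluster_name'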