PostgreSQL

PostgreSQL is not a natively distributed database, although it can be extended to behave like one.

Multi Master

To act as a truly distributed database, PostgreSQL first has to be deployed as a multi-master cluster. It would look like this:

digraph G {
   // Define the graph layout direction (top to bottom)
   rankdir=TB;  // Optional: use LR for left-to-right
   bgcolor="black";
   edge [color="white"]; // Edges in white

   // Define nodes with specific positions
   A [style=filled, fillcolor="white", fontcolor="black",
      label="PSQL\nMaster-A", pos="0,1!"];  // Top node
   B [style=filled, fillcolor="white", fontcolor="black",
      label="PSQL\nMaster-B", pos="-1,0!"]; // Bottom left
   C [style=filled, fillcolor="white", fontcolor="black",
      label="PSQL\nMaster-C", pos="1,0!"];  // Bottom right

   // Connect the nodes
   A -> B -> C -> A;
   C -> B -> A -> C;

   // Specify the layout
   graph [layout="neato"];
}

This solution does have some benefits:

  • High read and write availability

  • Low read/write latency

  • Read throughput scales linearly

But it has drawbacks:

  • There is no load balancing, so read/write scaling is hard

  • Eventual read/write consistency

  • No linear history: writes accepted by different masters have no single global order

  • Every node stores all of the data, which is not efficient
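
The consistency drawbacks can be made concrete with a small simulation. The sketch below is a toy model only, not the API of PostgreSQL or of any multi-master extension: the `Master` class, the `replicate` function, and the last-write-wins rule are all assumptions chosen for illustration. It shows two masters accepting conflicting writes to the same key, nodes disagreeing until replication runs, and one write being silently discarded afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Toy stand-in for one master node; not a real replication API."""
    name: str
    data: dict = field(default_factory=dict)    # key -> (timestamp, value)
    outbox: list = field(default_factory=list)  # local writes not yet shipped

    def write(self, key, value, timestamp):
        # The write is acknowledged locally, before any peer has seen it.
        self.data[key] = (timestamp, value)
        self.outbox.append((key, timestamp, value))

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def apply_remote(self, key, timestamp, value):
        # Assumed conflict rule: last write wins, judged by timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)


def replicate(masters):
    """Ship every pending write to every other master (the async step)."""
    for source in masters:
        for change in source.outbox:
            for target in masters:
                if target is not source:
                    target.apply_remote(*change)
        source.outbox.clear()


a, b, c = Master("A"), Master("B"), Master("C")
a.write("balance", 100, timestamp=1)
b.write("balance", 250, timestamp=2)  # conflicting write on another master

before = [m.read("balance") for m in (a, b, c)]
replicate([a, b, c])
after = [m.read("balance") for m in (a, b, c)]

print(before)  # [100, 250, None]
print(after)   # [250, 250, 250]
```

Before replication, the answer depends on which master is asked. After replication the nodes agree, but the write of 100 is gone even though it was acknowledged to its client.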

Citus

Citus is a PostgreSQL extension that turns PostgreSQL into a distributed database. It shards the data into distinct sets so that the load is spread across nodes. It might look like this:

digraph citus {
   // Define the graph layout direction (top to bottom)
   rankdir=TB;  // Top-Bottom rank
   bgcolor="black";
   edge [color="white"]; // Edges in white

   // Define nodes with specific positions
   A [style=filled, fillcolor="white", fontcolor="black",
      label="Coordinator"];  // Top node
   B [style=filled, fillcolor="white", fontcolor="black",
      label="PSQL\nNode-1"]; // Bottom left
   C [style=filled, fillcolor="white", fontcolor="black",
      label="PSQL\nNode-2"];  // Bottom right
   D [style=filled, fillcolor="white", fontcolor="black", label="PSQL\nNode-3"];

   // Connect the nodes
   {rank = same; B C D; }
   A -> B;
   A -> C;
   A -> D;

}

As the diagram shows, the coordinator is a single point of failure: if it goes down, the cluster can’t continue. Also, if a node dies, the shard of data it holds becomes unavailable, and in this layout Citus has no way to recover it.
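
Both failure modes can be sketched in a few lines. The model below is a simplification and not the Citus API: `ToyCluster`, `worker_for`, and the node names are hypothetical, and the routing rule (hash the key, take it modulo the number of workers, one shard per worker) is an assumption standing in for Citus's actual shard placement.

```python
import zlib

WORKERS = ["node-1", "node-2", "node-3"]  # hypothetical names, one shard each


def worker_for(key):
    """Routing step done by the coordinator: hash the key, pick a worker."""
    return WORKERS[zlib.crc32(key.encode()) % len(WORKERS)]


class ToyCluster:
    """Toy model of a coordinator plus workers; not the Citus API."""

    def __init__(self):
        self.storage = {name: {} for name in WORKERS}
        self.alive = set(WORKERS)
        self.coordinator_up = True

    def _shard(self, key):
        # Every query passes through the coordinator first.
        if not self.coordinator_up:
            raise ConnectionError("coordinator is down; no query can be routed")
        name = worker_for(key)
        if name not in self.alive:
            raise ConnectionError(f"{name} is down; its shard is unreachable")
        return self.storage[name]

    def put(self, key, value):
        self._shard(key)[key] = value

    def get(self, key):
        return self._shard(key)[key]


cluster = ToyCluster()
keys = [f"user-{i}" for i in range(12)]
for k in keys:
    cluster.put(k, k.upper())

dead = worker_for(keys[0])
cluster.alive.discard(dead)  # simulate a worker failure

lost = [k for k in keys if worker_for(k) == dead]
print(f"{dead} failed; {len(lost)} of {len(keys)} keys are unreachable")
```

Keys that hash to the surviving workers stay readable, while every key on the failed worker raises an error. Setting `cluster.coordinator_up = False` makes every request fail, whichever worker holds the key.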

There are various high-availability methods that make the coordinator and the nodes less vulnerable, but they add complexity. Most organizations would use automation to deploy such configurations.

Conclusion

PostgreSQL is not a natively distributed database, and it's not easy to make it into one. Still, you can bolt on other solutions to make it more distributed. In the next sections we offer some other solutions.

References