Distributed Databases

In this article we cover some commmon distributed databases. Because the field of distributed databases is large and varied, we can only cover a few FOSS variants.

Master-Slave and Master-Master Relational Databases

Master-slave and master-master architectures achieve data consistency and distribution by either replicating data among servers and/or sharding the data between them. Master-slave systems send writes to the master nodes, while allowing slave nodes to handle reads through a load balancer.

In general, traditional distributed databases often have:

  • A master or coordinator node, or multiple masters

  • A Distributed File System (DFS) that shares cluster-wide configuration

  • Some sort of load-balancer

  • Worker nodes, sometimes all of them

  • Standby nodes that can take over for failed master or worker

digraph G {
   bgcolor=wheat;
   graph [splines=curved;
          fontsize=14;
          fontname="Comic Sans Bold";
          compound=true;
          ranksep="1.0"];
   node [shape=record;
         style="rounded,filled";
         fontsize=12;
         fontname="times-bold";
         height="0.0";
         ];
    edge [dir=both;];
   rankdir=TB;
   nodesep=".6";

      client [fillcolor=bisque; label=Client];
      lb [label="Load\nBalancer"; fillcolor=lightblue; ];
      client:e -> lb:w [constraint=true];
      {rank=same ; client lb; }

   subgraph cluster_0
   {
      label="Cluster";
      labeljust=l;
      style=filled; fillcolor=lightcyan;

       node [shape=box3d;
             height=.5;
             width=1;
             fillcolor=goldenrod;
             ];
          a [label="Master\nnode"; ];
          b [label="Worker\nnode";];
          c [label="Standby\n+Master\n+Worker"; ];

     {rank=same; a b c;}
      a -> b -> c;

      lb -> a;
      lb -> b;
      lb -> c;

      subgraph cluster_dfs
      {
          label="Distributed File System (DFS)";
          fillcolor=lightgreen;

          node [
                fillcolor=lime;
                shape=ellipse;
                ] dfs1 dfs2 dfs3;
          {rank=same; dfs3 dfs2 dfs1;
          dfs1 -> dfs2 -> dfs3;
          }
      }

      a -> dfs1 [ltail=cluster_1; lhead=cluster_dfs;];
      b -> dfs2 [ltail=cluster_2; lhead=cluster_dfs;];
      c -> dfs3 [ltail=cluster_3; lhead=cluster_dfs;];
   }
}

Idealization of a Distributed Database

Some examples:

Non Relational Databases

Key-Value Stores

  • Redis: Supports clustering with multiple nodes that shard and replicate data for high-availablity and speed.

  • ValKey (https://valkey.io/) and Redict (https://redict.io/): Two forks of Redis that has similar capabilities.

  • Etcd (https://etcd.io/): A key-value store used to synchronize cluster nodes

  • Apache Ignite (https://ignite.apache.org/): An in-memory distributed DB with peer-to-peer ability, designed for High-performance computing workloads.

Documentation Databases

  • MongoDB (https://mongodb.com): Sharded cluster architecture for horizontal scalability.

  • FerretDB (https://ferretdb.com): Inherits PostgreSQL’s distribution capabilities, compatible with MongoDB drivers.

  • Apache CouchDB (https://couchdb.apache.org): Apache CouchDB: Multi-master, peer-to-peer distributed database with bi-directional replication.

Graph Databases

A graph database stores data as nodes and edges, like a family tree structure. It can be much faster in some situations like family trees, protein modeling, and biological pathways.

Some examples of graph databases are: