Databases

In this article we cover some commmon distributed databases. Because the field of distributed databases is large and varied, we can only cover a few FOSS variants.

Master-Slave and Master-Master Relational Databases

Master-slave and master-master architectures achieve data consistency and distribution by either replicating data among servers and/or sharding the data between them. Master-slave systems send writes to the master nodes, while allowing slave nodes to handle reads through a load balancer.

In general, traditional distributed databases often have:

A master or coordinator node, or multiple masters
A Distributed File System (DFS) that shares cluster-wide configuration
Some sort of load-balancer
Worker nodes, sometimes all of them
Standby nodes that can take over for failed master or worker

digraph G {
bgcolor=wheat;
graph [splines=curved;
fontsize=14;
fontname="Comic Sans Bold";
compound=true;
ranksep="1.0"];
node [shape=record;
style="rounded,filled";
fontsize=12;
fontname="times-bold";
height="0.0";
];
edge [dir=both;];
rankdir=TB;
nodesep=".6";

client [fillcolor=bisque; label=Client];
lb [label="Load\nBalancer"; fillcolor=lightblue; ];
client:e -> lb:w [constraint=true];
{rank=same ; client lb; }

subgraph cluster_0
{
label="Cluster";
labeljust=l;
style=filled; fillcolor=lightcyan;

node [shape=box3d;
height=.5;
width=1;
fillcolor=goldenrod;
];
a [label="Master\nnode"; ];
b [label="Worker\nnode";];
c [label="Standby\n+Master\n+Worker"; ];

{rank=same; a b c;}
a -> b -> c;

lb -> a;
lb -> b;
lb -> c;

subgraph cluster_dfs
{
label="Distributed File System (DFS)";
fillcolor=lightgreen;

node [
fillcolor=lime;
shape=ellipse;
] dfs1 dfs2 dfs3;
{rank=same; dfs3 dfs2 dfs1;
dfs1 -> dfs2 -> dfs3;
}
}

a -> dfs1 [ltail=cluster_1; lhead=cluster_dfs;];
b -> dfs2 [ltail=cluster_2; lhead=cluster_dfs;];
c -> dfs3 [ltail=cluster_3; lhead=cluster_dfs;];
}
} — Idealization of a Distributed Database

Some examples:

Non Relational Databases

Key-Value Stores

Redis: Supports clustering with multiple nodes that shard and replicate data for high-availablity and speed.
ValKey (https://valkey.io/) and Redict (https://redict.io/): Two forks of Redis that has similar capabilities.
Etcd (https://etcd.io/): A key-value store used to synchronize cluster nodes
Apache Ignite (https://ignite.apache.org/): An in-memory distributed DB with peer-to-peer ability, designed for High-performance computing workloads.

Documentation Databases

MongoDB (https://mongodb.com): Sharded cluster architecture for horizontal scalability.
FerretDB (https://ferretdb.com): Inherits PostgreSQL’s distribution capabilities, compatible with MongoDB drivers.
Apache CouchDB (https://couchdb.apache.org): Apache CouchDB: Multi-master, peer-to-peer distributed database with bi-directional replication.

Graph Databases

A graph database stores data as nodes and edges, like a family tree structure. It can be much faster in some situations like family trees, protein modeling, and biological pathways.

Some examples of graph databases are:

Neo4j
Dgraph (https://dgraph.io)