Distributed Databases
In this article we cover some common distributed databases. Because the field is large and varied, we can only cover a few free and open-source (FOSS) variants.
Master-Slave and Master-Master Relational Databases
Master-slave and master-master architectures distribute data and keep it consistent by replicating data among servers, sharding it between them, or both. Master-slave systems send writes to the master node and let slave nodes handle reads, typically through a load balancer. Master-master systems accept writes on more than one node and must reconcile conflicting updates.
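The read/write split described above can be sketched in a few lines. This is only an illustration, not the API of any particular database or load balancer: the `ReadWriteRouter` class, the node names, and the rule that treats any statement beginning with SELECT as a read are all assumptions made for the example.

```python
from itertools import cycle


class ReadWriteRouter:
    """Toy router: writes go to the master, reads rotate across slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over slaves; fall back to the master if there are none.
        self._readers = cycle(slaves or [master])

    def route(self, statement):
        """Return the name of the node that should run this statement."""
        # Assumption: anything that starts with SELECT is a read.
        is_read = statement.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.master


router = ReadWriteRouter("master-1", ["slave-1", "slave-2"])
print(router.route("SELECT * FROM users"))           # slave-1
print(router.route("INSERT INTO users VALUES (1)"))  # master-1
print(router.route("select name from users"))        # slave-2
```

A real deployment also has to account for replication lag: a read sent to a slave immediately after a write may not see that write yet.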
In general, traditional distributed databases often have:
A master or coordinator node, or multiple masters
A Distributed File System (DFS) that shares cluster-wide configuration
A load balancer
Worker nodes (in some systems, every node is a worker)
Standby nodes that can take over for a failed master or worker
![Diagram: a client connects through a load balancer to a cluster containing a master node, a worker node, and a combined standby/master/worker node; all three nodes are backed by a Distributed File System (DFS) made up of three storage nodes.](../../../_images/graphviz-0fb38de86792d8f856d74fc6fce371ca882f7fa8.png)
Idealization of a Distributed Database
Some examples:
Non-Relational Databases
Key-Value Stores
Redis: Supports clustering with multiple nodes that shard and replicate data for high availability and speed.
Valkey (https://valkey.io/) and Redict (https://redict.io/): Two forks of Redis that have similar capabilities.
etcd (https://etcd.io/): A distributed key-value store used to share configuration and coordinate the nodes of a cluster.
Apache Ignite (https://ignite.apache.org/): An in-memory distributed DB with peer-to-peer ability, designed for high-performance computing workloads.
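To make the sharding mentioned in the Redis entry more concrete, here is a rough sketch of how a key can be mapped to a shard. Redis Cluster documents its scheme as CRC16 of the key modulo 16384 hash slots, with optional `{...}` hash tags; the sketch relies on Python's `binascii.crc_hqx`, which computes the same CRC16 variant. The `node_for` helper and the node names are hypothetical: real clusters record which node owns which slots rather than dividing the range evenly.

```python
import binascii

NUM_SLOTS = 16384  # number of hash slots in a Redis Cluster


def hash_slot(key):
    """Map a key to a hash slot: CRC16 (XMODEM variant) modulo 16384."""
    data = key.encode("utf-8")
    # Hash tags: if the key contains a non-empty "{...}" section, only that
    # part is hashed, so related keys can be forced into the same slot.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % NUM_SLOTS


def node_for(key, nodes):
    """Hypothetical placement: split the slot range evenly across nodes."""
    slots_per_node = NUM_SLOTS // len(nodes)
    index = min(hash_slot(key) // slots_per_node, len(nodes) - 1)
    return nodes[index]


nodes = ["node-a", "node-b", "node-c"]  # assumed node names
for key in ["user:1", "user:2", "{user:1}:cart"]:
    print(key, hash_slot(key), node_for(key, nodes))
```

Because only the text inside the braces is hashed, `{user:1}:cart` lands in the same slot as `user:1`, which is what makes multi-key operations on related keys possible in a sharded setup.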
Document Databases
MongoDB (https://mongodb.com): Sharded cluster architecture for horizontal scalability.
FerretDB (https://ferretdb.com): A MongoDB-compatible database built on PostgreSQL. It works with MongoDB drivers and inherits PostgreSQL's distribution capabilities.
Apache CouchDB (https://couchdb.apache.org): Multi-master, peer-to-peer distributed database with bi-directional replication.
Graph Databases
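A graph database stores data as nodes and edges, much like a family tree. For data that is dominated by relationships, such as family trees, protein models, and biological pathways, it can be much faster than a relational database, because following a relationship is a direct lookup rather than a join.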
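The node-and-edge model can be illustrated with a small in-memory sketch. This is not how Neo4j or Dgraph are implemented or queried; the `parents` data and the `ancestors` function are made up for the example. It only shows why a traversal is natural on a graph: each hop follows stored edges directly.

```python
from collections import deque

# Each key is a person; each value lists that person's parents (sample data).
parents = {
    "Ada": ["Ben", "Cara"],
    "Ben": ["Dov", "Eve"],
    "Cara": ["Finn"],
}


def ancestors(person, edges):
    """Breadth-first walk over parent edges; returns ancestors in visit order."""
    seen, order = set(), []
    queue = deque(edges.get(person, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already reached through another branch
        seen.add(current)
        order.append(current)
        queue.extend(edges.get(current, []))
    return order


print(ancestors("Ada", parents))  # ['Ben', 'Cara', 'Dov', 'Eve', 'Finn']
```

In a relational schema the same question usually needs one self-join per generation or a recursive query, which is where the performance difference tends to show up.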
Some examples of graph databases are:
Neo4j
Dgraph (https://dgraph.io)