Saturday, March 9, 2019

Distributing the database

You want to design you application to scale, so need to choose a database solution that will scale.
And it would be nice to have the querying and integrity features of an RDBMS.

Assuming that we want to shard data, and not just have a replica,
Some questions arise :
  •  How easy would it be to add a new node ?Should sharding be automatic, or per some partitioning rules ? If as per rules, what happens when we add/remove nodes, the data will need to be redistributed.
  • If a table is distributed across nodes, what happens to primary key ids ? How to prevent duplicates across nodes ? Id ranges ?
  • One major use of ids as primary keys is for use in foreign key constraints. But will a distributed table support foreign keys ? Probably not.
  • Will ACID transactions be available ?

Found some open-source solutions to scale RDBMS :
  • CockroachDB (https://www.cockroachlabs.com/) : A key-value store that supports the wire-protocol of Postgresql, sql, ACID transactions etc. So it behaves as if its a distributed postgresql to the sql drivers, but internally is not.
  • Citus : https://www.citusdata.com : A Postgresql extension that allows us to scale Postgresql. The underlying db is indeed postgresql. However, not all postgresql features can be supported in distributed mode.
  • Posgres-XL (https://www.postgres-xl.org/overview/)

No comments:

Post a Comment