
pgwrh
An extension implementing sharding for PostgreSQL based on logical replication and postgres_fdw. The goal is to scale read queries, overcoming the main limitations of traditional setups based on streaming replication and hot standbys: lack of sharding and large storage requirements.
See Architecture for more information on inner workings.
:warning: WIP: this README might be incomplete and contain mistakes in usage instructions (as the API is still changing)
Features
Horizontal Scalability and High Availability
No need for rebalancing
Setting up and maintaining a highly available cluster of sharded storage servers is inherently tricky, especially during changes to cluster topology. Adding a new replica often requires rebalancing (i.e. reorganizing data placement among replicas).
pgwrh minimizes the need to copy data by using the Weighted Rendezvous Hashing algorithm to distribute shards among replicas. Adding replicas never requires moving data between existing ones.
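As an illustration of the idea (plain SQL, not pgwrh's internal code), weighted rendezvous hashing scores every (shard, replica) pair and assigns each shard to its highest-scoring replicas, so adding a replica only pulls shards onto the new node:

```sql
-- Illustration only: weighted rendezvous hashing in plain SQL.
-- Shard and replica names are made up for the example.
WITH replica(host_id, weight) AS (
    VALUES ('c01r01', 100), ('c01r02', 100), ('c01r03', 70)
),
shard(shard_id) AS (
    VALUES ('my_data_2023_0'), ('my_data_2023_1'), ('my_data_2024_0')
),
score AS (
    SELECT shard_id, host_id,
           -- map the hash to a uniform (0,1) value, then compute the weighted HRW score
           -weight / ln((hashtext(shard_id || host_id) & 2147483647) / 2147483648.0 + 1e-10) AS s
    FROM shard, replica
)
SELECT shard_id,
       array_agg(host_id ORDER BY s DESC) AS preference_order
FROM score
GROUP BY shard_id
ORDER BY shard_id;
```

Taking the top N replicas per shard yields N copies; scores for existing (shard, replica) pairs never change when a new replica is added, which is why no data moves between existing replicas.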
Data redundancy
pgwrh maintains the requested level of redundancy of shard data.
Administrator can specify:
- the percentage of replicas to host each shard
- the minimum number of copies of any shard (regardless of the percentage setting above)
So it is possible to implement policies like: "Shards X, Y, Z should be distributed among 20% of replicas in the cluster, but in no fewer than 2 copies".
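For illustration (this formula is an assumption about the policy semantics, not taken from pgwrh's code), the effective number of copies of a shard could be computed as:

```sql
-- Assumed interpretation: copies = max(ceil(replica_count * percentage), minimum_copies)
SELECT greatest(ceil(20 * 0.20), 2) AS copies_with_20_replicas,  -- 4
       greatest(ceil(5  * 0.20), 2) AS copies_with_5_replicas;   -- 2
```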
Availability zones
Replicas can be assigned to availability zones and pgwrh ensures shard copies are distributed evenly across all of them.
Zero downtime reconfiguration of cluster topology
Changing cluster topology often requires a lengthy process of data copying and indexing. Exposing replicas that do not yet have the necessary indexes risks downtime, since long-running queries can exhaust connection pools.
pgwrh makes sure the cluster can operate without disruptions and that not-yet-ready replicas are isolated from query traffic.
Sharding policy flexibility and storage tiering
pgwrh does not dictate how data is split into shards. It is possible to implement any sharding policy by utilizing PostgreSQL partitioning. pgwrh will distribute leaves of partition hierarchy among replicas. It is also possible to specify different levels of redundancy for different subtrees of partitioning hierarchy.
This makes it possible to keep hot data on more replicas while minimizing the storage required for cold data.
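As a sketch (using the pgwrh.sharded_table configuration described in the Usage section below), hot subtrees can simply be registered with a higher replication factor than the rest of the hierarchy:

```sql
-- Sketch: recent (hot) partitions on all replicas, the rest on 25% of them.
INSERT INTO pgwrh.sharded_table (replication_group_id, sharded_table_schema, sharded_table_name, replication_factor)
VALUES ('c01', 'test', 'my_data',      25),
       ('c01', 'test', 'my_data_2024', 100),
       ('c01', 'test', 'my_data_2025', 100);
```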
Ease of deployment and cluster administration
Pure SQL/PGSQL
This makes it easy to use pgwrh in cloud environments that restrict installation of custom extensions.
Caveat: at the moment pgwrh requires pg_background to operate, as it needs a way to execute SQL commands outside the current transaction (CREATE/ALTER SUBSCRIPTION cannot be run inside a transaction block).
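For illustration, this is how pg_background can run such a statement outside the calling transaction (the subscription name, publication name and connection string below are placeholders, not pgwrh's actual internals):

```sql
-- Sketch only: launch CREATE SUBSCRIPTION in a separate background worker so it
-- runs outside the current transaction. All names here are placeholders.
SELECT * FROM pg_background_result(
    pg_background_launch(
        $sub$CREATE SUBSCRIPTION example_sub
                 CONNECTION 'host=master.myorg dbname=mydb user=c01r01'
                 PUBLICATION example_pub$sub$
    )
) AS (result text);
```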
Based on built-in PostgreSQL facilities - no need for custom query parser/planner
Unlike other PostgreSQL sharding solutions, which implement their own query parser and planner to direct queries to the right replicas, pgwrh reuses built-in PostgreSQL features: partitioning and postgres_fdw.
The PostgreSQL query planner and executor, while still somewhat limited in this regard, can distribute computation among multiple machines by:
- pushing down filtering and aggregates (see https://www.postgresql.org/docs/current/runtime-config-query.html#GUC-ENABLE-PARTITIONWISE-AGGREGATE)
- skipping execution of unnecessary query plan nodes through partition pruning (see https://www.postgresql.org/docs/current/runtime-config-query.html#GUC-ENABLE-PARTITION-PRUNING)
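For example (standard PostgreSQL settings, nothing pgwrh-specific), enabling partitionwise aggregation lets the planner aggregate each partition separately, so work on shards attached via postgres_fdw can be pushed down to the replicas:

```sql
-- Standard PostgreSQL settings; with shards attached as postgres_fdw foreign
-- tables, per-partition aggregation can be pushed down to the replicas.
SET enable_partitionwise_aggregate = on;
SET enable_partitionwise_join = on;

EXPLAIN (VERBOSE, COSTS OFF)
SELECT col2, count(*)
FROM test.my_data
WHERE col3 >= make_date(2024, 1, 1)   -- also allows pruning of older partitions
GROUP BY col2;
```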
Installation
Prerequisites
| Name | Version |
|---|---|
| PostgreSQL | 16+ |
| pg_background | 1.2+ |
Extension installation
Clone the Git repository.
git clone https://github.com/mkleczek/pgwrh.git
Install the extension.
cd pgwrh
make install
Create extension in PostgreSQL database.
psql -c "CREATE EXTENSION pgwrh CASCADE"
Usage
On master server
Create your sharded table partitioning hierarchy
The example below creates a two-level partition hierarchy for test.my_data:
- First level by range on col3 (split by year)
- Second level by hash on col2
CREATE SCHEMA IF NOT EXISTS test;
CREATE TABLE test.my_data (col1 text, col2 text, col3 date) PARTITION BY RANGE (col3);
CREATE TABLE test.my_data_2023 PARTITION OF test.my_data FOR VALUES FROM (make_date(2023, 1, 1)) TO (make_date(2024, 1, 1)) PARTITION BY HASH (col2);
CREATE TABLE test.my_data_2024 PARTITION OF test.my_data FOR VALUES FROM (make_date(2024, 1, 1)) TO (make_date(2025, 1, 1)) PARTITION BY HASH (col2);
CREATE TABLE test.my_data_2025 PARTITION OF test.my_data FOR VALUES FROM (make_date(2025, 1, 1)) TO (make_date(2026, 1, 1)) PARTITION BY HASH (col2);
CREATE SCHEMA IF NOT EXISTS test_shards;
DO $$
DECLARE
r record;
BEGIN
FOR r IN
SELECT
format('CREATE TABLE test_shards.my_data_%1$s_%2$s PARTITION OF test.my_data_%1$s (PRIMARY KEY (col1)) FOR VALUES WITH (MODULUS 16, REMAINDER %2$s)', year, rem) stmt
FROM generate_series(2023, 2025) year, generate_series(0, 15) rem
LOOP
EXECUTE r.stmt;
END LOOP;
END$$;
That gives 48 (16 * 3) shards in total.
Note that there are no specific requirements for the partitioning hierarchy and any partitioned table can be sharded - the above is only for illustration purposes.
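To verify the result, the leaf shard tables can be counted from the system catalogs (plain PostgreSQL, nothing pgwrh-specific):

```sql
-- Expect 48 leaf shard tables in the test_shards schema.
SELECT count(*)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'test_shards'
  AND c.relkind = 'r';
```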
Create a replica cluster
Example:
SELECT pgwrh.create_replica_cluster('c01');
Configure roles and user accounts for replicas
(Optional) Create a role for your cluster replicas and grant it the right to SELECT from shards.
CREATE ROLE c01_replica;
GRANT SELECT ON ALL TABLES IN SCHEMA test_shards TO c01_replica;
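Optionally (standard PostgreSQL, and assuming future shards are created by the same role that runs this statement), default privileges can keep the grant in sync when new shards are added later:

```sql
-- Optional: future tables created in test_shards will be readable by c01_replica.
ALTER DEFAULT PRIVILEGES IN SCHEMA test_shards
    GRANT SELECT ON TABLES TO c01_replica;
```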
Create account for each replica.
CREATE USER c01r01 PASSWORD 'c01r01Password' REPLICATION IN ROLE c01_replica;
On every replica
Make sure the pgwrh extension is installed.
Configure connection to master server
Call the configure_controller function, providing the username and password of the replica account created on the master.
SELECT configure_controller(
host => 'master.myorg',
port => '5432',
username => 'c01r01', -- same as above
password => 'c01r01Password' -- same as above
);
Create and deploy replica cluster configuration
Specify what tables to replicate
The example below configures distribution of every partition of test.my_data to half (50%) of the replicas, except partitions of test.my_data_2024, which will be copied to all (100%) of the replicas.
WITH st(schema_name, table_name, replication_factor) AS (
VALUES
('test', 'my_data', 50),
('test', 'my_data_2024', 100)
)
INSERT INTO pgwrh.sharded_table (replication_group_id, sharded_table_schema, sharded_table_name, replication_factor)
SELECT
'c01', schema_name, table_name, replication_factor
FROM
st;
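The registered configuration can be reviewed directly in the table that was just populated:

```sql
SELECT * FROM pgwrh.sharded_table WHERE replication_group_id = 'c01';
```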
Configure replicas
Add replica to configuration:
SELECT pgwrh.add_replica('c01', 'c01r01', 'replica01.cluster01.myorg', 5432);
Start deployment
SELECT pgwrh.start_rollout('c01');
The new configuration is now visible to connected replicas, which will start data replication.
Commit configuration
Once all replicas have confirmed the configuration changes, execute:
SELECT pgwrh.commit_rollout('c01');
(this will fail if some replicas are not reconfigured yet)
Add more replicas
CREATE USER c01r02 PASSWORD 'c01r02Password' REPLICATION IN ROLE c01_replica;
CREATE USER c01r03 PASSWORD 'c01r03Password' REPLICATION IN ROLE c01_replica;
CREATE USER c01r04 PASSWORD 'c01r04Password' REPLICATION IN ROLE c01_replica;
SELECT pgwrh.add_replica(
_replication_group_id := 'c01',
_host_id := 'c01r02',
_host_name := 'replica02.cluster01.myorg',
_port := 5432);
SELECT pgwrh.add_replica(
_replication_group_id := 'c01',
_host_id := 'c01r03',
_host_name := 'replica03.cluster01.myorg',
_port := 5432,
_weight := 70);
SELECT pgwrh.add_replica(
_replication_group_id := 'c01',
_host_id := 'c01r04',
_host_name := 'replica04.cluster01.myorg',
_port := 5432);
It is possible to adjust the number of shards assigned to replicas by setting replica weight:
SELECT pgwrh.set_replica_weight('c01', 'c01r04', 200);
To deploy the new configuration:
SELECT pgwrh.start_rollout('c01');
And then:
SELECT pgwrh.commit_rollout('c01');