How to Scale Shards in an Active-Active Database with Minimal Service Impact – Redis Knowledge Base

Redis Software Active-Active databases use CRDT-based multi-primary replication across participating clusters. Each participating cluster hosts its own local database instance, while only data is replicated between clusters. Because shard-count changes affect throughput, memory distribution, replication behavior, and resharding load, plan Active-Active resizing as a CRDB-level change rather than a standard local database resize. This article explains how to scale shards with minimal service impact and covers prerequisites, step-by-step instructions, an emergency-only procedure, and troubleshooting.

Quick Fix

Situation	What to Do
Shard count changed in one cluster but not another Active-Active participant	Use a CRDB-level update with `crdb-cli crdb update --crdb-guid <CRDB-GUID> --default-db-config '{"shards_count":<N>}'`. A local database-only change may not update all participating clusters.
Need more throughput with minimal client disruption	Increase shard count during a low-traffic window and monitor online resharding until it completes across all participating clusters. Clients continue using the same database endpoint.
Unsure whether max memory must also increase	Increasing shard count without increasing total memory is usually acceptable when the goal is throughput, not capacity. The same total memory is distributed across more shards.
Resharding causes latency or sync pressure	Check for large keys, hot keys, CPU pressure, memory headroom, network pressure, and other running actions. Avoid heavy jobs or bulk deletes during resharding.
Need emergency scaling during a traffic spike	Use the local-cluster-first path only as a temporary emergency workaround, then align the global CRDB configuration later with `crdb-cli`.
Need to reduce shard count	Do not attempt online shard reduction for a CRDB. Plan a supported downtime procedure with Redis Support or migrate to a new database with the target shard count.

Prerequisites

Confirm administrative access.
You need cluster admin access on all participating Redis Software clusters and access to crdb-cli.

Validate Active-Active health first.
Do not resize if any participant, shard, replica, or CRDB link is unhealthy.

Confirm no other actions are running.
Check for active database, cluster, or CRDB tasks before starting the resize.

rladmin cluster running_actions
crdb-cli task list

Confirm resource headroom.
Verify CPU, RAM, provisional RAM, and network headroom across all participating clusters before adding shards.

Confirm backups are available.
Recent backups should exist and be restorable before making a topology-impacting change.

Identify large or hot keys.
Large keys and hot keys can increase resharding time and latency during the operation.

redis-cli --bigkeys
redis-cli --hotkeys

Schedule during lower traffic.
Online resharding is designed to reduce disruption, but it still consumes CPU and network resources.

Step-by-Step Instructions

1. Decide whether you are scaling throughput, capacity, or both

Use shard scaling for throughput.
Adding shards increases parallelism and can help when ops/sec rises, latency increases, or existing shards are saturated.

Use memory scaling for capacity.
If the database is approaching its memory limit, increase memory_size separately or as part of the same CRDB update.

Do not assume more shards means more total memory.
If you increase only shards_count, the same total database memory is distributed across more shards.
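
As a rough illustration of this point, the shell sketch below divides a fixed database memory limit across two different shard counts. The helper name per_shard_bytes and the figures are illustrative assumptions, and the calculation ignores replication and per-shard overhead, so treat the result as an approximation rather than what the cluster will report.

```shell
# Rough per-shard memory estimate: total database memory divided evenly
# across shards. Hypothetical helper; ignores replicas and overhead.
per_shard_bytes() {
  local memory_size=$1 shards_count=$2
  echo $(( memory_size / shards_count ))
}

# Same 8 GiB limit, before and after doubling the shard count.
per_shard_bytes 8589934592 10
per_shard_bytes 8589934592 20
```

Doubling the shard count while leaving memory_size unchanged roughly halves the memory available to each shard, which is why capacity needs a separate memory_size increase.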

2. Confirm the database is Active-Active and capture the CRDB GUID

List CRDBs:

crdb-cli crdb list

Record the CRDB-GUID for the database you want to resize.

3. Review the current CRDB configuration

crdb-cli crdb get --crdb-guid <CRDB-GUID>

Record:

Current shards_count
Current memory_size
Participating clusters
Module versions
Redis Software versions

4. Validate cluster health on each participating cluster

Run:

rladmin status extra all
rladmin cluster running_actions

Do not continue if:

A shard is down, stale, or recovering.
A cluster action is already running.
A CRDB task is already in progress.
The cluster lacks enough provisional RAM for the new shard layout.

5. Check for large or hot keys before resizing

Run sampling checks before the resize window:

redis-cli --bigkeys
redis-cli --hotkeys

For Redis 7.4 and later, --keystats may also help identify key distribution issues before resizing.

If you find oversized keys, reduce their impact before the resize by splitting data structures, adjusting application behavior, or using gradual cleanup patterns. Avoid large blocking DEL operations before or during CRDB resharding. For Active-Active databases, do not assume UNLINK removes all delete-related application impact because delete operations still require CRDB coordination.

6. Update shard count at the CRDB level

For Active-Active databases, shard count should be changed at the CRDB level.

crdb-cli crdb update --crdb-guid <CRDB-GUID> --default-db-config '{"shards_count":20}'

If you also need to increase memory:

crdb-cli crdb update --crdb-guid <CRDB-GUID> \
  --default-db-config '{"memory_size":8589934592,"shards_count":20}'
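
Because memory_size is given in bytes, it can be convenient to generate the JSON payload instead of typing the number by hand. The following is a minimal sketch; the helper names gib_to_bytes and build_db_config are hypothetical and not part of crdb-cli.

```shell
# Convert whole GiB to bytes (1 GiB = 1073741824 bytes).
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print a JSON payload for --default-db-config from a memory size in GiB
# and a shard count. Hypothetical helper for illustration only.
build_db_config() {
  local mem_gib=$1 shards=$2
  printf '{"memory_size":%s,"shards_count":%s}\n' "$(gib_to_bytes "$mem_gib")" "$shards"
}

# 8 GiB and 20 shards produce the same payload used in the command above.
build_db_config 8 20
```

The output can then be passed to the update command, for example with --default-db-config "$(build_db_config 8 20)".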

If only memory needs to change, crdb-cli also supports the dedicated --memory-size flag:

crdb-cli crdb update --crdb-guid <CRDB-GUID> --memory-size 8GB

7. Use the UI carefully

In some Redis Software versions and workflows, local UI changes may affect only the local database instance rather than the full CRDB configuration. This can leave one participant resized while another participant keeps the previous shard count.

Treat crdb-cli crdb update as the authoritative way to make global Active-Active shard-count changes, and verify the result on every participating cluster.

8. Monitor the CRDB task until completion

crdb-cli task list
crdb-cli task status --task-id <TASK-ID>
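
Rather than re-running the status command by hand, the check can be wrapped in a small polling loop. The sketch below is an assumption-laden example: the helper names task_state and wait_for_task are hypothetical, and it assumes the status output contains a line of the form "Status: finished" when the task is done. Confirm the exact output on your Redis Software version before relying on it.

```shell
# Extract the value of the "Status:" line from task status text on stdin.
# Assumes the output contains a line such as "Status: finished".
task_state() {
  awk -F': *' '$1 == "Status" { print $2; exit }'
}

# Poll a status command until it reports "finished" or the tries run out.
# Usage: wait_for_task <max_tries> <sleep_seconds> <command...>
wait_for_task() {
  local tries=$1 delay=$2 state i=0
  shift 2
  while [ "$i" -lt "$tries" ]; do
    state=$("$@" | task_state)
    [ "$state" = "finished" ] && return 0
    i=$(( i + 1 ))
    sleep "$delay"
  done
  return 1  # not finished within the allowed number of tries
}
```

A call such as wait_for_task 120 30 crdb-cli task status --task-id <TASK-ID> would check every 30 seconds for up to an hour. The loop treats every other state as "still running", so a non-zero return means only that the task did not finish in time and should be inspected manually.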

Do not start other maintenance, topology changes, imports, or heavy cleanup jobs until the resize completes across the CRDB.

9. Validate health after the resize

Run:

rladmin status extra all
crdb-cli crdb health-report --crdb-guid <CRDB-GUID>

Confirm:

All participating clusters report healthy shards.
CRDB links are connected.
Sync lag is returning to normal.
Application latency has stabilized.
Clients continue using the same endpoint.

Emergency-Only Procedure

The recommended path is to resize at the CRDB level during a planned low-traffic window.

Use this local-cluster-first path only if traffic is already spiking and immediate action is required:

Increase shards on the production cluster through the admin UI.
Wait for local resharding to complete.
Increase shards on the other participating cluster through the admin UI.
Wait for local resharding to complete.
After traffic stabilizes, run the CRDB-level update so global CRDB metadata matches the actual shard layout.

crdb-cli crdb update --crdb-guid <CRDB-GUID> --default-db-config '{"shards_count":<N>}'

Do not add CRDB instances or make topology changes before the CRDB-level alignment is complete.

Troubleshooting

Shard count changed in one cluster but not another participant

Likely cause:
The change was made locally instead of at the CRDB level.

Resolution:
Run a CRDB-level update:

crdb-cli crdb update --crdb-guid <CRDB-GUID> --default-db-config '{"shards_count":<N>}'

Then monitor the CRDB task and verify global CRDB health.

Resharding is taking longer than expected

Likely cause:
Large key count, oversized keys, hot keys, high CPU load, or concurrent maintenance activity can slow resharding.

Resolution:

Check for large and hot keys.
Avoid imports, exports, backups, and bulk deletes during resharding.
Schedule the resize during lower traffic.
Add shards earlier next time if the database is already near saturation.

Not Enough Memory in the Cluster

Likely cause:
The cluster does not have enough provisional RAM or node headroom for the new shard layout.

Resolution:

Check cluster capacity with rladmin status extra all.
Free capacity, rebalance, or add node resources.
If capacity is also the goal, include memory_size in the CRDB update.

CRDB database is currently busy

Likely cause:
Another CRDB, database, or cluster action is already running.

Resolution:

rladmin cluster running_actions
crdb-cli task list

Wait for the active task to complete. If the task appears stalled, collect diagnostics and contact Redis Support.

Sync does not recover after resizing

Likely cause:
The resize may have exposed connectivity, TLS, version, module, or backlog issues.

Resolution:

Confirm all participants are healthy.
Validate inter-cluster connectivity.
Check TLS configuration if CRDB links are disconnected.
Confirm Redis Software and module versions are aligned.
Use the Redis KB article for Active-Active sync failures if sync does not recover cleanly.

Delete-related latency increased during the resize

Likely cause:
Large DEL operations block the shard while they run. In an Active-Active database, delete operations also require CRDB coordination across regions, which can add latency.

Resolution:

Avoid bulk deletes before or during shard scaling.
Use incremental cleanup patterns.
Prefer TTL-based expiration where possible.
Review Redis KB guidance for large key deletion and CRDB delete latency before combining cleanup with shard scaling.

Need to reduce shard count after the peak

Likely cause:
CRDB scale-out is online, but shard reduction is not a simple online reverse operation.

Resolution:

Do not attempt unsupported online shard reduction. Plan a downtime-backed procedure with Redis Support or create a new database with the desired shard count and migrate data.
