Redis Software Active-Active databases use CRDT-based multi-primary replication across participating clusters. Each participating cluster hosts its own local database instance, while only data is replicated between clusters. Because shard-count changes affect throughput, memory distribution, replication behavior, and resharding load, plan an Active-Active resize as a CRDB-level change rather than a standard local database resize. This article explains how to scale shards with minimal service impact and covers prerequisites, step-by-step instructions, an emergency-only procedure, and troubleshooting.
Quick Fix
| Situation | What to Do |
|---|---|
| Shard count changed in one cluster but not another Active-Active participant | Use a CRDB-level update with crdb-cli crdb update --crdb-guid <CRDB-GUID> --default-db-config '{"shards_count":<N>}'. A local database-only change may not update all participating clusters. |
| Need more throughput with minimal client disruption | Increase shard count during a low-traffic window and monitor online resharding until it completes across all participating clusters. Clients continue using the same database endpoint. |
| Unsure whether max memory must also increase | Increasing shard count without increasing total memory is usually acceptable when the goal is throughput, not capacity. The same total memory is distributed across more shards. |
| Resharding causes latency or sync pressure | Check for large keys, hot keys, CPU pressure, memory headroom, network pressure, and other running actions. Avoid heavy jobs or bulk deletes during resharding. |
| Need emergency scaling during a traffic spike | Use the local-cluster-first path only as a temporary emergency workaround, then align the global CRDB configuration later with crdb-cli. |
| Need to reduce shard count | Do not attempt online shard reduction for a CRDB. Plan a supported downtime procedure with Redis Support or migrate to a new database with the target shard count. |
Prerequisites
Confirm administrative access.
You need cluster admin access on all participating Redis Software clusters and access to crdb-cli.
Validate Active-Active health first.
Do not resize if any participant, shard, replica, or CRDB link is unhealthy.
Confirm no other actions are running.
Check for active database, cluster, or CRDB tasks before starting the resize.
rladmin cluster running_actions
crdb-cli task list
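To run both checks together on a node of each participating cluster, a minimal shell sketch follows. The function name `preflight_checks` and the `RLADMIN` / `CRDB_CLI` override variables are hypothetical conveniences, not Redis Software features. The sketch only prints the two outputs under headers and fails if a command cannot run; it deliberately does not parse them, because the output format can differ between versions.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of Redis Software): print running cluster
# actions and CRDB tasks so an operator can confirm nothing is in progress.
# Assumes rladmin and crdb-cli are on PATH; RLADMIN / CRDB_CLI can override them.
preflight_checks() {
  local rladmin_bin=${RLADMIN:-rladmin}
  local cli=${CRDB_CLI:-crdb-cli}
  local rc=0

  echo "=== rladmin cluster running_actions ==="
  if ! "$rladmin_bin" cluster running_actions; then
    echo "ERROR: rladmin cluster running_actions failed" >&2
    rc=1
  fi

  echo "=== crdb-cli task list ==="
  if ! "$cli" task list; then
    echo "ERROR: crdb-cli task list failed" >&2
    rc=1
  fi

  # The output is not parsed: its format can differ between versions, so review it manually.
  return "$rc"
}
```

For example, `preflight_checks | tee "preflight-$(hostname).txt"` keeps a copy of the output for the change record.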
Confirm resource headroom.
Verify CPU, RAM, provisional RAM, and network headroom across all participating clusters before adding shards.
Confirm backups are available.
Recent backups should exist and be restorable before making a topology-impacting change.
Identify large or hot keys.
Large keys and hot keys can increase resharding time and latency during the operation.
redis-cli --bigkeys
redis-cli --hotkeys
Schedule during lower traffic.
Online resharding is designed to reduce disruption, but it still consumes CPU and network resources.
Step-by-Step Instructions
1. Decide whether you are scaling throughput, capacity, or both
Use shard scaling for throughput.
Adding shards increases parallelism and can help when ops/sec rises, latency increases, or existing shards are saturated.
Use memory scaling for capacity.
If the database is approaching its memory limit, increase memory_size separately or as part of the same CRDB update.
Do not assume more shards means more total memory.
If you increase only shards_count, the same total database memory is distributed across more shards.
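The arithmetic is simple division. The sketch below, using a hypothetical `per_shard_bytes` function, approximates the memory each shard gets when memory_size is split evenly across shards_count. It ignores replication, per-shard overhead, and uneven key distribution, so treat the result as an estimate only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate memory per shard when memory_size is split evenly.
# Ignores replication, per-shard overhead, and uneven key distribution.
per_shard_bytes() {
  local memory_size=$1 shards=$2
  if (( shards <= 0 )); then
    echo "shards_count must be positive" >&2
    return 1
  fi
  echo $(( memory_size / shards ))   # integer division, remainder dropped
}

# Example: the same 8 GiB database (8589934592 bytes) with 10 shards versus 20 shards.
for n in 10 20; do
  echo "shards_count=$n -> ~$(per_shard_bytes 8589934592 "$n") bytes per shard"
done
```

Doubling shards_count from 10 to 20 roughly halves the per-shard budget while the database total stays at 8 GiB, which is why adding shards alone does not add capacity.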
2. Confirm the database is Active-Active and capture the CRDB GUID
List CRDBs:
crdb-cli crdb list
Record the CRDB-GUID for the database you want to resize.
3. Review the current CRDB configuration
crdb-cli crdb get --crdb-guid <CRDB-GUID>
Record:
Current shards_count
Current memory_size
Participating clusters
Module versions
Redis Software versions
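Saving the crdb-cli crdb get output to a file makes it easier to compare the configuration after the resize. A minimal sketch, assuming crdb-cli is on PATH; `record_crdb_config`, the `CRDB_CLI` override, and the file-naming scheme are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save the current CRDB configuration before changing it.
# Assumes crdb-cli is on PATH; CRDB_CLI can point at another binary.
record_crdb_config() {
  local guid=$1 outdir=${2:-.}
  local cli=${CRDB_CLI:-crdb-cli}
  local stamp outfile
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  outfile="$outdir/crdb-get-$guid-$stamp.txt"
  mkdir -p "$outdir" || return 1
  if ! "$cli" crdb get --crdb-guid "$guid" > "$outfile"; then
    echo "crdb-cli crdb get failed for $guid" >&2
    rm -f "$outfile"   # do not leave a partial snapshot behind
    return 1
  fi
  echo "$outfile"      # print the path of the saved snapshot
}
```

Example: `record_crdb_config <CRDB-GUID> ./resize-notes` prints the path of the saved file, from which you can note shards_count, memory_size, and the participating clusters.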
4. Validate cluster health on each participating cluster
Run:
rladmin status extra all
rladmin cluster running_actions
Do not continue if:
A shard is down, stale, or recovering.
A cluster action is already running.
A CRDB task is already in progress.
The cluster lacks enough provisional RAM for the new shard layout.
5. Check for large or hot keys before resizing
Run sampling checks before the resize window:
redis-cli --bigkeys
redis-cli --hotkeys
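For Redis 7.4 and later, redis-cli --keystats may also help identify key distribution issues before resizing. Note that --hotkeys relies on LFU access counters, so it returns an error unless the database uses an LFU eviction policy (allkeys-lfu or volatile-lfu).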
If you find oversized keys, reduce their impact before the resize by splitting data structures, adjusting application behavior, or using gradual cleanup patterns. Avoid large blocking DEL operations before or during CRDB resharding. For Active-Active databases, do not assume UNLINK removes all delete-related application impact because delete operations still require CRDB coordination.
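One gradual cleanup pattern is to scan for matching keys and remove them in small batches with a pause between batches, instead of issuing one large delete. A minimal sketch for use before the resize window, assuming redis-cli can reach the database endpoint; `gradual_unlink` and the `REDIS_CLI` / `REDIS_CLI_ARGS` variables (for connection flags such as -h, -p, or --tls) are hypothetical. Because CRDB deletes still need coordination, the pause throttles that work rather than removing it. This handles many keys; a single oversized key still needs to be split or trimmed by other means.

```shell
#!/usr/bin/env bash
# Hypothetical helper: UNLINK keys matching a pattern in small batches with a pause
# between batches. Run it before the resize window, not during resharding.
# REDIS_CLI_ARGS holds connection flags (for example "-h host -p port") and is split on spaces.
gradual_unlink() {
  local pattern=$1 batch=${2:-100} pause=${3:-0.1}
  local cli=${REDIS_CLI:-redis-cli}
  local -a args=() keys=()
  local key total=0
  read -r -a args <<< "${REDIS_CLI_ARGS:-}"

  while IFS= read -r key; do
    [ -n "$key" ] || continue
    keys+=("$key")
    if (( ${#keys[@]} >= batch )); then
      "$cli" "${args[@]}" UNLINK "${keys[@]}" > /dev/null || return 1
      total=$(( total + ${#keys[@]} ))
      keys=()
      sleep "$pause"   # throttle deletes so coordination work is spread out
    fi
  done < <("$cli" "${args[@]}" --scan --pattern "$pattern")

  # Remove any final partial batch.
  if (( ${#keys[@]} > 0 )); then
    "$cli" "${args[@]}" UNLINK "${keys[@]}" > /dev/null || return 1
    total=$(( total + ${#keys[@]} ))
  fi
  echo "$total"        # number of keys submitted for deletion
}
```

For example, `REDIS_CLI_ARGS="-h <endpoint> -p <port>" gradual_unlink 'tmp:*' 100 0.2` removes keys matching the placeholder pattern tmp:* 100 at a time and prints how many it submitted.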
6. Update shard count at the CRDB level
For Active-Active databases, shard count should be changed at the CRDB level.
crdb-cli crdb update --crdb-guid <CRDB-GUID> --default-db-config '{"shards_count":20}'
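The --default-db-config value is JSON, and memory_size is expressed in bytes. A small sketch can build and print the command for review before anything runs; `build_db_config` and `gib_to_bytes` are hypothetical helpers, and the script only prints the command rather than executing it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the --default-db-config JSON for crdb-cli crdb update.
# memory_size is in bytes, so the optional memory argument (in GiB) is converted first.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

build_db_config() {
  local shards=$1 memory_gib=${2:-}
  if ! [[ $shards =~ ^[1-9][0-9]*$ ]]; then
    echo "shards_count must be a positive integer" >&2
    return 1
  fi
  if [ -n "$memory_gib" ]; then
    if ! [[ $memory_gib =~ ^[1-9][0-9]*$ ]]; then
      echo "memory must be a positive whole number of GiB" >&2
      return 1
    fi
    printf '{"memory_size":%s,"shards_count":%s}\n' "$(gib_to_bytes "$memory_gib")" "$shards"
  else
    printf '{"shards_count":%s}\n' "$shards"
  fi
}

# Print (do not run) the resulting command so it can be reviewed first.
printf "crdb-cli crdb update --crdb-guid %s --default-db-config '%s'\n" \
  "<CRDB-GUID>" "$(build_db_config 20 8)"
```

Printing instead of executing keeps a human review step before a topology-impacting change.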
If you also need to increase memory:
crdb-cli crdb update --crdb-guid <CRDB-GUID> \
--default-db-config '{"memory_size":8589934592,"shards_count":20}'
memory_size is specified in bytes; 8589934592 bytes is 8 GiB (8 × 1024³).
If only memory needs to change, crdb-cli also supports the dedicated --memory-size flag:
crdb-cli crdb update --crdb-guid <CRDB-GUID> --memory-size 8GB
7. Use the UI carefully
In some Redis Software versions and workflows, local UI changes may affect only the local database instance rather than the full CRDB configuration. This can leave one participant resized while another participant keeps the previous shard count.
Use crdb-cli crdb update as the safest source of truth for global Active-Active shard-count changes.
8. Monitor the CRDB task until completion
crdb-cli task list
crdb-cli task status --task-id <TASK-ID>
Do not start other maintenance, topology changes, imports, or heavy cleanup jobs until the resize completes across the CRDB.
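Polling can be scripted. A minimal sketch, assuming crdb-cli is on PATH; `wait_for_crdb_task` and the `CRDB_CLI`, `DONE_RE`, and `FAIL_RE` variables are hypothetical. The sketch also assumes the task status output contains the word finished on success and failed on failure; confirm this against your version's actual output and adjust the patterns if it differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll crdb-cli task status until the task reaches a terminal state.
# ASSUMPTION: status output contains "finished" on success and "failed" on failure.
# Override DONE_RE / FAIL_RE if your Redis Software version reports states differently.
wait_for_crdb_task() {
  local task_id=$1 interval=${2:-30} max_polls=${3:-240}
  local cli=${CRDB_CLI:-crdb-cli}
  local done_re=${DONE_RE:-finished} fail_re=${FAIL_RE:-failed}
  local i out
  for (( i = 1; i <= max_polls; i++ )); do
    if ! out=$("$cli" task status --task-id "$task_id" 2>&1); then
      echo "crdb-cli task status failed: $out" >&2
      return 1
    fi
    if grep -Eqi "$fail_re" <<< "$out"; then
      echo "$out" >&2
      return 1          # task reported failure
    fi
    if grep -Eqi "$done_re" <<< "$out"; then
      echo "$out"
      return 0          # task completed
    fi
    sleep "$interval"
  done
  echo "Timed out waiting for task $task_id" >&2
  return 2
}
```

Example: `wait_for_crdb_task <TASK-ID> 30 240` checks every 30 seconds for up to about two hours and returns 0 on success, 1 on failure, or 2 on timeout.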
9. Validate health after the resize
Run:
rladmin status extra all
crdb-cli crdb health-report --crdb-guid <CRDB-GUID>
Confirm:
All participating clusters report healthy shards.
CRDB links are connected.
Sync lag is returning to normal.
Application latency has stabilized.
Clients continue using the same endpoint.
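Capturing the health output to files on each participating cluster gives a record to compare with the pre-resize state and to attach to a support case if needed. A minimal sketch; `capture_post_resize_health` and the `RLADMIN` / `CRDB_CLI` overrides are hypothetical, and the saved files still need manual review against the checklist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save post-resize health output from one participating cluster.
# Assumes rladmin and crdb-cli are on PATH; RLADMIN / CRDB_CLI can override them.
capture_post_resize_health() {
  local guid=$1 outdir=${2:-.}
  local rladmin_bin=${RLADMIN:-rladmin} cli=${CRDB_CLI:-crdb-cli}
  local stamp rc=0
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  mkdir -p "$outdir" || return 1
  "$rladmin_bin" status extra all > "$outdir/rladmin-status-$stamp.txt" 2>&1 || rc=1
  "$cli" crdb health-report --crdb-guid "$guid" \
    > "$outdir/crdb-health-$guid-$stamp.txt" 2>&1 || rc=1
  # The files only record state; shard health, links, and sync lag still need review.
  echo "$outdir"
  return "$rc"
}
```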
Emergency-Only Procedure
The recommended path is to resize at the CRDB level during a planned low-traffic window.
Use this local-cluster-first path only if traffic is already spiking and immediate action is required:
Increase shards on the production cluster through the admin UI.
Wait for local resharding to complete.
Increase shards on each remaining participating cluster through the admin UI.
Wait for local resharding to complete.
After traffic stabilizes, run the CRDB-level update so global CRDB metadata matches the actual shard layout.
crdb-cli crdb update --crdb-guid <CRDB-GUID> --default-db-config '{"shards_count":<N>}'
Do not add CRDB instances or make topology changes before the CRDB-level alignment is complete.
Troubleshooting
Shard count changed in one cluster but not another participant
Likely cause:
The change was made locally instead of at the CRDB level.
Resolution:
Run a CRDB-level update:
crdb-cli crdb update --crdb-guid <CRDB-GUID> --default-db-config '{"shards_count":<N>}'
Then monitor the CRDB task and verify global CRDB health.
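To confirm which participant is out of step, you can collect the shard count each cluster reports and compare them. A minimal sketch, assuming each cluster's REST API is reachable on port 9443, that GET /v1/bdbs/<uid> returns the local database's shards_count, and that curl and jq are installed. `fetch_shards_count` and `check_shard_alignment` are hypothetical helpers, and the local database uid may differ between participating clusters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read shards_count for the local database instance on one cluster.
# ASSUMPTION: REST API on port 9443; -k skips TLS verification for self-signed
# certificates, so prefer --cacert <file> where possible.
fetch_shards_count() {
  local fqdn=$1 bdb_uid=$2 user=$3 pass=$4
  curl -sS --fail -k -u "$user:$pass" "https://$fqdn:9443/v1/bdbs/$bdb_uid" \
    | jq -r '.shards_count'
}

# Hypothetical helper: input lines are "<cluster-name> <shards_count>".
# Exits non-zero if the counts differ or no input was given.
check_shard_alignment() {
  awk '
    NF < 2 { next }
    {
      count[$1] = $2
      if (!seen) { first = $2; seen = 1 }
      if ($2 != first) mismatch = 1
    }
    END {
      if (!seen) { print "no shard counts supplied"; exit 1 }
      for (c in count) printf "%s shards_count=%s\n", c, count[c]
      if (mismatch) { print "MISMATCH: align with a CRDB-level crdb-cli crdb update"; exit 1 }
      print "OK: all participants report the same shards_count"
    }'
}

# Example (placeholder hosts, uid, and credentials):
# { echo "cluster-a $(fetch_shards_count cluster-a.example.com 1 <user> <password>)"
#   echo "cluster-b $(fetch_shards_count cluster-b.example.com 1 <user> <password>)"; } \
#   | check_shard_alignment
```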
Resharding is taking longer than expected
Likely cause:
Large key count, oversized keys, hot keys, high CPU load, or concurrent maintenance activity can slow resharding.
Resolution:
Check for large and hot keys.
Avoid imports, exports, backups, and bulk deletes during resharding.
Schedule the resize during lower traffic.
Next time, add shards before the database approaches saturation.
Not enough memory in the cluster
Likely cause:
The cluster does not have enough provisional RAM or node headroom for the new shard layout.
Resolution:
Check cluster capacity with rladmin status extra all.
Free capacity, rebalance, or add node resources.
If capacity is also the goal, include memory_size in the CRDB update.
CRDB database is currently busy
Likely cause:
Another CRDB, database, or cluster action is already running.
Resolution:
rladmin cluster running_actions
crdb-cli task list
Wait for the active task to complete. If the task appears stalled, collect diagnostics and contact Redis Support.
Sync does not recover after resizing
Likely cause:
The resize may have exposed connectivity, TLS, version, module, or backlog issues.
Resolution:
Confirm all participants are healthy.
Validate inter-cluster connectivity.
Check TLS configuration if CRDB links are disconnected.
Confirm Redis Software and module versions are aligned.
Use the Redis KB article for Active-Active sync failures if sync does not recover cleanly.
Delete-related latency increased during the resize
Likely cause:
DEL on a large key blocks the shard until the whole value is freed. In a CRDB, delete operations, including UNLINK, can still create cross-region coordination impact.
Resolution:
Avoid bulk deletes before or during shard scaling.
Use incremental cleanup patterns.
Prefer TTL-based expiration where possible.
Review Redis KB guidance for large key deletion and CRDB delete latency before combining cleanup with shard scaling.
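For TTL-based expiration, spreading expiry times with random jitter avoids many keys expiring, and being deleted, at the same moment. A minimal sketch to run outside the resize window, assuming redis-cli can reach the endpoint; `expire_with_jitter` and the `REDIS_CLI` / `REDIS_CLI_ARGS` variables are hypothetical. It overwrites any TTL the matching keys already have, and because bash's $RANDOM ranges from 0 to 32767, jitter windows larger than 32768 seconds are not fully covered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: give keys matching a pattern a TTL of base + random jitter
# seconds so their expirations are spread out. Overwrites existing TTLs.
# REDIS_CLI_ARGS holds connection flags and is split on spaces.
expire_with_jitter() {
  local pattern=$1 base=${2:-3600} jitter=${3:-600}
  local cli=${REDIS_CLI:-redis-cli}
  local -a args=()
  local key ttl count=0
  read -r -a args <<< "${REDIS_CLI_ARGS:-}"

  while IFS= read -r key; do
    [ -n "$key" ] || continue
    if (( jitter > 0 )); then
      ttl=$(( base + RANDOM % jitter ))   # RANDOM is 0..32767
    else
      ttl=$base
    fi
    "$cli" "${args[@]}" EXPIRE "$key" "$ttl" > /dev/null || return 1
    count=$(( count + 1 ))
  done < <("$cli" "${args[@]}" --scan --pattern "$pattern")
  echo "$count"   # number of keys that received a TTL
}
```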
Need to reduce shard count after the peak
Likely cause:
CRDB scale-out is online, but shard reduction is not a simple online reverse operation.
Resolution:
Do not attempt unsupported online shard reduction. Plan a downtime-backed procedure with Redis Support or create a new database with the desired shard count and migrate data.