Safely Resharding a Redis Software Database: Best Practices and Step-by-Step Guide

Resharding in Redis Software lets you increase shard count to scale throughput and memory capacity without downtime. It’s a critical operation when dataset growth or key imbalance starts to impact performance. This guide explains how to prepare and execute a resharding operation safely, including Prerequisites and Safety Checks, Step-by-Step Instructions, Validation and Monitoring, and Troubleshooting for common issues such as CROSSSLOT errors, hot keys, and rack-aware reshard failures.

Quick answer

You can only increase the number of shards in place. To reduce shard count, create a new database with fewer shards and migrate data to it.
Before resharding, ensure CPU and memory utilization are safely below capacity on all nodes (ideally <80%) and schedule the operation during a maintenance window to absorb transient latency spikes.
Always back up the database and, when possible, rehearse the reshard in a non‑production environment first.
Prefer doubling the shard count (x2) where possible (for example, 2 → 4 → 8) to simplify slot redistribution, validation, and capacity planning.
When resharding rack-aware databases, ensure replication is enabled.

Prerequisites and Safety Checks

To minimize risk during resharding, review the following grouped requirements before proceeding:

Access and Environment Readiness

Admin Access: Use an account with administrator privileges on the Redis Enterprise Cluster (REC) UI or the rladmin CLI.
Maintenance Window: Perform the operation during a scheduled maintenance window to account for short-lived latency spikes.
Monitoring Setup: Ensure Redis Insight, Prometheus, or Grafana are configured to track shard balance, latency, and throughput during the process.

Cluster Resource Validation

Maintain CPU and memory utilization below 80% across all nodes, with additional free memory for rebalancing.

Replication and Rack Awareness

Enable replication before resharding.

Rack-aware databases require replication for safe resharding. Each primary shard should have a replica in a different rack or zone.

If resharding fails in a rack-aware deployment:

Enable replication if not already enabled
Or temporarily disable rack awareness, complete the reshard, then re-enable it

If issues persist, collect a support package and contact Redis Support.

Data Protection and Testing

Backups: Back up your database before initiating the operation.
Staging Validation: Run a resharding test in a non-production environment to validate hashing and replication behavior.

Key and Hashing Policy Review

Multi-key Operations:
Use INFO COMMANDSTATS to identify multi-key commands (for example, MSET, MGET).
Ensure all related keys share the same hash tag (e.g., user:{123}:cart, user:{123}:profile) or use a custom hashing policy.
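To check that a set of keys will stay together after resharding, the slot for each key can be computed offline. The following is a minimal sketch that assumes the database uses the standard hashing policy and that it follows the open-source Redis Cluster rules (CRC16 of the hash tag, modulo 16384 slots); a custom hashing policy would need different tag extraction. The function names are illustrative, not part of any Redis tooling.

```python
import binascii

NUM_SLOTS = 16384  # slot count used by open-source Redis Cluster (assumed to apply here)


def hash_tag(key: str) -> str:
    """Return the part of the key that is hashed: the first non-empty {...} tag, else the whole key."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # an empty "{}" is ignored and the whole key is hashed
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """CRC16 (XMODEM variant) of the hashed part, modulo the slot count."""
    return binascii.crc_hqx(hash_tag(key).encode("utf-8"), 0) % NUM_SLOTS


def same_slot(*keys: str) -> bool:
    """True when every key maps to a single slot, so multi-key commands avoid CROSSSLOT."""
    return len({key_slot(k) for k in keys}) == 1


print(same_slot("user:{123}:cart", "user:{123}:profile"))  # True: both hash on "123"
print(key_slot("user:{123}:cart") == key_slot("123"))      # True
```

Running the key names from an MGET or MSET call through `same_slot` before resharding is a cheap way to spot commands that would start failing once the keys are spread over more shards.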

Large and Hot Keys

Large keys
- Run:
```
redis-cli -h <host> -p <port> --bigkeys
```
  to identify very large keys that may significantly slow or stall resharding (especially when copied or trimmed).
- Consider:
  - Breaking very large values into smaller chunks.
  - Refactoring large hashes/lists/streams into multiple keys.

Hot keys
- Run (if supported): redis-cli -h <host> -p <port> --hotkeys
  - This command samples frequently accessed keys and helps identify hotspots. It requires an LFU maxmemory-policy (for example, allkeys-lfu).
  - If the command is unavailable or does not return results, use Redis Insight or contact Redis Support to profile hot keys.
- Address hot keys by:
  - Splitting or sharding hot data across multiple keys.
  - Reducing per-key contention through caching or request distribution.
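
One way to split a hot key is to spread writes over several sub-keys and aggregate on read. The sketch below only derives the key names; the `:part:<n>` suffix, the function names, and the use of a caller ID to pick a sub-key are assumptions made for illustration. The sub-keys deliberately carry no shared hash tag, so they are free to land in different slots and shards.

```python
import zlib
from typing import List


def sub_key(base: str, caller_id: str, parts: int) -> str:
    """Map a caller to one of `parts` sub-keys so writes to a hot key are spread out."""
    index = zlib.crc32(caller_id.encode("utf-8")) % parts
    return f"{base}:part:{index}"


def all_sub_keys(base: str, parts: int) -> List[str]:
    """Every sub-key, for readers that must aggregate (for example, summing counters)."""
    return [f"{base}:part:{i}" for i in range(parts)]


# A writer increments only its own sub-key; a reader sums all of them.
print(sub_key("page:views", "client-42", 8) in all_sub_keys("page:views", 8))  # True
```

Because the sub-keys may live in different slots, a reader should fetch them with individual commands rather than a single multi-key command.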

Step-by-Step Resharding

Prepare Key Naming and Hashing

Review application key patterns and ensure multi-key operations use consistent hash tags or follow the defined hashing policy.

Initiate Resharding

You can reshard either from the REC UI or through the REST API (for Redis Software clusters you administer directly).

Option A: Using the UI

Go to Databases in the Redis Enterprise UI.
Select the target database.
Open Configuration → Shards (or the equivalent “Shards / Reshard” panel).
Increase the number of shards to the desired value.
Apply the configuration change to start the resharding process.

Option B: Using the REST API (Recommended for automation)

Use the revamp action to safely update the database topology, including shard count.

Step 1: Dry run (recommended)

Validate the change before execution:

```
curl -k -u "<user>:<password>" \
  -X PUT "https://<cm-host>/v1/bdbs/<uid>/actions/revamp?dry_run=true" \
  -H "Content-Type: application/json" \
  -d '{
    "shards_count": <new_count>
  }'
```

Step 2: Execute resharding

```
curl -k -u "<user>:<password>" \
  -X PUT "https://<cm-host>/v1/bdbs/<uid>/actions/revamp" \
  -H "Content-Type: application/json" \
  -d '{
    "shards_count": <new_count>
  }'
```

<uid>: Database ID
<new_count>: Desired number of primary shards

Step 3: Track progress

If the request is accepted, the response returns an action_uid. Track progress using:

GET /v1/actions/<action_uid>
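
For automation, the progress check can be wrapped in a polling loop. The sketch below assumes the action response is a JSON object with a `status` field and that `completed`, `failed`, and `cancelled` are its terminal values; verify both against the REST API reference for your version. `fetch_action` is a hypothetical callable that performs the GET request and returns the decoded body, which keeps the loop independent of any particular HTTP client.

```python
import time

# Assumed terminal values of the "status" field in the action response.
TERMINAL = {"completed", "failed", "cancelled"}


def wait_for_action(fetch_action, action_uid, interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll until the action reaches a terminal status or the timeout elapses.

    fetch_action(action_uid) must return the decoded JSON body of
    GET /v1/actions/<action_uid> as a dict.
    """
    waited = 0.0
    while True:
        action = fetch_action(action_uid)
        if action.get("status") in TERMINAL:
            return action
        if waited >= timeout:
            raise TimeoutError(
                f"action {action_uid} still {action.get('status')!r} after {timeout}s"
            )
        sleep(interval)
        waited += interval


# Usage with a stubbed fetcher standing in for the real HTTP call:
responses = iter([{"status": "running"}, {"status": "completed"}])
result = wait_for_action(lambda uid: next(responses), "abc123", sleep=lambda s: None)
print(result["status"])  # completed
```

Callers should inspect the returned status, since a `failed` or `cancelled` action is returned rather than raised.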

Notes

Resharding is an online operation; the database remains available, though temporary latency increases may occur.
Always perform a dry run before executing topology changes.
For large datasets, progress may take time depending on key distribution and system load.
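
The dry-run-then-execute sequence from Option B can also be scripted. The following sketch builds the same PUT request as the curl commands using only the Python standard library; it constructs the request without sending it. The host name, database ID, and credentials are placeholders, and the function name is illustrative.

```python
import base64
import json
import urllib.request


def revamp_request(cm_host, uid, new_count, user, password, dry_run=True):
    """Build (but do not send) the revamp PUT request used in the curl examples."""
    url = f"https://{cm_host}/v1/bdbs/{uid}/actions/revamp"
    if dry_run:
        url += "?dry_run=true"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps({"shards_count": new_count}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="PUT",
    )


# Placeholder values; replace with your cluster manager host, database ID, and credentials.
req = revamp_request("cm.example.com", 1, 4, "admin@example.com", "secret")
print(req.get_method(), req.full_url)
```

Sending the request would use `urllib.request.urlopen(req, context=...)`. The curl examples pass `-k` to skip certificate verification; in a script it is usually better to supply an SSL context that trusts the cluster's certificate. Defaulting `dry_run` to `True` means the real change has to be requested explicitly.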

Validation and Monitoring

Shard Balance: Confirm even utilization with rladmin status extra all or the REC UI.
Key Distribution: Re-run redis-cli --bigkeys or review Redis Insight analytics for uneven key sizes.
Performance Metrics: Compare latency, throughput, and memory usage before and after the operation.
Cluster Health: Run rlcheck to ensure all nodes and processes are healthy.
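
To put a number on shard balance, per-shard memory figures can be reduced to a single ratio. This is a minimal sketch; the shard names and values are made up for illustration and would in practice be copied from the shard listing or your monitoring system.

```python
from statistics import mean


def imbalance_ratio(used_memory_by_shard):
    """Largest shard's memory divided by the mean; 1.0 means perfectly even."""
    sizes = list(used_memory_by_shard.values())
    return max(sizes) / mean(sizes)


# Hypothetical per-shard used memory in MB after resharding.
after = {"redis:1": 510, "redis:2": 495, "redis:3": 505, "redis:4": 490}
print(round(imbalance_ratio(after), 2))  # 1.02
```

A ratio that stays well above 1.0 after resharding suggests that a few hash tags or large keys dominate one shard, which adding more shards will not fix.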

Quick Fix Table

Symptom	Likely Cause	Quick Resolution
CROSSSLOT errors	Multi-key operations span multiple hash slots	Use hash tags or adjust hashing policy.
Latency spikes	Large or hot keys during resharding	Split large keys, add RAM, or schedule off-peak.
Shard imbalance	Uneven key distribution	Re-run resharding or review hash tag configuration.
Reshard fails with rack awareness	Rack-aware DB missing prerequisites (often no replication)	Enable replication, or temporarily disable rack awareness, reshard, then re-enable it.
Resharding stuck or slow	Node or process issue	Restart cnm_exec, check event_log.log or cnm_exec.log, verify resources.

If resharding repeatedly fails:

Inspect /var/opt/redislabs/log/ (event_log.log, redis-ID.log) for migration errors.
Check node resource usage (RAM, CPU, disk).
Collect a support package (rladmin cluster debug_info) and contact Redis Support.

Best Practices

Avoid oversized keys: Keep keys <512 MB (ideally <300 MB) for better replication and migration performance.
Monitor hot keys: Regularly identify and refactor keys that overload a single shard.
Scale predictably: Increase shard counts in powers of two (2→4→8) for optimal distribution.
Test in staging: Validate the process in non-production environments first.
Schedule off-peak: Perform resharding during low-traffic hours.
Back up data: Always back up before any resharding activity.
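
The "scale predictably" practice can be turned into a small planning helper. This sketch lists the intermediate shard counts when doubling toward a target and rejects targets that doubling cannot reach; it is a planning aid under that assumption, not a restriction imposed by Redis Software.

```python
def doubling_steps(current, target):
    """List the shard counts to pass through when doubling from current up to target."""
    if current < 1 or target < current:
        raise ValueError("shard count can only be increased in place")
    steps = []
    count = current
    while count * 2 <= target:
        count *= 2
        steps.append(count)
    if count != target:
        raise ValueError(f"{target} is not reachable from {current} by doubling")
    return steps


print(doubling_steps(2, 8))  # [4, 8]
```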

References

For extended guidance, see
Database clustering in Redis Enterprise,
Performance tuning best practices, and
Troubleshooting distributed system issues in Redis Software.
