Redis Software Active-Active databases (CRDBs) use the syncer process and inter-cluster replication to maintain data consistency across participating clusters. Each cluster hosts its own database, and only the data is replicated between clusters, so configuration, module versions, and certificates must be kept aligned across all participants.
After events such as Redis Software upgrades, disaster recovery drills, cluster reboots, certificate rotations, or participant re-creation, CRDB synchronization can stop progressing even when databases appear healthy. This guide helps you quickly identify the root cause and apply the correct fix using scenario-based navigation, Quick Fix actions, and targeted recovery steps.
Start Here: Identify Your Scenario
Use this list to jump directly to the most relevant fix:
After a Redis Software upgrade → Go to Verify Version Alignment Across Clusters
After a cluster reboot or restart → Go to Restart Sync Safely
After DR rebuild or participant re-creation → Go to Fix DR Configuration Drift
After certificate or mTLS change → Go to Resolve Certificate or mTLS Issues
Links disconnected or health-report hangs → Go to Fix Connectivity Issues Between CRDB Participants
Quick Fix
| What you see | Fast check | Immediate action |
|---|---|---|
| Sync stopped after reboot or restart | Check shard and database health | Run: crdb-cli crdb update --crdb-guid <CRDB_GUID> --force |
| Participant disconnected or health-report hangs | Test connectivity to the endpoints using redis-cli | Fix DNS, routing, firewall, or TLS issues |
| Sync broke after certificate change | Compare certificates across clusters | Run: crdb-cli crdb update --crdb-guid <CRDB_GUID> --force if the proxy or syncer certificate has been updated |
| DR participant recreated with different version/settings | Compare database vs. CRDB config | Update CRDB metadata and align database |
| Syncer logs show certificate mismatch | Check the trust chain and make sure the endpoint is using the configured certificate | Fix certificates, then refresh CRDB. If the certificate is already correct, contact Redis support. |
Prerequisites
Administrative access to all participating clusters
Access to crdb-cli, rladmin, openssl, and the REST API
CRDB GUID and database IDs
Maintenance window for disruptive actions
Important:
Inter-cluster connectivity must be functional before sync can recover
Full sync operations can generate significant traffic
Certificate updates must be consistent across all clusters
Avoid repeated forced sync attempts before identifying root cause
Fix Connectivity Issues Between CRDB Participants
When to use:
health-report hangs
Links show disconnected
One participant appears unreachable
Run:
openssl s_client -connect <endpoint>:<port> -servername <endpoint>
Optional:
openssl s_client -connect <endpoint>:<port> -servername <endpoint> -showcerts
Interpret results:
Connection failure → check firewall, routing, DNS
No listener → validate endpoint and port
Certificate mismatch → TLS trust issue. On Kubernetes, or wherever a proxy sits in front of the cluster, make sure that your route/ingress is configured for TLS passthrough. Replication will not work if the route/ingress presents a different certificate than the cluster.
If using a load balancer:
Confirm correct backend mapping
Confirm that TLS passthrough is enabled
Ensure CRDB ports are exposed
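The interpretation rules above can also be applied from a script when several endpoints need checking. The following is a minimal Python sketch, not part of Redis Software: the function names `classify_failure` and `probe` are hypothetical, and the mapping from exception type to cause is an assumption that mirrors the list above. If the endpoint requires a client certificate (mTLS), the handshake may fail for that reason alone.

```python
import socket
import ssl


def classify_failure(exc):
    """Map a connection error to the matching check from 'Interpret results'."""
    # Order matters: the ssl exceptions are also OSError subclasses.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "certificate mismatch: check the trust chain and TLS passthrough"
    if isinstance(exc, ssl.SSLError):
        return "TLS handshake failed: check TLS settings on both sides"
    if isinstance(exc, socket.gaierror):
        return "name resolution failed: check DNS"
    if isinstance(exc, ConnectionRefusedError):
        return "no listener: validate endpoint and port"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timed out: check firewall and routing"
    return "connection failure: check firewall, routing, DNS"


def probe(host, port, ca_file=None, timeout=10.0):
    """Try a TCP connect plus TLS handshake (with SNI) and describe the result."""
    # ca_file is an assumed path to the CA bundle that should trust the endpoint;
    # when omitted, the system trust store is used.
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok: TCP connect and TLS handshake succeeded"
    except OSError as exc:
        return classify_failure(exc)
```

Calling `probe("<endpoint>", <port>)` for each participant gives one line per endpoint that points at the relevant check.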
Verify Version Alignment Across Clusters
When to use:
Issue started after upgrade
Run:
rladmin status extra all
Expected: All clusters run the same Redis Software version
If not:
Complete upgrade across all clusters
Do not attempt CRDB fixes until aligned
Then run:
crdb-cli crdb update --crdb-guid <CRDB_GUID> --update-db-config-modules true
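With more than two participants, it can help to record the version reported on each cluster and compare the values programmatically. This is a minimal Python sketch under that assumption: the cluster names and version strings are hypothetical examples typed in by the operator, and the sketch does not parse rladmin output.

```python
from collections import defaultdict


def group_clusters_by_version(versions):
    """Group cluster names by the Redis Software version each one reports."""
    groups = defaultdict(list)
    for cluster, version in versions.items():
        groups[version.strip()].append(cluster)
    return {version: sorted(names) for version, names in groups.items()}


def versions_aligned(versions):
    """True only when every listed cluster reports the same version."""
    return len(group_clusters_by_version(versions)) == 1


if __name__ == "__main__":
    # Hypothetical values copied by hand from each cluster.
    reported = {"cluster-a": "7.4.2-54", "cluster-b": "7.4.2-54", "cluster-dr": "7.2.4-52"}
    if not versions_aligned(reported):
        for version, names in group_clusters_by_version(reported).items():
            print(version, "->", ", ".join(names))
```

Any output with more than one version group means the upgrade is incomplete and CRDB fixes should wait.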
Confirm Cluster and Database Health
Run:
rladmin status extra all
rladmin cluster running_actions
Fix before continuing if:
Shards are down or recovering
The database reports an active or stuck change in progress
Actions are stuck
CRDB sync requires healthy local databases.
Resolve Certificate or mTLS Issues
When to use:
Sync failed after certificate rotation
mTLS enabled or changed
Syncer logs show certificate errors
Steps:
Update certificates using supported methods (UI, rladmin, or API)
Ensure all clusters trust the same CA
Verify SAN matches endpoint
Then use the following command to sync the new certificates into the CRDB config:
crdb-cli crdb update --crdb-guid <CRDB_GUID> --force
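One way to confirm that an endpoint really presents the configured certificate (and not one injected by a route, ingress, or load balancer) is to compare SHA-256 fingerprints. The sketch below is an illustration only: the function names are hypothetical, the configured certificate is assumed to be available as a PEM string holding exactly one certificate, and an endpoint that enforces mTLS may refuse the handshake without a client certificate.

```python
import hashlib
import socket
import ssl


def fingerprint_from_pem(pem):
    """SHA-256 fingerprint (hex) of a single PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha256(der).hexdigest()


def presented_fingerprint(host, port, timeout=10.0):
    """SHA-256 fingerprint (hex) of the certificate the endpoint presents."""
    context = ssl.create_default_context()
    # Verification is disabled on purpose: the goal is to read the presented
    # certificate, not to trust it.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    return hashlib.sha256(der).hexdigest()


def same_certificate(configured_pem, host, port):
    """True when the endpoint presents exactly the configured certificate."""
    return fingerprint_from_pem(configured_pem) == presented_fingerprint(host, port)
```

A `False` result from `same_certificate` suggests that something between the clusters is terminating TLS, which matches the TLS passthrough requirement described earlier.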
Restart Sync Safely
When to use:
Sync stopped after reboot or failure
Option A:
curl -k -u <user>:<pass> -X PUT \
-H "Content-Type: application/json" \
-d '{"sync":"enabled"}' \
https://<host>:<port>/v1/bdbs/<database-id>
Option B:
crdb-cli crdb update --crdb-guid <CRDB_GUID> --force
Only perform after resolving the root cause.
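When Option A has to be repeated for several databases, the same REST call can be scripted. This is a minimal Python sketch of the request shown in the curl command above: the function names are hypothetical, and unlike `curl -k` the `send` helper verifies the server certificate against an assumed CA file.

```python
import base64
import json
import ssl
import urllib.request


def build_sync_request(host, port, database_id, user, password):
    """Build the PUT /v1/bdbs/<database-id> request that sets sync to enabled."""
    url = f"https://{host}:{port}/v1/bdbs/{database_id}"
    body = json.dumps({"sync": "enabled"}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def send(request, ca_file=None, timeout=30.0):
    """Send the request and return the HTTP status code."""
    # ca_file is an assumed path to the CA that signed the cluster API certificate.
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(request, context=context, timeout=timeout) as response:
        return response.status
```

As with the curl form, send the request only after the root cause has been resolved.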
Fix DR Configuration Drift
When to use:
Participant was deleted and recreated
Database version or modules do not match
Check the database configuration and CRDB configuration to see if they match (Redis compatibility version, number of shards, size, modules, etc.):
crdb-cli crdb get --crdb-guid <CRDB_GUID>
Update CRDB metadata:
crdb-cli crdb update --crdb-guid <CRDB_GUID> \
--default-db-config '{"version":"7.4"}'
If modules differ:
crdb-cli crdb update --crdb-guid <CRDB_GUID> \
--default-db-config '{"module_list":[{"module_name":"search","semantic_version":"2.10.18","module_args":"PARTITIONS AUTO"}],"version":"7.4"}'
Then align the database locally if needed.
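Hand-written JSON inside shell quotes is easy to get wrong, so the comparison and the `--default-db-config` value can be prepared with a small helper. The sketch below is illustrative: the function names are hypothetical, and the two dictionaries are assumed to be filled in by the operator from the CRDB configuration and the local database configuration. Only the keys that appear in the commands above (`version`, `module_list`) are used.

```python
import json


def diff_config(crdb_default, local_db, keys=("version", "module_list")):
    """Return {key: (crdb_value, local_value)} for every key that differs."""
    return {
        key: (crdb_default.get(key), local_db.get(key))
        for key in keys
        if crdb_default.get(key) != local_db.get(key)
    }


def default_db_config_arg(version, modules=None):
    """Serialize the value passed to --default-db-config as compact JSON."""
    config = {"version": version}
    if modules:
        config["module_list"] = modules
    return json.dumps(config, separators=(",", ":"))


if __name__ == "__main__":
    # Hypothetical values entered by hand after inspecting both configurations.
    crdb_default = {"version": "7.4"}
    local_db = {"version": "7.2"}
    print(diff_config(crdb_default, local_db))
    print(default_db_config_arg("7.4"))
```

An empty result from `diff_config` means the compared keys already match, so no metadata update is needed for them.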
Evaluate Replication Backlog Impact
If sync was down for an extended period:
Expect higher bandwidth and longer recovery
Avoid repeated retries while unstable.
Validate Recovery
Run:
crdb-cli crdb health-report --crdb-guid <CRDB_GUID>
rladmin status extra all
Expected result:
All links connected
No shard errors
Sync lag decreasing
No TLS or connectivity errors in logs
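"Sync lag decreasing" is easiest to judge from several readings taken a few minutes apart. The following is a minimal Python sketch of that check: the function name is hypothetical, and the samples are assumed to be numeric lag readings collected by the operator, oldest first, from whatever monitoring source is in use.

```python
def lag_is_decreasing(samples, tolerance=0):
    """True when lag ends lower than it started with no rise above tolerance."""
    if len(samples) < 2:
        return False  # a single reading cannot show a trend
    no_regressions = all(
        later <= earlier + tolerance
        for earlier, later in zip(samples, samples[1:])
    )
    return no_regressions and samples[-1] < samples[0]
```

A flat or rising series is a reason to return to root-cause analysis instead of forcing another sync.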
Common Mistakes to Avoid
Repeatedly running --force without fixing root cause
Upgrading only some clusters
Applying certificate changes without running a crdb update
Recreating DR participants without validating configuration
What to Expect During Recovery
Partial sync → fast recovery
Full sync → longer recovery, higher bandwidth
Cross-region sync → slower convergence
When to Contact Support
If the issue persists, collect:
crdb-cli crdb health-report output
rladmin status extra all
rladmin cluster running_actions
Syncer and proxy logs
Support packages from all clusters
Timeline of triggering event
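The command outputs in the list above can be captured in one pass on each cluster. This is a minimal Python sketch, assuming it runs on a cluster node where crdb-cli and rladmin are on the PATH: the function names and output file names are hypothetical, and a timeout is applied because health-report can hang when a participant is unreachable.

```python
import subprocess
from pathlib import Path


def diagnostic_commands(crdb_guid):
    """Commands from the checklist above, keyed by an assumed output file name."""
    return {
        "crdb_health_report.txt": ["crdb-cli", "crdb", "health-report", "--crdb-guid", crdb_guid],
        "rladmin_status.txt": ["rladmin", "status", "extra", "all"],
        "rladmin_running_actions.txt": ["rladmin", "cluster", "running_actions"],
    }


def collect(crdb_guid, out_dir, run=subprocess.run, timeout=120):
    """Run each command and write its output (or the failure) to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, argv in diagnostic_commands(crdb_guid).items():
        try:
            result = run(argv, capture_output=True, text=True, timeout=timeout)
            text = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Record the failure so the gap is visible in the collected files.
            text = f"failed to run {' '.join(argv)}: {exc}\n"
        path = out / filename
        path.write_text(text)
        written.append(path)
    return written
```

Running `collect("<CRDB_GUID>", "diagnostics")` on every participating cluster produces a directory that can be attached alongside the support packages.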
Notes
Active-Active requires both connectivity and configuration consistency
Certificate and metadata drift are the most common causes after upgrade and DR events
Use this article as the primary recovery flow before escalating