Troubleshooting Active-Active (CRDB) Sync Failures After Upgrade or DR Events – Redis Knowledge Base

Redis Software Active-Active databases (CRDBs) use the syncer process and inter-cluster replication to maintain data consistency across participating clusters. Each cluster hosts its own local instance of the database, and only the data is replicated between clusters. Configuration, module versions, and certificates must therefore be kept aligned across all participants separately.

After events such as Redis Software upgrades, disaster recovery drills, cluster reboots, certificate rotations, or participant re-creation, CRDB synchronization can stop progressing even when databases appear healthy. This guide helps you quickly identify the root cause and apply the correct fix using scenario-based navigation, Quick Fix actions, and targeted recovery steps.

Start Here: Identify Your Scenario

Use this to jump directly to the most relevant fix:

After a Redis Software upgrade → Go to Verify Version Alignment Across Clusters
After a cluster reboot or restart → Go to Restart Sync Safely
After DR rebuild or participant re-creation → Go to Fix DR Configuration Drift
After certificate or mTLS change → Go to Resolve Certificate or mTLS Issues
Links disconnected or health-report hangs → Go to Fix Connectivity Issues Between CRDB Participants

Quick Fix

What you see	Fast check	Immediate action
Sync stopped after reboot or restart	Check shard and database health	Run: `crdb-cli crdb update --crdb-guid <CRDB_GUID> --force`
Participant disconnected or health-report hangs	Test connectivity to the endpoints using `redis-cli` or `openssl s_client`	Fix DNS, routing, firewall, or TLS issues
Sync broke after certificate change	Compare certificates across clusters	Run: `crdb-cli crdb update --crdb-guid <CRDB_GUID> --force` if the proxy or syncer certificate has been updated
DR participant recreated with different version/settings	Compare database vs. CRDB config	Update CRDB metadata and align database
Syncer logs show certificate mismatch	Check the trust chain and make sure the endpoint is using the configured certificate	Fix certificates, then refresh CRDB. If the certificate is already correct, contact Redis support.

Prerequisites

Administrative access to all participating clusters
Access to crdb-cli, rladmin, openssl, redis-cli, and the REST API
CRDB GUID and database IDs
Maintenance window for disruptive actions

Important:

Inter-cluster connectivity must be functional before sync can recover
Full sync operations can generate significant traffic
Certificate updates must be consistent across all clusters
Avoid repeated forced sync attempts before identifying root cause

Fix Connectivity Issues Between CRDB Participants

When to use:

health-report hangs
Links show disconnected
One participant appears unreachable

Run:

openssl s_client -connect <endpoint>:<port> -servername <endpoint>

Optional (also print the certificate chain sent by the endpoint):

openssl s_client -connect <endpoint>:<port> -servername <endpoint> -showcerts

Interpret results:

Connection failure → check firewall, routing, DNS
No listener → validate endpoint and port
Certificate mismatch → TLS trust issue. On Kubernetes, or wherever a proxy sits in front of the cluster, confirm that the route/ingress is configured for TLS passthrough. Replication will not work if the route/ingress presents a different certificate than the cluster.

If using a load balancer:

Confirm correct backend mapping
Confirm that TLS passthrough is enabled
Ensure CRDB ports are exposed
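
When several participant endpoints need checking, the openssl probe above can be wrapped in a small helper. The following is a minimal sketch, not part of the Redis tooling: the function names `tls_probe` and `probe_endpoint`, the `PROBE` override variable, and the example endpoint are all assumptions made for illustration. It assumes GNU `timeout` is available and only reports whether the TLS handshake completed; it does not validate the trust chain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the same openssl check used above against one endpoint.
# tls_probe HOST PORT -> exit status of openssl s_client.
tls_probe() {
  # </dev/null closes stdin so s_client exits after the handshake
  # instead of waiting for input; timeout guards against a hung connection.
  timeout 10 openssl s_client -connect "$1:$2" -servername "$1" </dev/null
}

# probe_endpoint HOST PORT -> prints one status line for the endpoint.
# PROBE can be set to another command (for example in a dry run);
# it defaults to tls_probe.
probe_endpoint() {
  local host="$1" port="$2"
  if "${PROBE:-tls_probe}" "$host" "$port" >/dev/null 2>&1; then
    echo "$host:$port handshake-ok"
  else
    echo "$host:$port failed"
  fi
}

# Example (hypothetical endpoint and port):
#   probe_endpoint crdb.cluster-a.example.com 12000
```

A "failed" line only says the probe did not succeed; the openssl output from the manual command is still needed to tell a network failure from a TLS problem.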

Verify Version Alignment Across Clusters

When to use:

Issue started after upgrade

Run on each participating cluster:

rladmin status extra all

Expected: All clusters run the same Redis Software version

If not:

Complete upgrade across all clusters
Do not attempt CRDB fixes until aligned

Then run:

crdb-cli crdb update --crdb-guid <CRDB_GUID> --update-db-config-modules true
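
The version comparison that gates this step can be scripted once the version of each cluster has been noted. The sketch below assumes the operator has written one "<cluster-name> <version>" pair per line; that input format, the function name `versions_aligned`, and the version strings in the example are hypothetical and are not produced by rladmin in this form.

```shell
#!/usr/bin/env bash
# versions_aligned: reads "<cluster-name> <version>" lines on stdin
# (hypothetical, operator-collected format).
# Prints each distinct version once; returns 0 only when exactly one
# distinct version was seen, 1 otherwise (including empty input).
versions_aligned() {
  awk 'NF >= 2 { seen[$2] = 1 }
       END {
         n = 0
         for (v in seen) { n++; print v }
         exit (n == 1 ? 0 : 1)
       }'
}

# Example (illustrative version strings):
#   printf 'cluster-a 7.4.2-54\ncluster-b 7.4.2-54\n' | versions_aligned \
#     && echo "aligned" || echo "mismatch: finish the upgrade first"
```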

Confirm Cluster and Database Health

Run:

rladmin status extra all
rladmin cluster running_actions

Fix before continuing if:

Shards are down or recovering
The database reports an active or stuck change in progress
Actions are stuck

CRDB sync requires healthy local databases.

Resolve Certificate or mTLS Issues

When to use:

Sync failed after certificate rotation
mTLS enabled or changed
Syncer logs show certificate errors

Steps:

Update certificates using supported methods (UI, rladmin, or API)
Ensure all clusters trust the same CA
Verify SAN matches endpoint

Then use the following command to sync the new certificates into the CRDB config:

crdb-cli crdb update --crdb-guid <CRDB_GUID> --force
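
The "SAN matches endpoint" check from the steps above can be sketched as a small filter over the certificate's subjectAltName text. This is an illustrative helper, not a Redis or OpenSSL feature: the name `san_covers` is hypothetical, it handles DNS entries only (not IP entries), and it assumes the SAN text comes from `openssl x509 -noout -ext subjectAltName`, which requires OpenSSL 1.1.1 or later.

```shell
#!/usr/bin/env bash
# san_covers ENDPOINT: reads subjectAltName text on stdin, for example from
#   openssl x509 -in cert.pem -noout -ext subjectAltName
# Returns 0 when a DNS entry matches ENDPOINT exactly or through a
# single-label wildcard such as "*.example.com"; returns 1 otherwise.
san_covers() {
  tr ',' '\n' | awk -v ep="$1" '
    BEGIN { ep = tolower(ep) }
    {
      gsub(/^[ \t]+|[ \t]+$/, "")            # trim whitespace around the entry
      if (substr($0, 1, 4) != "DNS:") next   # skip the header line and non-DNS entries
      name = tolower(substr($0, 5))
      if (name == ep) { found = 1; next }
      if (substr(name, 1, 2) == "*.") {
        suffix = substr(name, 2)             # for example ".example.com"
        n = length(ep) - length(suffix)
        # The wildcard may stand in for exactly one left-most label.
        if (n > 0 && substr(ep, n + 1) == suffix && index(substr(ep, 1, n), ".") == 0)
          found = 1
      }
    }
    END { exit (found ? 0 : 1) }'
}

# Example (hypothetical file and endpoint names):
#   openssl x509 -in proxy_cert.pem -noout -ext subjectAltName \
#     | san_covers crdb.cluster-a.example.com && echo "SAN ok"
```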

Restart Sync Safely

When to use:

Sync stopped after reboot or failure

Option A (re-enable sync for the database through the REST API):

curl -k -u <user>:<pass> -X PUT \
  -H "Content-Type: application/json" \
  -d '{"sync":"enabled"}' \
  https://<host>:<port>/v1/bdbs/<database-id>

Option B (force an update of the CRDB configuration):

crdb-cli crdb update --crdb-guid <CRDB_GUID> --force

Perform either option only after resolving the root cause.

Fix DR Configuration Drift

When to use:

Participant was deleted and recreated
Database version or modules do not match

Check the database configuration and CRDB configuration to see if they match (Redis compatibility version, number of shards, size, modules, etc.):

crdb-cli crdb get --crdb-guid <CRDB_GUID>

Update CRDB metadata:

crdb-cli crdb update --crdb-guid <CRDB_GUID> \
  --default-db-config '{"version":"7.4"}'

If modules differ:

crdb-cli crdb update --crdb-guid <CRDB_GUID> \
  --default-db-config '{"module_list":[{"module_name":"search","semantic_version":"2.10.18","module_args":"PARTITIONS AUTO"}],"version":"7.4"}'

Then, if the local database still differs from the CRDB configuration, align it on the affected cluster.

Evaluate Replication Backlog Impact

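If sync was down for an extended period, the replication backlog may no longer cover the gap, and a full sync may be needed instead of a partial one:

Expect higher bandwidth and longer recovery

Avoid repeated sync retries while connectivity or cluster health is still unstable.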

Validate Recovery

Run:

crdb-cli crdb health-report --crdb-guid <CRDB_GUID>
rladmin status extra all

Expected result:

All links connected
No shard errors
Sync lag decreasing
No TLS or connectivity errors in logs
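
Recovery after a long outage can take a while, so it may help to poll a read-only check with a growing delay rather than re-running it by hand. The following is a generic sketch: `wait_until` is a hypothetical helper, and the check command it runs (here a made-up `./links_connected.sh`) must be supplied by the operator, for example a script that inspects the health-report output. It only observes state and never triggers a sync.

```shell
#!/usr/bin/env bash
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0, doubling the delay after each failure.
# Returns 0 on the first success, 1 once ATTEMPTS runs have all failed.
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only when another attempt is still to come.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
      delay=$((delay * 2))
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (hypothetical check script): up to 5 checks, waiting 30s, 60s, 120s, 240s.
#   wait_until 5 30 ./links_connected.sh || echo "still not recovered"
```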

Common Mistakes to Avoid

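Repeatedly running crdb-cli crdb update with --force without fixing the root cause
Upgrading only some of the participating clusters
Applying certificate changes without running crdb-cli crdb update afterwards
Recreating DR participants without validating their configuration against the CRDB configuration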

What to Expect During Recovery

Partial sync → fast recovery
Full sync → longer recovery, higher bandwidth
Cross-region sync → slower convergence

When to Contact Support

If the issue persists, collect:

crdb-cli crdb health-report output
rladmin status extra all
rladmin cluster running_actions
Syncer and proxy logs
Support packages from all clusters
Timeline of triggering event
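
The command outputs in this list can be gathered into one directory per cluster before opening a case. This is a minimal sketch using only the commands named above; the function name `collect_diag`, the output file names, and the default directory name are assumptions. Logs and support packages are not covered and still need to be collected separately.

```shell
#!/usr/bin/env bash
# collect_diag CRDB_GUID [OUT_DIR]
# Saves the output of the diagnostic commands listed above into OUT_DIR,
# one file per command, and prints the directory name when done.
collect_diag() {
  local guid="$1" out="${2:-crdb-diag-$(date +%Y%m%dT%H%M%S)}"
  mkdir -p "$out" || return 1
  # stderr is captured with stdout so error messages end up in the files.
  # Note: health-report can hang when a participant is unreachable; this
  # sketch does not add a timeout around it.
  crdb-cli crdb health-report --crdb-guid "$guid" >"$out/health-report.txt" 2>&1
  rladmin status extra all >"$out/rladmin-status.txt" 2>&1
  rladmin cluster running_actions >"$out/running-actions.txt" 2>&1
  echo "$out"
}

# Example (run on each participating cluster):
#   collect_diag <CRDB_GUID>
```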

Notes

Active-Active requires both connectivity and configuration consistency
Certificate and metadata drift are the most common causes after upgrade and DR events
Use this article as the primary recovery flow before escalating
