This article provides an overview of the recovery process for a Redis Enterprise Software cluster and its databases after a failure. It outlines the general steps for cluster recovery, database recovery, and node maintenance, and includes a troubleshooting table for common recovery problems. This article only summarizes the process; detailed step-by-step instructions are available in the Redis documentation pages referenced at the end of each section.
Note: The recovery process described here assumes you are recovering to new, clean nodes with clean persistent storage. If you reuse original nodes or drives, make sure to follow the additional steps in the official documentation to back up and prepare the environment.
Prerequisites
- A recent Redis cluster configuration backup (`/ccs/ccs-redis.rdb`)
- Accessible persistence files (AOF/RDB or backups) for each database
- Target nodes with no running Redis processes and a matching software version
- Cluster name and FQDN unchanged from the original installation
- Clean persistent storage drives for the new cluster nodes
Warning: During recovery or planned maintenance operations, never place a majority of cluster nodes into Maintenance Mode or reboot them simultaneously. Doing so can cause quorum loss, database unavailability, or potential data loss.
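The arithmetic behind this warning can be sketched in a few lines of shell. This is a minimal illustration, assuming quorum requires a strict majority (more than half) of cluster nodes to be online; the function names are hypothetical and not part of any Redis tooling.

```sh
#!/bin/sh
# Sketch only: majority arithmetic for a cluster of N nodes.
# Assumption: quorum needs more than half of the nodes online.

# max_nodes_offline TOTAL_NODES
# Prints the largest number of nodes that can be down while a
# strict majority of the cluster stays up.
max_nodes_offline() {
    total=$1
    majority=$(( total / 2 + 1 ))   # strict majority: floor(N/2) + 1
    echo $(( total - majority ))
}

# safe_to_take_down TOTAL_NODES ALREADY_OFFLINE
# Returns 0 if one more node can go offline without breaking the majority.
safe_to_take_down() {
    limit=$(max_nodes_offline "$1")
    [ $(( $2 + 1 )) -le "$limit" ]
}
```

For example, `max_nodes_offline 3` prints `1` and `max_nodes_offline 5` prints `2`. Even where the arithmetic allows more, taking down one node at a time remains the safer practice.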
Cluster Recovery
- Install Redis Software on the clean nodes.
- Mount persistent storage containing:
  - The cluster configuration file (`/ccs/ccs-redis.rdb`)
  - All database persistence files (AOF/RDB/backups, typically under `/var/opt/redislabs/persist/`)
- Recover the cluster configuration on the first node using `rladmin cluster recover`.
- Join the remaining nodes back into the cluster with `rladmin cluster join`.
- Verify the cluster is healthy and update DNS records if necessary.
Full instructions: Recover a failed cluster
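The order of operations above can be sketched as a dry-run helper that only prints the commands and executes nothing. The `filename`, `nodes`, `username`, and `password` arguments are shown as assumptions about the `rladmin` syntax and should be confirmed against the official recovery procedure, which may require additional arguments. The function name and its parameters are hypothetical.

```sh
#!/bin/sh
# Dry-run sketch: prints the cluster recovery sequence, runs nothing.
# Assumptions: the configuration backup sits at /ccs/ccs-redis.rdb
# (as in the prerequisites) and the argument names below match the
# installed rladmin version.

# print_cluster_recovery_plan FIRST_NODE_ADDR ADMIN_USER [CCS_FILE]
print_cluster_recovery_plan() {
    first_node=$1
    admin_user=$2
    ccs_file=${3:-/ccs/ccs-redis.rdb}

    echo "# On the first node:"
    echo "rladmin cluster recover filename $ccs_file"
    echo "# On each remaining node:"
    echo "rladmin cluster join nodes $first_node username $admin_user password <password>"
    echo "# Finally, on any node:"
    echo "rladmin status"
}
```

Calling `print_cluster_recovery_plan 10.0.0.1 admin@example.com` prints the plan for review; the password is left as a placeholder on purpose so it never lands in shell history or logs.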
Database Recovery
- Confirm that the cluster is healthy (`rladmin status`).
- Ensure database persistence files are accessible and correctly mounted.
- Recover databases using the `rladmin recover` commands:
  - `rladmin recover all` – recover all databases
  - `rladmin recover db db:<id>` – recover a single database by ID
  - `rladmin recover db <name>` – recover a single database by name
  - `rladmin recover db <name> only_configuration` – recover configuration only
- If databases use modules, make sure the required module versions are installed before recovery.
- Verify that databases are active and clients can connect.
Full instructions: Recover a failed database.
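As a small illustration, the four `rladmin recover` forms listed above can be wrapped in a helper that builds the command string without running it. The helper name and its `all` / `id` / `name` selectors are hypothetical conveniences for this sketch; only the printed commands come from this article.

```sh
#!/bin/sh
# Sketch: build (but do not run) one of the rladmin recover commands
# listed in this article.

# build_recover_command all
# build_recover_command id <database-id>
# build_recover_command name <database-name> [only_configuration]
build_recover_command() {
    case ${1:-} in
        all)
            echo "rladmin recover all" ;;
        id)
            [ -n "${2:-}" ] || { echo "missing database ID" >&2; return 1; }
            echo "rladmin recover db db:$2" ;;
        name)
            [ -n "${2:-}" ] || { echo "missing database name" >&2; return 1; }
            if [ "${3:-}" = "only_configuration" ]; then
                echo "rladmin recover db $2 only_configuration"
            else
                echo "rladmin recover db $2"
            fi ;;
        *)
            echo "usage: build_recover_command all | id <id> | name <name> [only_configuration]" >&2
            return 1 ;;
    esac
}
```

Printing the command first makes it easy to review, or to paste into a change ticket, before running it on the cluster.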
Node Maintenance and OS Patching
For safely patching or rebooting cluster nodes:
- Enable maintenance mode: `rladmin node <id> maintenance_mode on`
- Perform patching or reboot.
- Disable maintenance mode: `rladmin node <id> maintenance_mode off`
Full instructions: Maintenance mode for cluster nodes
Important: Planned OS patching and node reboots should always be performed one node at a time. Confirm the cluster remains healthy and maintains quorum before proceeding to the next node.
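The one-node-at-a-time rule can be sketched as a dry-run plan generator. It only prints the sequence for the given node IDs and runs nothing; the function name is hypothetical, and the patching step itself is left as a comment because it depends on the operating system.

```sh
#!/bin/sh
# Dry-run sketch: print a rolling maintenance plan, one node at a time.
# Nothing is executed; the output is meant to be reviewed by an operator.

# print_rolling_patch_plan NODE_ID...
print_rolling_patch_plan() {
    for node_id in "$@"; do
        echo "rladmin node $node_id maintenance_mode on"
        echo "# patch or reboot node $node_id, then wait for it to come back"
        echo "rladmin node $node_id maintenance_mode off"
        echo "# confirm the cluster is healthy before moving on:"
        echo "rladmin status"
    done
}
```

For a three-node cluster, `print_rolling_patch_plan 1 2 3` prints three consecutive blocks, so each node leaves maintenance mode and the cluster status is checked before the next node enters it.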
Troubleshooting Scenarios
| Symptom | Possible Cause | Recommended Fix |
|---|---|---|
| Cluster doesn’t recover | Missing or mismatched | Verify you are using the correct cluster configuration file from persistent storage. Recheck recovery file paths and versions. |
| Databases stuck in “pending recovery” or not showing in | Persistence files missing or not mounted | Ensure persistence files are mounted on all nodes. Set the recovery path with |
| Database marked as “missing files” | Persistence files missing, misplaced, or inaccessible | Confirm the recovery path is set correctly on all nodes. Move files to the correct location if misplaced. Ensure file ownership/permissions are |
| Database marked as “missing files” | Corrupted persistence files | Replace corrupted files with valid copies. If none are available, contact Redis Support. |
| | Database restore skipped | Restore manually with |
| Port 53 errors | DNS server conflict | Ensure PowerDNS is bound and available. See DNS configuration guidance. |
| High CPU, RAM, or network usage after recovery | Shard migration or rebalancing in progress | Monitor until load stabilizes. If resource usage remains high, contact Redis Support. |
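For the "missing files" rows, a quick first check is whether the persistence directory is mounted, readable, and non-empty on a given node. The helper below is a generic sketch with a hypothetical name; it uses only standard shell tools, does not call any Redis command, and does not check ownership.

```sh
#!/bin/sh
# Sketch: basic sanity check of a persistence directory on one node.

# check_persist_dir DIR
# Prints a one-line diagnosis; returns non-zero when a problem is found.
check_persist_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir is not a directory (is the volume mounted?)"
        return 1
    fi
    if [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
        echo "inaccessible: $dir cannot be read by the current user"
        return 1
    fi
    # Count regular files anywhere below the directory.
    count=$(find "$dir" -type f | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "empty: no files under $dir"
        return 1
    fi
    echo "ok: $count file(s) under $dir"
}
```

Running `check_persist_dir /var/opt/redislabs/persist` on each node, as the user that runs the Redis services, helps separate mount problems from permission problems before changing any recovery settings.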
Helpful Commands
- Cluster status: `rladmin status extra all`
- Errors only: `rladmin status extra all errors_only`
- Supervisor check: `supervisorctl status`
- Node diagnostics: `rlcheck`
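These commands can be collected into one diagnostic pass. The sketch below assumes the tools are on the current user's `PATH`, which may not hold on every installation; a command that cannot be found is reported and skipped rather than aborting the run. Both function names are hypothetical.

```sh
#!/bin/sh
# Sketch: run the helpful commands in sequence, skipping any that are
# not installed or not on PATH.

# run_if_present COMMAND [ARGS...]
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped: $1 not found"
    fi
}

# collect_diagnostics
# Runs the four commands listed above, in the same order.
collect_diagnostics() {
    run_if_present rladmin status extra all
    run_if_present rladmin status extra all errors_only
    run_if_present supervisorctl status
    run_if_present rlcheck
}
```

Redirecting the output, for example `collect_diagnostics > diagnostics.txt 2>&1`, gives a single file to attach when contacting Redis Support.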