Here are step-by-step instructions for safely taking a Redis node offline for operating system patching or upgrading. By using maintenance mode, administrators can prevent data loss, avoid service disruption, and maintain high availability across the cluster.
This guide covers both the CLI and REST API methods for entering and exiting maintenance mode, managing the cluster master role, and validating node recovery and cluster health after maintenance.
Prerequisites
You must have cluster admin privileges.
Confirm that the cluster status is healthy: all nodes and shards show as OK in rladmin status.
Review patching or upgrade plans and ensure no other nodes are in maintenance mode.
Recommended: Take a backup or snapshot of the system before starting.
If the node hosts Active-Active (CRDB) databases, note any extra steps for traffic draining and replication validation.
If persistence is enabled, verify RDB/AOF sync status before entering maintenance.
Check client connection distribution to avoid overloading remaining nodes after shard migration.
Consider running rladmin rebalance before maintenance so that shards are evenly placed.
Important: Only take one node offline at a time. Before proceeding, confirm the cluster has enough remaining healthy nodes to maintain quorum and continue shard availability.
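The quorum check above can be sketched as simple arithmetic. This is a minimal illustration that assumes quorum means a strict majority of cluster nodes; has_quorum is a hypothetical helper name, not an rladmin command.

```shell
# Hypothetical helper: succeeds (exit 0) if a strict majority of nodes
# stays online after taking some nodes out of the cluster.
# Usage: has_quorum <total_nodes> <nodes_going_offline>
has_quorum() (
  total=$1
  offline=$2
  remaining=$(( total - offline ))
  # Strict majority: remaining > total / 2, written without integer division.
  [ $(( remaining * 2 )) -gt "$total" ]
)
```

For example, has_quorum 3 1 succeeds because two of three nodes remain, while has_quorum 2 1 fails because one of two nodes is not a majority.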
Step-by-Step Procedure
1. Enable Maintenance Mode on the Target Node
Prepare the node for safe removal by migrating all shards and endpoints.
Using CLI
rladmin node <node_id> maintenance_mode on
Using REST API
curl -k -u <EMAIL>:<PASSWORD> \
-H "Content-Type: application/json" \
-X POST \
-d '{"maintenance_mode":true}' \
https://<cluster_endpoint>:9443/v1/nodes/<node_id>
2. Migrate the Cluster’s Master Role (If Required)
If the node being taken offline is the cluster master, move the master role to a different node before you shut the node down.
rladmin cluster master set <other_node_id>
3. Validate Shard and Endpoint Migration
Confirm the target node has been drained of its responsibilities.
rladmin status
The node must not hold any shards or endpoints and must not be the cluster master.
Do not continue with rebooting or patching until the node no longer owns shards, endpoints, or the cluster master role.
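The drain check can be partly automated by filtering the status output. The sketch below assumes that shard and endpoint rows in rladmin status begin with a db:<id> field and name the owning node as node:<id>; verify this against the output of your own version. rows_on_node is a hypothetical helper, and it does not check the cluster master role.

```shell
# Hypothetical helper: reads status text on stdin and prints how many
# shard/endpoint rows still reference the given node.
# Assumption: such rows start with "db:" and contain a "node:<id>" field.
# Usage: rladmin status | rows_on_node <node_id>
rows_on_node() {
  awk -v node="node:$1" '
    $1 ~ /^db:/ {
      # Exact field match, so node:1 does not also match node:10.
      for (i = 2; i <= NF; i++) if ($i == node) { n++; break }
    }
    END { print n + 0 }
  '
}
```

A result of 0 suggests the node no longer holds shards or endpoints; any other value means migration is still in progress or has stalled.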
4. Shut Down or Patch the Node
Perform your OS patch, hardware replacement, or upgrade tasks. If applicable, reboot the node after the changes are applied.
Warning: Rebooting multiple Redis cluster nodes simultaneously may cause quorum loss or service interruption. Always complete maintenance and recovery validation on one node before proceeding to the next.
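One way to enforce the one-node-at-a-time rule is a sequential loop that stops at the first failure. In this sketch, maintain_node is a hypothetical function that you would define yourself to run the full procedure for a single node and return non-zero if validation fails.

```shell
# Runs maintenance on each node in order and stops at the first failure,
# so the next node is never touched while an earlier one is unverified.
# maintain_node is a hypothetical, user-defined function (not shown here).
# Usage: rolling_maintenance <node_id> [<node_id> ...]
rolling_maintenance() {
  for node_id in "$@"; do
    echo "Starting maintenance on node $node_id"
    if ! maintain_node "$node_id"; then
      echo "Stopping: node $node_id did not pass validation" >&2
      return 1
    fi
  done
}
```

Calling rolling_maintenance 1 2 3 processes node 1, then node 2, then node 3, and leaves the remaining nodes untouched if any node fails.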
5. Remove Maintenance Mode
Once the node is back online and stable:
rladmin node <node_id> maintenance_mode off
The node will automatically rejoin the cluster and sync its configuration.
6. Verify Cluster Health
Ensure everything is restored correctly:
rladmin status
Check that:
- The node status is OK
- All shards and endpoints are balanced across nodes
- No errors or pending maintenance alerts remain
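The status part of this checklist can be sketched as a filter that prints any node or shard row not reporting OK. It assumes that node rows begin with node:<id> (optionally prefixed by *), that shard rows carry a redis:<id> value in the third column, and that the last column of both is the status. These are assumptions about the rladmin status layout, so check them against your own output. non_ok_rows is a hypothetical helper and does not assess shard balance or alerts.

```shell
# Hypothetical helper: reads status text on stdin and prints node and
# shard rows whose last field is not "OK".
# Assumptions: node rows start with "node:<id>" or "*node:<id>", shard rows
# have "redis:<id>" as the third field, and STATUS is the last field.
# Usage: rladmin status | non_ok_rows
non_ok_rows() {
  awk '($1 ~ /^\*?node:/ || $3 ~ /^redis:/) && $NF != "OK" { print }'
}
```

Empty output suggests every node and shard row reports OK; any printed row needs a closer look before you move on to the next node.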