Safely Take a Node Offline for Upgrade or Patching – Redis Knowledge Base

Here are step-by-step instructions for safely taking a Redis node offline for operating system patching or upgrading. By using maintenance mode, administrators can prevent data loss, avoid service disruption, and maintain high availability across the cluster.

This guide covers both the CLI and REST API methods for entering and exiting maintenance mode, managing the cluster master role, and validating node recovery and cluster health after maintenance.

Prerequisites

- You must have cluster admin privileges.
- Confirm that the cluster status is healthy: all nodes and shards show as OK in rladmin status.
- Review patching or upgrade plans and ensure no other nodes are in maintenance mode.
- Recommended: Take a backup or snapshot of the system before starting.

- If the node hosts Active-Active (CRDB) databases, plan any extra steps needed for traffic draining and replication validation.
- If persistence is enabled, verify RDB/AOF sync status before entering maintenance.
- Check how client connections are distributed so that the remaining nodes are not overloaded after shard migration.
- Consider running rladmin rebalance before maintenance so that shards are evenly placed.

Important: Only take one node offline at a time. Before proceeding, confirm the cluster has enough remaining healthy nodes to maintain quorum and continue shard availability.
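
The quorum part of this check is simple arithmetic and can be sketched as a small shell function. This is a minimal illustration, assuming quorum means a strict majority of all cluster nodes; has_quorum is a hypothetical helper, not an rladmin feature, and it says nothing about whether the remaining nodes have enough capacity to host the migrated shards.

```shell
# has_quorum TOTAL OFFLINE
# Succeeds (exit status 0) when the nodes left online still form a strict
# majority of the cluster. TOTAL and OFFLINE are integers supplied by the
# caller; nothing is read from the cluster itself.
has_quorum() {
    total=$1
    offline=$2
    remaining=$((total - offline))
    majority=$((total / 2 + 1))
    [ "$remaining" -ge "$majority" ]
}

# Example: a 3-node cluster with 1 node in maintenance keeps 2 of 3 nodes.
#   has_quorum 3 1 && echo "quorum is preserved"
```

Under this assumption, a 3-node cluster tolerates one node offline but not two, which is why the procedure handles a single node at a time.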

Step-by-Step Procedure

1. Enable Maintenance Mode on the Target Node

Enabling maintenance mode prepares the node for safe removal by migrating all of its shards and endpoints to other nodes.

Using CLI

rladmin node <node_id> maintenance_mode on

Using REST API

curl -k -u <EMAIL>:<PASSWORD> \
 -H "Content-Type: application/json" \
 -X POST \
 -d '{"maintenance_mode":true}' \
 https://<cluster_endpoint>:9443/v1/nodes/<node_id>

2. Migrate the Cluster’s Master Role (If Required)

If the node being taken offline still holds the cluster master role, move the role to a different node before shutting the node down.

rladmin cluster master set <other_node_id>

3. Validate Shard and Endpoint Migration

Confirm the target node has been drained of its responsibilities.

rladmin status

In the output, the target node must hold no shards and no endpoints, and it must not be the cluster master.

Do not reboot or patch the node until all three conditions are met.
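
Part of this validation can be scripted. The sketch below is one possible approach, not an official tool: count_node_rows is a hypothetical helper that counts the rows of a listing whose fields include node:<node_id>. It assumes that rladmin status shards and rladmin status endpoints print one row per shard or endpoint with the owning node shown as node:<node_id>; confirm that against the output of your version before relying on it. It does not check the cluster master role.

```shell
# count_node_rows NODE_ID
# Reads a shard or endpoint listing on stdin and prints how many rows
# contain node:NODE_ID as a whole field, so node:1 does not match node:10.
count_node_rows() {
    awk -v target="node:$1" '
        { for (i = 1; i <= NF; i++) if ($i == target) { n++; break } }
        END { print n + 0 }
    '
}

# Example for node 3 (assumed output format, see above).
# Both commands should print 0 once the node is fully drained:
#   rladmin status shards    | count_node_rows 3
#   rladmin status endpoints | count_node_rows 3
```

Matching whole fields rather than substrings avoids false positives in clusters with ten or more nodes.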

4. Shut Down or Patch the Node

Perform your OS patch, hardware replacement, or upgrade tasks. If applicable, reboot the node after the changes are applied.

Warning: Rebooting multiple Redis cluster nodes simultaneously may cause quorum loss or service interruption. Always complete maintenance and recovery validation on one node before proceeding to the next.
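
When several nodes need patching, the one-node-at-a-time rule can be enforced with a strictly sequential loop. The following is a rough sketch under stated assumptions: confirm_drained, patch_node, and confirm_healthy are hypothetical hooks that you would implement for your own environment, and only the two rladmin commands come from this guide. With DRY_RUN left at its default of 1, the script only prints what it would run.

```shell
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# maintain_nodes NODE_ID...
# Processes the nodes strictly in sequence and stops at the first failure,
# so a second node is never touched while the first is still in maintenance.
maintain_nodes() {
    for node_id in "$@"; do
        run rladmin node "$node_id" maintenance_mode on || return 1
        run confirm_drained "$node_id" || return 1   # hypothetical hook
        run patch_node "$node_id" || return 1        # hypothetical hook
        run rladmin node "$node_id" maintenance_mode off || return 1
        run confirm_healthy "$node_id" || return 1   # hypothetical hook
    done
}

# Example: preview the sequence for nodes 1 and 2.
#   maintain_nodes 1 2
```

Stopping at the first failure is a deliberate choice here: if any step fails for one node, the remaining nodes are left untouched until an administrator has investigated.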

5. Remove Maintenance Mode

Once the node is back online and stable:

rladmin node <node_id> maintenance_mode off

The node will automatically rejoin the cluster and sync its configuration.

6. Verify Cluster Health

Ensure everything is restored correctly:

rladmin status

Check that:

- Node status is OK
- All shards and endpoints are balanced
- No errors or pending maintenance alerts
