Maintenance mode in Redis Software lets system administrators perform planned node updates, such as OS patching, without disrupting cluster health. This article explains when and why to use maintenance mode, the steps for entering and exiting it, key considerations for shard migration and node limits, the safe rebooting and patching process, and command guidance for carrying out each procedure.
When & Why to Use Maintenance Mode
Use maintenance mode for planned node operations such as OS patching or hardware upgrades. Maintenance mode:
- Migrates all shards and endpoints off the targeted node
- Prevents the node from accepting new shard assignments during maintenance
- Helps preserve quorum (never enable it on a majority of nodes simultaneously)
- Automates the configuration snapshot and shard migration, making it a safer alternative to a manual shutdown
Steps for Putting a Node in Maintenance Mode
Prepare for maintenance
- Notify stakeholders and schedule a maintenance window
- Verify current cluster health and make sure recent backups exist
- Confirm that other nodes have sufficient resources for shard migration
- Verify that enough healthy nodes remain online to maintain cluster quorum during maintenance operations.
Check cluster status
- Run the following command to confirm all nodes are healthy and the cluster is stable:
  rladmin status
Demote the master node if applicable
- If the node is the master, use:
  rladmin node <node_id> maintenance_mode on demote_node
  This is recommended before rebooting a node that holds the cluster master role, to minimize failover impact and avoid unexpected leadership changes during maintenance.
- Or manually reassign the master and enable maintenance mode on the old master
Activate Maintenance Mode
- To migrate all shards and take a snapshot (overwrite_snapshot replaces the node's existing snapshot):
  rladmin node <node_id> maintenance_mode on overwrite_snapshot
- To prevent migration of replica shards:
  rladmin node <node_id> maintenance_mode on evict_ha_replica disabled evict_active_active_replica disabled
- Only one node should be in maintenance mode at a time
Verify Maintenance Mode activation
- Run:
  rladmin status
- Shards on the node should show yellow, indicating the node is in Maintenance Mode
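The activation options above can be combined on one command line. The following is a minimal sketch of a helper that assembles the command and prints it for review rather than running it. The function name maintenance_on_cmd and its option keywords (demote, snapshot, keep_replicas) are hypothetical and not part of rladmin; only the flags they expand to come from this article.

```shell
# Sketch only: prints the rladmin command instead of executing it.
# maintenance_on_cmd and its option keywords are hypothetical names.
maintenance_on_cmd() {
  node_id="$1"; shift
  cmd="rladmin node ${node_id} maintenance_mode on"
  for opt in "$@"; do
    case "$opt" in
      demote)        cmd="${cmd} demote_node" ;;
      snapshot)      cmd="${cmd} overwrite_snapshot" ;;
      keep_replicas) cmd="${cmd} evict_ha_replica disabled evict_active_active_replica disabled" ;;
      *) echo "unknown option: ${opt}" >&2; return 1 ;;
    esac
  done
  echo "${cmd}"
}
```

For example, maintenance_on_cmd 2 demote snapshot prints the command for demoting node 2 and overwriting its snapshot. Review the printed line, then run it on a cluster node.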
Considerations for Shard Migration & Node Limits
Verify resource availability and quorum safety before entering Maintenance Mode
- Ensure remaining nodes can handle the additional load
- Place only one node in Maintenance Mode at a time to maintain quorum
- Avoid entering Maintenance Mode on more than 50% of nodes simultaneously
- For Active-Active replication setups, verify sync health before and after maintenance
- Contact Redis Support before performing major maintenance in production environments
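The node limits above come down to simple arithmetic. The sketch below assumes that quorum means a strict majority of cluster nodes staying online; that is an interpretation of the guidance here, not a statement of how Redis Software evaluates quorum internally. The function names are hypothetical.

```shell
# Assumption: quorum = strict majority of nodes online.
# Largest number of nodes that can be offline while a majority remains.
max_nodes_down() {
  total="$1"
  echo $(( ($total - 1) / 2 ))
}

# Succeeds only if taking one more node offline still leaves a majority.
# $1 = total nodes, $2 = nodes already offline or in maintenance.
can_enter_maintenance() {
  total="$1"; already_down="$2"
  [ $(( $already_down + 1 )) -le "$(max_nodes_down "$total")" ]
}
```

Under this assumption, a 3-node cluster tolerates one node offline and a 5-node cluster tolerates two. A 4-node cluster also tolerates only one, because taking exactly half of the nodes offline leaves no majority.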
Safe Rebooting & Patching Process
Apply OS or hardware patches after verifying Maintenance Mode is active
- Reboot the node or apply updates
- Wait for the node to come back online and stabilize
- Repeat the same process for each subsequent node, one at a time
- Do not reboot or patch multiple nodes simultaneously unless explicitly validated for your topology and quorum requirements.
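The one-node-at-a-time process can be written out as an ordered plan. This is a sketch under stated assumptions: rolling_patch_plan is a hypothetical helper that only prints the sequence for the node IDs you pass in, and the patch and reboot step is left as a comment because it depends on your OS and tooling.

```shell
# Sketch only: prints the per-node sequence; nothing is executed.
# rolling_patch_plan is a hypothetical helper name.
rolling_patch_plan() {
  for node_id in "$@"; do
    echo "rladmin status"
    echo "rladmin node ${node_id} maintenance_mode on"
    echo "# patch and reboot node ${node_id}, then wait for it to rejoin"
    echo "rladmin node ${node_id} maintenance_mode off"
  done
  echo "rladmin status"
}
```

Each node leaves Maintenance Mode before the next one enters it, and the plan checks cluster status between nodes.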
Steps for Taking a Node Out of Maintenance Mode
Disable Maintenance Mode after maintenance is complete
- Run:
  rladmin node <node_id> maintenance_mode off
- To restore a specific snapshot:
  rladmin node <node_id> maintenance_mode off snapshot_name <snapshot_name>
- To skip restoring shards:
  rladmin node <node_id> maintenance_mode off skip_shards_restore
Verify shard rebalancing
- Run the following command to confirm the node is out of Maintenance Mode and shards have rebalanced properly:
  rladmin status
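The three ways of leaving Maintenance Mode described above differ only in the trailing arguments. The sketch below uses a hypothetical helper, maintenance_off_cmd, that prints the matching command for review. Its mode keywords (default, skip_shards, snapshot) and the example snapshot name are illustrative assumptions; only the rladmin flags come from this article.

```shell
# Sketch only: prints the rladmin command instead of executing it.
# maintenance_off_cmd and its mode keywords are hypothetical names.
maintenance_off_cmd() {
  node_id="$1"
  mode="${2:-default}"
  cmd="rladmin node ${node_id} maintenance_mode off"
  case "$mode" in
    default)     echo "${cmd}" ;;
    skip_shards) echo "${cmd} skip_shards_restore" ;;
    snapshot)
      # A snapshot name is required; find real names with the snapshot list command.
      [ -n "${3:-}" ] || { echo "snapshot name required" >&2; return 1; }
      echo "${cmd} snapshot_name $3" ;;
    *) echo "unknown mode: ${mode}" >&2; return 1 ;;
  esac
}
```

For instance, maintenance_off_cmd 2 snapshot my_snap prints the restore command for a snapshot named my_snap, where my_snap is a placeholder name.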
Command Guidance
- Enable Maintenance Mode:
  rladmin node <node_id> maintenance_mode on overwrite_snapshot
- Demote master node on entering Maintenance Mode:
  rladmin node <node_id> maintenance_mode on demote_node
- Prevent replica migration:
  rladmin node <node_id> maintenance_mode on evict_ha_replica disabled evict_active_active_replica disabled
- Disable Maintenance Mode:
  rladmin node <node_id> maintenance_mode off
- List available snapshots:
  rladmin node <node_id> snapshot list
REST API endpoints can also be used to enable or disable Maintenance Mode.
Warning: Never place more than 50% of cluster nodes into Maintenance Mode simultaneously. Loss of quorum can lead to cluster instability, failed failovers, or data loss.
Additional Resources
- Maintenance Mode for Cluster Nodes
- rladmin node maintenance_mode – CLI Reference
- Safely Take a Node Offline for Upgrade or Patching
Always contact Redis Support for guidance before performing complex or production-impacting maintenance operations.