Split-brain conditions in Redis can result from improperly executed or incomplete node maintenance. This article explains how to avoid that risk: it describes what split-brain is, how cluster maintenance mode works, how to perform node maintenance step by step, and how to troubleshoot common issues. It also lists the key commands and URLs.
Understanding Split-Brain
A split-brain state occurs when portions of a Redis cluster become partitioned and operate independently, risking data inconsistency due to divergent writes. Redis is resilient to isolated node failures but not to large-scale network splits.
Cluster Maintenance Mode
Redis uses maintenance mode to ensure safe node servicing:
- Shards are migrated off the node in maintenance.
- Maintenance mode blocks activation if it risks quorum.
- Use demote_node if the target node is a master.
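The quorum rule above can be illustrated with simple arithmetic. The following is a minimal sketch, assuming quorum means a strict majority of cluster nodes; quorum_needed and quorum_safe are hypothetical helper names, not rladmin features.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: majority-quorum arithmetic only, no rladmin calls.

# quorum_needed TOTAL: print the smallest strict majority of TOTAL nodes.
quorum_needed() {
  local total=$1
  echo $(( total / 2 + 1 ))
}

# quorum_safe TOTAL OFFLINE: succeed if the remaining nodes still form a majority.
quorum_safe() {
  local total=$1 offline=$2
  local remaining=$(( total - offline ))
  [ "$remaining" -ge "$(quorum_needed "$total")" ]
}

# A 3-node cluster tolerates one node in maintenance, but not two.
if quorum_safe 3 1; then echo "3 nodes, 1 offline: quorum kept"; fi
if ! quorum_safe 3 2; then echo "3 nodes, 2 offline: quorum lost"; fi
```

Running this kind of check before scheduling maintenance on several nodes shows quickly whether the plan would take too many nodes offline at once.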
Step-by-Step Node Maintenance
1. Evaluate Cluster Health and Capacity
- Confirm remaining nodes can host migrated shards.
- Use rladmin status and rlcheck for system state.
2. Demote Master Node (If Applicable)
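rladmin cluster master set <node_id>   # <node_id> is the node that should take over as master
# or, to demote the current master while enabling maintenance mode on it:
rladmin node <node_id> maintenance_mode on demote_node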
3. Enable Maintenance Mode
rladmin node <node_id> maintenance_mode on overwrite_snapshot
4. Verify Migration
rladmin status extra all
Ensure shards and endpoints are successfully migrated.
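Migration can take some time, so the verification may need to be repeated. The following is a minimal sketch of a retry helper; wait_until is a hypothetical name. The check command is left as a stand-in because this article does not specify how to detect completed migration programmatically.

```shell
#!/usr/bin/env bash
# wait_until ATTEMPTS DELAY_SECONDS COMMAND...: hypothetical retry helper.
# Re-runs COMMAND until it succeeds or ATTEMPTS tries have been made.
wait_until() {
  local attempts=$1 delay=$2
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$(( i + 1 ))
    # Sleep only if another attempt will follow.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Stand-in check: replace `true` with a real migration check for your cluster.
wait_until 3 0 true && echo "migration confirmed"
```

A non-zero exit from wait_until indicates the check never passed, in which case maintenance should not proceed.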
5. Perform Maintenance
- Proceed only after confirmation of step 4.
6. Disable Maintenance Mode
rladmin node <node_id> maintenance_mode off
7. Verify Cluster Health
rladmin status
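The sequence above can be sketched as a small script. This is an illustration only: DRY_RUN, run, maintenance_start, and maintenance_finish are hypothetical names, and the script prints the commands by default instead of executing them. The node ID 2 is an example value.

```shell
#!/usr/bin/env bash
# Sketch of the maintenance sequence. All helper names are hypothetical.
DRY_RUN=${DRY_RUN:-1}   # default: print commands instead of executing them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# maintenance_start NODE_ID [OPTIONS...]: check health, enable maintenance
# mode (passing through options such as demote_node), then show migration state.
maintenance_start() {
  local node_id=$1
  shift
  run rladmin status || return 1
  run rladmin node "$node_id" maintenance_mode on "$@" || return 1
  run rladmin status extra all
}

# maintenance_finish NODE_ID: disable maintenance mode, then re-check health.
maintenance_finish() {
  local node_id=$1
  run rladmin node "$node_id" maintenance_mode off || return 1
  run rladmin status
}

maintenance_start 2
echo "# perform maintenance on node 2 here"
maintenance_finish 2
```

Setting DRY_RUN=0 makes the script execute the commands. The output of rladmin status extra all still needs to be reviewed by an operator before the maintenance work begins.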
Troubleshooting Common Issues
| Issue | Cause | Resolution |
|---|---|---|
| Maintenance mode fails to enable | Insufficient resources, pending migrations | Validate node capacity and quorum |
| Unplanned maintenance | Operational gaps | Always schedule maintenance and notify peers |
| Shard imbalance post-maintenance | Auto-rebalancing incomplete | Run rladmin verify balance, then rebalance as needed |
| Quorum loss and split-brain risk | Too many nodes offline | Open a ticket with Redis Support |
Summary Table: Common Commands
| Command | Purpose |
|---|---|
| rladmin node <node_id> maintenance_mode on [options] | Enable maintenance mode |
| rladmin node <node_id> maintenance_mode off | Disable maintenance mode |
| rladmin verify balance | Verify shard rebalance |
| rladmin status | Check node/cluster status |
| curl -s -k -u $username:$password https://FQDN:9443/v1/cluster/check | Cluster health check via API |
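The API health check can be wrapped so that a script can act on the result. The following is a minimal sketch; check_passed is a hypothetical helper, and it assumes the JSON response contains a boolean "cluster_test_result" field. Confirm that field name against the response from your own cluster before relying on it.

```shell
#!/usr/bin/env bash
# check_passed: read a /v1/cluster/check JSON response on stdin.
# ASSUMPTION: the response carries a boolean "cluster_test_result" field.
check_passed() {
  grep -Eq '"cluster_test_result"[[:space:]]*:[[:space:]]*true'
}

# Offline example with a hypothetical sample response:
sample='{"cluster_test_result": true, "nodes": []}'
if printf '%s\n' "$sample" | check_passed; then
  echo "cluster check passed"
fi

# Against a live cluster (FQDN, username, and password are placeholders):
# curl -s -k -u "$username:$password" "https://FQDN:9443/v1/cluster/check" | check_passed
```

The helper uses grep instead of a JSON parser to avoid extra dependencies. A tool such as jq would be more robust if it is available.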
Reminder: Strictly follow Redis maintenance protocols to avoid split-brain and ensure uninterrupted cluster operation.