Redis Enterprise Cluster Lost Quorum After Upgrade (Kubernetes/OpenShift)

Redis Enterprise for Kubernetes relies on quorum to maintain cluster leadership and management operations. During upgrades (Operator, Kubernetes, or OpenShift), quorum is lost when the cluster no longer has a majority of nodes that are available and in sync. When this happens, the cluster cannot elect a master and remains degraded. This article explains how to identify quorum loss, recover the cluster safely, and validate system health. It also covers troubleshooting and prevention.

Quick Fix Table

Issue	Resolution
Cluster shows no master / lost quorum	Run cluster recovery from a healthy redis-enterprise pod
REC stuck in degraded state	Verify infra health, then execute quorum recovery
Pods running but no leader elected	Use rladmin recovery workflow from a stable node
Cluster unhealthy after upgrade	Restore quorum, then allow Operator reconciliation
Databases not active	Validate database state after cluster recovery

Prerequisites

Access and permissions

kubectl or oc access
Ability to exec into redis-enterprise pods
Access to Operator logs and REC resource

Baseline health checks

Operator pod is running
CRDs are healthy
Nodes are Ready and schedulable
PVCs are Bound and attached

Understanding Quorum (Critical)

Quorum = majority of nodes: floor(N/2) + 1, where N is the number of cluster nodes

Example:
- 3 nodes → quorum = 2
- 5 nodes → quorum = 3

If fewer nodes than the quorum are available and consistent, cluster management halts.
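
The majority rule relies on integer division, which shell arithmetic performs by default. The following is a minimal sketch for checking the threshold; the function name quorum_size is hypothetical and not part of any Redis tooling.

```shell
# quorum_size N: print the minimum number of nodes that form a majority
# of an N-node cluster, i.e. floor(N/2) + 1. Hypothetical helper name.
quorum_size() {
  local nodes=$1
  # Shell arithmetic truncates, so $((nodes / 2)) is floor(N/2).
  echo $(( nodes / 2 + 1 ))
}

quorum_size 3   # prints 2
quorum_size 5   # prints 3
```

An even node count adds no fault tolerance: quorum_size 4 prints 3, so a 4-node cluster tolerates one node failure, the same as a 3-node cluster.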

Step-by-Step: Recover Cluster Quorum

1. Verify cluster state

Run:

kubectl get pods -n <namespace>
kubectl get rec -n <namespace>

Confirm:

redis-enterprise pods exist
REC shows degraded / no master

2. Validate infrastructure health

Check:

kubectl get nodes
kubectl get pvc,pv

Ensure:

Nodes are Ready
PVCs are Bound
No scheduling or storage failures
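
On a large cluster it is easy to miss a single bad row in the output of these checks. The sketch below filters the standard kubectl table output so that only problems are printed. The function names are hypothetical, and the filters assume the default column layout of kubectl get nodes and kubectl get pvc (NAME first, STATUS second).

```shell
# Hypothetical helpers; they only read cluster state.

# Print nodes whose STATUS is anything other than plain "Ready"
# (for example NotReady or Ready,SchedulingDisabled).
nodes_not_ready() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1 " " $2 }'
}

# Print PVCs in the given namespace whose STATUS is not "Bound".
pvcs_not_bound() {
  local ns=$1
  kubectl get pvc -n "$ns" --no-headers | awk '$2 != "Bound" { print $1 " " $2 }'
}
```

Run nodes_not_ready and pvcs_not_bound <namespace>; empty output from both means these two checks pass. Scheduling and storage failures that do not show up in the STATUS column still need a look at pod events.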

3. Identify the correct recovery node

Choose a stable redis-enterprise pod with:

No crash loops
Healthy attached storage
Most recent state (not newly recreated)

Avoid:

Recently restarted pods
Pods stuck initializing
Nodes with missing volumes
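
A coarse first pass over these criteria can be scripted. The sketch below lists pods that are Running, fully ready, and have never restarted. It assumes the redis-enterprise pods carry the label app=redis-enterprise (adjust the selector if your deployment differs), and the function name is hypothetical. It does not check storage or pod age, so confirm those by hand before choosing a pod.

```shell
# Hypothetical helper: list pods that look like stable recovery candidates.
# Assumes the label app=redis-enterprise on redis-enterprise pods.
stable_rec_pods() {
  local ns=$1
  kubectl get pods -n "$ns" -l app=redis-enterprise --no-headers |
    awk '{
      split($2, ready, "/")   # READY column, for example "2/2"
      # Keep pods that are Running, have all containers ready,
      # and show a restart count of zero.
      if ($3 == "Running" && ready[1] == ready[2] && $4 == 0) print $1
    }'
}
```

For example, stable_rec_pods <namespace> prints one pod name per line; an empty result means no pod passes the filter and the selection needs manual review.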

4. Execute cluster recovery (core step)

Exec into the selected pod:

kubectl exec -it <redis-enterprise-pod> -n <namespace> -- bash

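Run cluster recovery using rladmin.

Use the official recovery command documented for your Redis Enterprise version. The line below shows only the general pattern; required arguments can differ between releases: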

rladmin cluster recover

What this does:

Re-establishes quorum
Elects a new cluster master
Restores cluster management

If required:

Remove unreachable or stale nodes using rladmin
Allow them to rejoin later via Operator

5. Allow Operator reconciliation

Do NOT intervene manually during this step.

Operator will:
- Reconcile StatefulSet
- Restart/rejoin nodes
- Restore full cluster state

Monitor:

kubectl get rec -n <namespace>
kubectl get pods -n <namespace>
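
Rather than rerunning these commands by hand, reconciliation can be polled. The sketch below assumes that the REC resource exposes its state at .status.state and that a healthy cluster reports Running; verify both against your Operator version. The function name and its defaults (60 attempts, 10 seconds apart) are hypothetical.

```shell
# Hypothetical helper: poll the REC until it reports the Running state.
# Usage: wait_for_rec <rec-name> <namespace> [attempts] [delay-seconds]
wait_for_rec() {
  local rec=$1 ns=$2 attempts=${3:-60} delay=${4:-10} state i
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Assumption: the REC state is published at .status.state.
    state=$(kubectl get rec "$rec" -n "$ns" -o jsonpath='{.status.state}')
    echo "attempt $i/$attempts: state=${state:-unknown}"
    if [ "$state" = "Running" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1   # state never reached Running within the allowed attempts
}
```

The function only reads state, so it is safe to run while the Operator works. A non-zero exit code means the REC did not reach Running in time and the Operator logs should be checked.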

6. Validate recovery

Cluster validation

Admin Console shows healthy cluster
All nodes present and connected

Database validation

Databases are Active
Modules loaded correctly
Reads/writes succeed
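
Both validations can be approximated from the command line. The sketch below runs rladmin status inside a pod and performs a write/read/delete round trip with redis-cli. It assumes the Redis Enterprise container is named redis-enterprise-node and that the database endpoint accepts unauthenticated, non-TLS connections; add the appropriate redis-cli options otherwise. Function names, the REDIS_CLI override, and the key name are hypothetical.

```shell
# Hypothetical helper: show cluster, node, and database status from a pod.
# Assumption: the Redis Enterprise container is named redis-enterprise-node.
rec_status() {
  local pod=$1 ns=$2
  kubectl exec "$pod" -n "$ns" -c redis-enterprise-node -- rladmin status
}

# Hypothetical helper: write, read back, and delete a throwaway key.
# Returns 0 only if the value read matches the value written.
rw_smoke_test() {
  local host=$1 port=$2 cli=${REDIS_CLI:-redis-cli} value
  local key="kb:quorum-check:$$"
  "$cli" -h "$host" -p "$port" SET "$key" ok >/dev/null || return 1
  value=$("$cli" -h "$host" -p "$port" GET "$key")
  "$cli" -h "$host" -p "$port" DEL "$key" >/dev/null
  [ "$value" = "ok" ]
}
```

For example, run rec_status <redis-enterprise-pod> <namespace> and confirm every node is listed, then run rw_smoke_test <db-host> <db-port> against each database endpoint.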

Critical Guardrails (Do NOT Skip)

Do NOT do the following:

Delete redis-enterprise pods blindly
Scale down StatefulSet
Recreate the REC resource
Restart all nodes simultaneously

These actions can:

Prolong downtime
Cause data inconsistency
Create split-brain scenarios

Troubleshooting

Cluster still cannot form quorum

Verify enough nodes exist to reach a majority
Ensure the selected recovery node is valid

Split-brain risk

Ensure only one recovery attempt is active
Confirm that no subset of nodes is independently forming a separate cluster

Pods running but cluster unhealthy

Check Operator logs
Inspect redis-enterprise pod logs

Storage issues

Confirm PVCs are Bound and correctly attached

Recovery command fails

Verify permissions inside pod
Confirm correct command for your version

OpenShift-Specific Considerations

Security

Validate SCC (Security Context Constraints)
Ensure Operator permissions remain intact

Node scheduling

If nodes were drained:
- Uncordon nodes
- Ensure pods can reschedule
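
Drained nodes stay cordoned until they are released. The sketch below finds them by the SchedulingDisabled marker in the STATUS column of kubectl get nodes and uncordons them. The function names are hypothetical; on OpenShift, oc adm uncordon <node> is the equivalent of kubectl uncordon <node>.

```shell
# Hypothetical helper: print nodes that are cordoned (unschedulable).
cordoned_nodes() {
  kubectl get nodes --no-headers | awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}

# Uncordon every node printed by cordoned_nodes. Review the list first:
# a node may have been cordoned on purpose.
uncordon_all() {
  cordoned_nodes | while read -r node; do
    kubectl uncordon "$node"
  done
}
```

Run cordoned_nodes on its own first and call uncordon_all only after confirming that every listed node should accept pods again.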

Prevention Best Practices

Plan upgrades carefully

Follow supported upgrade paths
Avoid combining infrastructure and Redis upgrades in the same maintenance window

Maintain quorum

Never take down a majority of nodes at the same time
Stagger upgrades

Avoid concurrent changes

Do not reshard or scale during upgrades

Monitor actively

Replication health
Pod lifecycle
Cluster status

When to Contact Support

Engage Redis Support if:

Quorum cannot be restored
Recovery repeatedly fails
Cluster state is inconsistent

Provide:

Operator logs
redis-enterprise pod logs
REC YAML and events
Timeline of actions taken
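
Most of these items can be gathered in one pass. The sketch below writes them to a local directory. It assumes the Operator runs as deployment/redis-enterprise-operator and that redis-enterprise pods carry the label app=redis-enterprise; the function name and the default output directory are hypothetical. The timeline of actions still has to be written by hand.

```shell
# Hypothetical helper: save Operator logs, pod logs, REC YAML, and events.
# Usage: collect_support_data <namespace> [output-dir]
collect_support_data() {
  local ns=$1 out=${2:-./rec-support-$(date +%Y%m%d-%H%M%S)} pod
  mkdir -p "$out" || return 1
  # Assumption: the Operator deployment is named redis-enterprise-operator.
  kubectl logs deployment/redis-enterprise-operator -n "$ns" --all-containers \
    > "$out/operator.log"
  # Assumption: redis-enterprise pods are labeled app=redis-enterprise.
  for pod in $(kubectl get pods -n "$ns" -l app=redis-enterprise -o name); do
    # $pod looks like "pod/<name>"; strip the prefix for the file name.
    kubectl logs "$pod" -n "$ns" --all-containers > "$out/${pod##*/}.log"
  done
  kubectl get rec -n "$ns" -o yaml > "$out/rec.yaml"
  kubectl get events -n "$ns" --sort-by=.lastTimestamp > "$out/events.txt"
  echo "$out"   # print where the files were written
}
```

Collect the data before further recovery attempts where possible, since pod restarts discard the logs of the previous container instance.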
