Recover Redis Enterprise for Kubernetes when REC is degraded or nodes are unreachable – Redis Knowledge Base

Redis Enterprise for Kubernetes uses the Redis Enterprise Operator to manage cluster state through the RedisEnterpriseCluster (REC) resource. When nodes become unreachable or the REC is degraded, the cause is typically pod health, scheduling, storage, or quorum rather than the Redis data itself. This guide helps you identify the root cause, restore cluster health, and recover safely, using the Step-by-Step Instructions and Troubleshooting sections.

Quick Fix

What you see	What to do
REC status is degraded	Check Redis pod health and readiness
Pods not `Ready` or stuck	Inspect pod logs and Kubernetes events
Pods in `Pending`	Resolve scheduling, resource, or node capacity issues
Cluster lost quorum	After infra and pods are healthy, run `rladmin cluster recover` from a healthy Redis pod
Node drain blocked (PDB error)	Ensure all pods are healthy before retrying

Prerequisites

kubectl access to the cluster
Access to the Redis namespace
Permission to exec into Redis pods

Step-by-Step Instructions

1. Check REC and cluster status

kubectl get rec -n <namespace>
kubectl get pods -n <namespace> -o wide

Confirm REC status (Healthy vs Degraded)
Identify pods that are not Running or not Ready
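
To make the second check easier on a busy namespace, the pod listing can be filtered down to the pods that need attention. The following is a minimal sketch, not part of the product tooling: `not_ready_pods` is a hypothetical helper name, and it assumes the default `kubectl get pods` column order (NAME, READY, STATUS).

```shell
# Usage (assumed invocation; replace <namespace> first):
#   kubectl get pods -n <namespace> --no-headers | not_ready_pods

# not_ready_pods: hypothetical helper. Reads `kubectl get pods --no-headers`
# output on stdin and prints pods that are not Running or not fully Ready.
not_ready_pods() {
  awk '{
    split($2, r, "/")                      # READY column, e.g. "1/2"
    if ($3 != "Running" || r[1] != r[2])   # wrong STATUS or ready-count mismatch
      print $1, $2, $3
  }'
}
```

An empty result suggests every listed pod is Running and Ready; any printed pod is a candidate for the inspection in the next step.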

2. Inspect failing or non-ready pods

kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --all-containers --tail=200

Look for:

Storage mount or PVC issues
Scheduling failures (insufficient CPU/memory, taints)
CrashLoopBackOff or startup failures
Readiness/liveness probe failures
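
The categories above map roughly onto standard Kubernetes event reasons, so the describe or events output can be narrowed with a simple filter. This is a sketch under assumptions: `flag_pod_events` is a hypothetical helper, and the reason list (FailedScheduling, FailedMount, FailedAttachVolume, BackOff, Unhealthy) is a starting point rather than an exhaustive set.

```shell
# Usage (assumed invocations):
#   kubectl describe pod <pod-name> -n <namespace> | flag_pod_events
#   kubectl get events -n <namespace> --sort-by=.lastTimestamp | flag_pod_events

# flag_pod_events: hypothetical helper. Keeps only lines that mention a
# common failure reason:
#   FailedScheduling                 -> scheduling failures
#   FailedMount / FailedAttachVolume -> storage mount or PVC issues
#   BackOff                          -> CrashLoopBackOff or restart back-off
#   Unhealthy                        -> readiness/liveness probe failures
flag_pod_events() {
  grep -E 'FailedScheduling|FailedMount|FailedAttachVolume|BackOff|Unhealthy' \
    || echo "no known failure reasons found"
}
```

If the pod is restarting, `kubectl logs <pod-name> -n <namespace> --previous` shows the logs of the prior container instance, which often contain the actual startup error.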

3. Fix underlying infrastructure issues

Resolve any issues identified:

Increase cluster capacity if pods are Pending
Fix storage or volume attachment issues
Resolve node pressure or scheduling constraints
Correct misconfigurations in REC or environment
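
Before and after making infrastructure changes, it can help to confirm which volumes and nodes are actually unhealthy. The sketch below assumes the default column order of `kubectl get pvc` and `kubectl get nodes` (STATUS in the second column); `unbound_pvcs` and `unready_nodes` are hypothetical helper names.

```shell
# Usage (assumed invocations):
#   kubectl get pvc -n <namespace> --no-headers | unbound_pvcs
#   kubectl get nodes --no-headers | unready_nodes

# unbound_pvcs: hypothetical helper. Prints PVCs whose STATUS is not Bound.
unbound_pvcs() {
  awk '$2 != "Bound" { print $1, $2 }'
}

# unready_nodes: hypothetical helper. Prints nodes whose STATUS is not exactly
# "Ready". Cordoned nodes ("Ready,SchedulingDisabled") are reported too,
# because they cannot accept rescheduled pods.
unready_nodes() {
  awk '$2 != "Ready" { print $1, $2 }'
}
```

Empty output from both helpers is a reasonable signal that storage binding and node availability are no longer the blocker.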

4. Recover cluster quorum (if needed)

If the cluster has lost quorum, run the command below, but only after fixing pod, node, network, and storage issues. Do not use it as the first step when the REC is Degraded.

kubectl exec -it <redis-enterprise-pod> -n <namespace> -- rladmin cluster recover

Run this from a stable Redis pod
Allow time for cluster state to stabilize
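
Because recovery should only follow healthy pods, one option is to gate the command on a pod check. This is a minimal sketch, assuming the default `kubectl get pods` column order; `all_pods_ready` is a hypothetical helper, and the gate may be stricter than your environment needs if pod readiness itself depends on quorum.

```shell
# all_pods_ready: hypothetical gate. Reads `kubectl get pods --no-headers`
# output on stdin and succeeds only if there is at least one pod and every
# pod is Running with all containers Ready.
all_pods_ready() {
  awk '{ split($2, r, "/"); if ($3 != "Running" || r[1] != r[2]) bad = 1 }
       END { exit (bad || NR == 0) }'
}

# Usage (assumed invocation; the recover command is the one from this step):
#   kubectl get pods -n <namespace> --no-headers | all_pods_ready \
#     && kubectl exec -it <redis-enterprise-pod> -n <namespace> -- rladmin cluster recover
```

If the gate fails, the recover command is never started, which keeps the ordering in this guide: infrastructure and pods first, quorum recovery second.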

5. Validate recovery

kubectl get rec -n <namespace>
kubectl get pods -n <namespace>

The REC should return to a healthy state
All Redis pods should be Running and Ready
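
Since the cluster may need a few minutes to settle, validation can be repeated rather than checked once. The following sketch uses a hypothetical `retry` helper together with `kubectl wait`; the attempt count, delay, and timeout shown in the usage line are illustrative values, not recommendations from the product documentation.

```shell
# retry: hypothetical helper. Runs a command up to $1 times, sleeping $2
# seconds between attempts, and succeeds as soon as the command succeeds.
retry() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Usage (assumed invocation; kubectl wait blocks until the condition is met
# or the timeout expires):
#   retry 5 30 kubectl wait --for=condition=Ready pod --all -n <namespace> --timeout=60s
```

Note that `--all` covers every pod in the namespace, so pods that are not expected to become Ready (for example, completed jobs) would cause the wait to time out; a label selector can narrow the set if needed.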

Troubleshooting

Issue	Resolution
Pods in CrashLoopBackOff	Configuration or runtime failure. Fix container config or dependencies
Pods stuck in Pending	Insufficient resources or scheduling constraints. Add capacity or adjust scheduling
REC remains degraded	Missing quorum or unhealthy nodes. Fix pod health, then recover cluster if required
Node drain fails with PDB error	Pods not fully ready. Resolve pod issues before draining
Changes not applying	Operator reconciliation blocked. Fix underlying pod or REC issue
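
For the node drain case, the PodDisruptionBudget itself shows whether a drain can proceed. A minimal sketch, assuming the default `kubectl get pdb` column order (NAME, MIN AVAILABLE, MAX UNAVAILABLE, ALLOWED DISRUPTIONS, AGE); `blocked_pdbs` is a hypothetical helper name.

```shell
# Usage (assumed invocation):
#   kubectl get pdb -n <namespace> --no-headers | blocked_pdbs

# blocked_pdbs: hypothetical helper. Prints the names of PodDisruptionBudgets
# that currently allow zero disruptions (fourth column), which is the state
# in which a node drain is refused.
blocked_pdbs() {
  awk '$4 == 0 { print $1 }'
}
```

A budget listed here usually means at least one pod it covers is not Ready; once the pods recover, allowed disruptions should rise above zero and the drain can be retried.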

Key takeaways

REC degradation is usually caused by Kubernetes-level issues, not Redis data issues
Always fix pod health and infrastructure first
Use rladmin cluster recover only when quorum is lost
Allow the Operator to reconcile; avoid manual intervention on pods
