Redis Enterprise for Kubernetes uses the Redis Enterprise Operator to manage cluster state through the RedisEnterpriseCluster (REC) resource. When nodes become unreachable or the REC is degraded, the cause is typically pod health, scheduling, storage, or quorum rather than the Redis data itself. This guide helps you identify the root cause, restore cluster health, and recover safely.
Quick Fix
| What you see | What to do |
|---|---|
| REC status is degraded | Check Redis pod health and readiness |
| Pods not Ready or stuck | Inspect pod logs and Kubernetes events |
| Pods in Pending | Resolve scheduling, resource, or node capacity issues |
| Cluster lost quorum | After infra and pods are healthy, run rladmin cluster recover from a healthy Redis pod |
| Node drain blocked (PDB error) | Ensure all pods are healthy before retrying |
Prerequisites
kubectl access to the cluster
Access to the Redis namespace
Permission to exec into Redis pods
Step-by-Step Instructions
1. Check REC and cluster status
kubectl get rec -n <namespace>
kubectl get pods -n <namespace> -o wide
Confirm REC status (Healthy vs Degraded)
Identify pods that are not Running or not Ready
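On a busy namespace, the not-Running or not-Ready pods can be filtered out of the listing rather than spotted by eye. The following is a minimal sketch, assuming the default column layout of `kubectl get pods` (NAME, READY, STATUS first); the function name `not_ready_pods` is hypothetical and not part of kubectl or the Operator.

```shell
# Reads `kubectl get pods --no-headers` output on stdin and prints the
# NAME, READY and STATUS of every pod that is not Running or whose
# READY count (for example 1/2) is incomplete.
# Assumes the default column order: NAME READY STATUS ...
not_ready_pods() {
  awk '{
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}
```

Typical use would be `kubectl get pods -n <namespace> --no-headers | not_ready_pods`. An empty result means every listed pod is Running with all containers Ready. Pods that finished normally (status Completed) are also printed, so review the output rather than treating every line as a failure.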
2. Inspect failing or non-ready pods
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --all-containers --tail=200
Look for:
Storage mount or PVC issues
Scheduling failures (insufficient CPU/memory, taints)
CrashLoopBackOff or startup failures
Readiness/liveness probe failures
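The four categories above can be matched against the describe and log output mechanically. This is a rough sketch only: the function name `classify_pod_signals` is hypothetical, and the patterns are a small, illustrative set of common Kubernetes event reasons and messages, not an exhaustive list.

```shell
# Reads `kubectl describe pod` or `kubectl logs` text on stdin and prints
# one category name per kind of failure signal found.
# The patterns below are illustrative assumptions, not a complete catalogue.
classify_pod_signals() {
  awk '
    /FailedMount|FailedAttachVolume|unbound immediate PersistentVolumeClaims/ { seen["storage"] = 1 }
    /FailedScheduling|Insufficient (cpu|memory)|had (untolerated )?taint/     { seen["scheduling"] = 1 }
    /CrashLoopBackOff|Back-off restarting failed container/                   { seen["crashloop"] = 1 }
    /(Readiness|Liveness) probe failed/                                       { seen["probe"] = 1 }
    END {
      # Print in a fixed order so the output is stable between runs.
      n = split("storage scheduling crashloop probe", order, " ")
      for (i = 1; i <= n; i++) if (order[i] in seen) print order[i]
    }'
}
```

For example, `kubectl describe pod <pod-name> -n <namespace> | classify_pod_signals` prints zero or more of `storage`, `scheduling`, `crashloop`, and `probe`. No output only means none of these patterns matched; it does not prove the pod is healthy.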
3. Fix underlying infrastructure issues
Resolve any issues identified:
Increase cluster capacity if pods are Pending
Fix storage or volume attachment issues
Resolve node pressure or scheduling constraints
Correct misconfigurations in REC or environment
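To confirm whether storage or node state is the blocker, it can help to list only the volume claims and nodes that are not in their normal state. The sketch below assumes the default `kubectl get pvc` and `kubectl get nodes` column layouts (NAME first, STATUS second); the function names `unbound_pvcs` and `unhealthy_nodes` are hypothetical.

```shell
# Reads `kubectl get pvc --no-headers` output on stdin and prints the
# NAME and STATUS of claims that are not Bound (for example Pending).
unbound_pvcs() {
  awk '$2 != "Bound" { print $1, $2 }'
}

# Reads `kubectl get nodes --no-headers` output on stdin and prints the
# NAME and STATUS of nodes whose status is anything other than plain
# "Ready" (NotReady, or cordoned nodes shown as Ready,SchedulingDisabled).
unhealthy_nodes() {
  awk '$2 != "Ready" { print $1, $2 }'
}
```

Usage would look like `kubectl get pvc -n <namespace> --no-headers | unbound_pvcs` and `kubectl get nodes --no-headers | unhealthy_nodes`. A cordoned node appears in the second list even when it is otherwise healthy, which is useful to notice while a drain is in progress.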
4. Recover cluster quorum (if needed)
If the cluster has lost quorum:
Run this only after fixing pod, node, network, and storage issues. Do not use it as the first step when REC is Degraded.
kubectl exec -it <redis-enterprise-pod> -n <namespace> -- rladmin cluster recover
Run this from a stable Redis pod
Allow time for cluster state to stabilize
5. Validate recovery
kubectl get rec -n <namespace>
kubectl get pods -n <namespace>
REC should return to healthy state
All Redis pods should be Running and Ready
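Because the cluster may need time to stabilize, the pod check can be repeated on an interval instead of being run once. The following is a minimal sketch under stated assumptions: `wait_until` and `all_pods_ready` are hypothetical helper names, and the readiness test relies on the default `kubectl get pods` column order (NAME, READY, STATUS).

```shell
# wait_until TRIES DELAY CMD [ARGS...]
# Re-runs CMD until it succeeds or TRIES attempts have been made,
# sleeping DELAY seconds between attempts.
wait_until() {
  tries=$1; delay=$2; shift 2
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    "$@" && return 0
    if [ "$attempt" -lt "$tries" ]; then sleep "$delay"; fi
    attempt=$((attempt + 1))
  done
  return 1
}

# all_pods_ready NAMESPACE
# Succeeds only if at least one pod is listed and every pod is Running
# with all of its containers Ready.
all_pods_ready() {
  kubectl get pods -n "$1" --no-headers |
    awk '{
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) bad = 1
    }
    END { exit (bad || NR == 0) }'
}
```

For instance, `wait_until 30 10 all_pods_ready <namespace>` polls every 10 seconds for up to 30 attempts and returns non-zero if the pods never become Ready. The check treats an empty pod list as a failure, so a wrong namespace is not mistaken for a healthy cluster.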
Troubleshooting
| Issue | Resolution |
|---|---|
| Pods in CrashLoopBackOff | Configuration or runtime failure. Fix container config or dependencies |
| Pods stuck in Pending | Insufficient resources or scheduling constraints. Add capacity or adjust scheduling |
| REC remains degraded | Missing quorum or unhealthy nodes. Fix pod health, then recover cluster if required |
| Node drain fails with PDB error | Pods not fully ready. Resolve pod issues before draining |
| Changes not applying | Operator reconciliation blocked. Fix underlying pod or REC issue |
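For the node drain case, it can be useful to see which PodDisruptionBudgets currently allow no disruptions before retrying. This sketch assumes each input line has the form `<name> <disruptionsAllowed>`, produced from the standard `status.disruptionsAllowed` field; the function name `blocked_pdbs` is hypothetical.

```shell
# Reads lines of the form "<pdb-name> <disruptionsAllowed>" on stdin and
# prints the names of budgets that currently allow zero disruptions.
# One way to produce that input (shown as a comment, not executed here):
#   kubectl get pdb -n <namespace> \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.disruptionsAllowed}{"\n"}{end}'
blocked_pdbs() {
  awk 'NF >= 2 && $2 == 0 { print $1 }'
}
```

Any name printed here identifies a budget that will reject an eviction right now, which is consistent with pods that are not yet fully Ready.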
Key takeaways
REC degradation is usually caused by Kubernetes-level issues, not Redis data issues
Always fix pod health and infrastructure first
Use rladmin cluster recover only when quorum is lost
Allow the Operator to reconcile; avoid manual intervention on pods