Identifying failures in a Redis Enterprise cluster requires a combination of monitoring, inspection, and logging tools. This article outlines the most effective diagnostic tools available, grouped by function and use case. These tools help detect and diagnose issues related to cluster health, node failures, configuration errors, resource limits, and application behavior.
Prometheus & Grafana
Real-time and historical monitoring tools for Redis metrics
- Prometheus collects cluster metrics such as latency, memory usage, throughput, and errors
- Grafana visualizes these metrics and supports alerting to quickly surface anomalies
- Ideal for long-term observability and proactive failure detection
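As an illustration of proactive detection, the sketch below asks Prometheus which scrape targets are currently unreachable. It relies on `up`, Prometheus's built-in per-target health metric (1 when the last scrape succeeded, 0 when it failed), so it does not depend on any Redis-specific metric name. Assumptions: Prometheus is reachable at `http://localhost:9090` and already scrapes the cluster; the helper names are hypothetical, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: adjust to your Prometheus server


def build_query_url(base_url: str, promql: str) -> str:
    """Build a URL for the Prometheus HTTP API instant-query endpoint."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_instances(body: str) -> list[str]:
    """Return the `instance` label of every series in an instant-query response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        # Defensive: Prometheus normally reports errors with an HTTP error status,
        # which urlopen raises as HTTPError before this point is reached.
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    return [s["metric"].get("instance", "<unknown>") for s in payload["data"]["result"]]


def find_down_targets(base_url: str = PROMETHEUS_URL) -> list[str]:
    """Run `up == 0` and return the instances whose last scrape failed."""
    url = build_query_url(base_url, "up == 0")
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_instances(resp.read().decode("utf-8"))
```

An empty list from `find_down_targets()` means every target answered its last scrape. The same `up == 0` expression can also back a Grafana or Prometheus alert rule, so the check runs continuously rather than on demand.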
Redis Insight (with Copilot)
Visual interface for performance and anomaly detection
- Tracks command patterns, memory usage, and throughput
- Includes Copilot, which provides automated troubleshooting suggestions
- Useful for identifying performance bottlenecks and misbehaving clients
Redis Admin UI
Web-based UI for quick status checks
- Shows the current status of nodes, databases, and cluster components
- Elements in a warning or error state are highlighted with yellow or red indicators
- Alerts may appear in both the node and database views
- Logged events can be reviewed in the Logs section of the UI
Command-Line Tools
rladmin
- Primary CLI for managing Redis Enterprise clusters
- rladmin status extra all shows cluster-wide status, including nodes, shards, and endpoints. It displays the status of every cluster element, including node and shard roles, current versions, and whether each component is OK, missing, or in an error state.
- To see only the elements with errors, run:
  rladmin status extra all errors_only
- Type "?" or "help" in the rladmin CLI to list the remaining commands; tab completion also works.
- Review the rladmin documentation for the full list of available commands.
rlcheck
- Runs a suite of health and configuration checks on the cluster
supervisorctl
- Monitors Redis internal service processes
- Useful for checking if any core management components have failed
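To make "has any component failed?" a quick yes/no check, the sketch below parses `supervisorctl status` output and reports every process that is not RUNNING. Assumptions: the output follows standard Supervisor formatting (one process per line: name, state, description), the script runs on a cluster node where `supervisorctl` is on PATH, and the process names in any sample output are placeholders. The function names are hypothetical.

```python
import subprocess

# Supervisor reports RUNNING for healthy processes; anything else
# (STOPPED, STARTING, BACKOFF, EXITED, FATAL, UNKNOWN, ...) is worth a look.
HEALTHY_STATES = {"RUNNING"}


def unhealthy_services(status_output: str) -> dict[str, str]:
    """Map process name -> state for every line whose state is not RUNNING."""
    problems: dict[str, str] = {}
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, state = fields[0], fields[1]
        if state not in HEALTHY_STATES:
            problems[name] = state
    return problems


def check_supervisor() -> dict[str, str]:
    """Run `supervisorctl status` and return processes that are not RUNNING."""
    # check=True is deliberately omitted: recent Supervisor versions may exit
    # non-zero when a process is not running, which is the case we want to report.
    result = subprocess.run(
        ["supervisorctl", "status"], capture_output=True, text=True, timeout=30
    )
    return unhealthy_services(result.stdout)
```

An empty dictionary means every managed process reported RUNNING; any entry points to the matching log file in the log directory.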
redis-cli
- Direct database interaction tool
- Common commands:
INFO, PING, SLOWLOG, MONITOR, and key inspection (--bigkeys, --memkeys)
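INFO returns plain text made of "# Section" headers and "field:value" lines, which makes it easy to post-process. The sketch below parses output captured with `redis-cli -h <host> -p <port> INFO` (placeholders) and flags a few commonly checked fields. The thresholds are illustrative, not recommendations. A Redis Enterprise endpoint may answer INFO through its proxy, so some fields can be absent or differ from open-source Redis; the sketch treats a missing field as 0. Function names are hypothetical.

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse INFO output into {field: value}, ignoring section headers."""
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank separator or "# Section" header
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def quick_findings(info: dict[str, str]) -> list[str]:
    """Return warnings for a few commonly checked INFO fields (illustrative thresholds)."""
    findings = []
    if int(info.get("rejected_connections", "0")) > 0:
        findings.append("connections have been rejected (check maxclients)")
    if int(info.get("evicted_keys", "0")) > 0:
        findings.append("keys are being evicted (check memory limits)")
    maxmemory = int(info.get("maxmemory", "0"))  # 0 means no limit is reported
    if maxmemory and int(info.get("used_memory", "0")) > 0.9 * maxmemory:
        findings.append("used_memory is above 90% of maxmemory")
    return findings
```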
Operating System Utilities
Standard system tools for node-level diagnostics
- df: Check disk usage
- free: Check memory availability
- top or htop: Monitor CPU usage and load average
- dig: Verify DNS resolution and network connectivity
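When these checks need to run from a script, the Python standard library covers rough equivalents on Linux. This is a minimal sketch, not a replacement for the tools themselves: the percentage only approximates df's Use% column (df excludes root-reserved blocks), `mem_available_kib` reads the MemAvailable field from /proc/meminfo (Linux 3.14 or later), and `socket.getaddrinfo` goes through the system resolver (including /etc/hosts), whereas dig queries DNS directly. Function names and the default path are assumptions.

```python
import os
import shutil
import socket


def disk_usage_percent(path: str = "/") -> float:
    """Approximates `df`: percent used of the filesystem holding `path`.

    Point `path` at the volume you care about, e.g. /var/opt/redislabs/log.
    """
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def mem_available_kib(meminfo_text: str) -> int:
    """Like `free`: extract MemAvailable (KiB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found (needs Linux 3.14 or later)")


def load_average_1m() -> float:
    """Like `top`: the 1-minute load average (Unix only)."""
    return os.getloadavg()[0]


def resolves(hostname: str) -> list[str]:
    """Like `dig`: addresses the hostname resolves to, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})
```

For example, `mem_available_kib(open("/proc/meminfo").read())` returns the same figure that `free` reports in its "available" column, expressed in KiB.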
Log Files
Review logs for detailed error messages and service events
- Key log files:
  - event_log.log
  - cluster_wd.log (watchdog)
  - supervisord.log
  - dmcproxy.log
  - resource_mgr.log
  - Shard logs (e.g., redis-<id>.log)
- Default log directory: /var/opt/redislabs/log
- Critical for identifying failure chains, process crashes, or sync issues
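For a first pass over these files, the sketch below scans every *.log file in the default directory (which also covers shard logs named redis-<id>.log) and keeps the most recent error-like lines per file. Assumptions: the logs are plain text, and the keyword pattern is an illustrative guess rather than an official list of log levels, so adjust it to what your logs actually contain. Function names are hypothetical.

```python
import re
from collections import deque
from collections.abc import Iterable
from pathlib import Path

# Default directory from this article; the keyword list is an illustrative guess.
LOG_DIR = Path("/var/opt/redislabs/log")
ERROR_PATTERN = re.compile(r"\b(error|fatal|critical|failed|exception)\b", re.IGNORECASE)


def recent_errors(lines: Iterable[str], limit: int = 20) -> list[str]:
    """Return the last `limit` lines that match ERROR_PATTERN, oldest first."""
    recent: deque[str] = deque(maxlen=limit)  # older matches fall off the front
    for line in lines:
        if ERROR_PATTERN.search(line):
            recent.append(line.rstrip("\n"))
    return list(recent)


def scan_logs(log_dir: Path = LOG_DIR, limit: int = 20) -> dict[str, list[str]]:
    """Return {file name: recent error-like lines} for every *.log file in log_dir."""
    hits: dict[str, list[str]] = {}
    for path in sorted(log_dir.glob("*.log")):
        with path.open(errors="replace") as f:  # tolerate stray non-UTF-8 bytes
            found = recent_errors(f, limit)
        if found:
            hits[path.name] = found
    return hits
```

Keeping only the most recent matches per file keeps the output short on long-running nodes; comparing timestamps across event_log.log, the watchdog log, and the shard logs is usually what reveals the failure chain.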
Support Packages
Comprehensive diagnostic bundles for deep-dive troubleshooting
- Can be generated from the Redis Admin UI or the CLI
- Includes configuration files, logs, system stats, and health reports
- Useful for Redis Support or internal post-incident analysis
Best Practice
Use these tools in combination to investigate from multiple angles:
- Monitor with Prometheus/Grafana
- Investigate with Redis Insight, rladmin, and log files
- Validate infrastructure health with OS commands
- Collect a Support Package for complex or unclear failures
This multi-layered approach supports a thorough and systematic troubleshooting process across Redis deployments.