Monitoring in Redis helps ensure performance, availability, and system health by exposing real-time metrics at every layer of the deployment. This guide covers Monitoring Methods, including the Redis UI Console, Metrics Exporter, and REST API; Monitoring Tools Integration with Prometheus, Grafana, Datadog, and others; Key Metrics to Monitor, with example thresholds; a Step-by-Step Monitoring Setup for enabling exporters and dashboards; and Troubleshooting Common Issues such as memory pressure or latency spikes.
Monitoring Methods
Redis UI Console
- View real-time metrics at Cluster, Node, DB, and Shard levels.
- Metrics include: ops/sec, latency, CPU/Memory usage, disk space, connections, and network I/O.
- Tabs include Cluster, Nodes, DB, and Shards, each with specific stats.
- Limitations: No historical data, limited granularity. Best for quick checks.
Metrics Exporter (Port 8070)
- Redis exposes detailed metrics via HTTP on port 8070.
- Use the /v2 endpoint when running Redis Software 8.0.2-17+ (metrics stream engine GA).
- For earlier versions (≤7.4), continue using /metrics or /v1.
- Categories include Cluster, Node, DB, Shard, Proxy, and Syncer.
- Example metrics: CPU, RAM, latency, ops/sec, eviction, hit ratio, service status.
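As a rough illustration of consuming the exporter, the sketch below fetches the payload over HTTPS and parses it, assuming the exporter returns the standard Prometheus text exposition format. The function names, the verify_tls switch, and the label names used in any sample data are hypothetical and not taken from Redis documentation.

```python
import re
import ssl
import urllib.request

# One sample line: name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics_text(host, path="/v2", port=8070, verify_tls=True, timeout=5.0):
    """Fetch the raw metrics payload from the exporter (hypothetical helper)."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Assumption: only acceptable for test clusters with self-signed certs.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    url = f"https://{host}:{port}{path}"
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text):
    """Parse exposition-format text into a list of (name, labels, value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples
```

For real deployments, an existing Prometheus client or scraper is likely a better choice than hand-rolled parsing; this only shows the shape of the data.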
REST API
- Returns real-time stats from Redis.
- Used by integrations like AppDynamics and Dynatrace.
- Supports orchestration and automation.
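A minimal sketch of calling the REST API from a script follows. It assumes HTTP Basic authentication on port 9443 and uses /v1/bdbs/stats/last as the stats path; treat the path, the helper names, and the example host and credentials as assumptions to verify against the REST API reference for your version.

```python
import base64
import json
import urllib.request


def build_stats_request(host, username, password,
                        path="/v1/bdbs/stats/last", port=9443):
    """Build an authenticated GET request (path is an assumed example)."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{host}:{port}{path}",
        headers={
            "Authorization": f"Basic {token}",
            "Accept": "application/json",
        },
    )


def fetch_stats(request, context=None, timeout=5.0):
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(request, context=context, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical host and credentials, for illustration only.
    req = build_stats_request("cluster.example.com", "admin@example.com", "secret")
    print(req.full_url)
```

Pass an ssl.SSLContext as context if the cluster uses a private CA.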
Monitoring Tools Integration
Prometheus & Grafana
- Prometheus scrapes metrics from the exporter; Grafana visualizes the data.
- Redis offers prebuilt dashboards: Cluster Status, DB Status.
- Prometheus & Grafana Integration
Datadog
- Integrate via Prometheus metrics collector.
- Dashboards include: Overview, Node, Shard, DB, Proxy, CRDB.
- Datadog Integration Guide
New Relic
- Agent-based monitoring with Prometheus integration.
- Prebuilt Redis dashboards available.
- New Relic Integration
AppDynamics
- Pulls stats from REST API.
- Visualizes cluster-level and DB-level metrics.
Dynatrace
- Uses ActiveGate secure proxy to scrape Redis metrics.
- Includes dashboard templates and setup scripts.
- Dynatrace Integration Guide
Other Tools
- Any tool that can scrape port 8070 or call the REST API (port 9443) is compatible.
- Examples: OpenTelemetry, Telegraf.
Key Metrics to Monitor
Memory
- Monitor across Node, DB, and Shard.
- Alert if usage exceeds 80%.
CPU
- Monitor node and shard CPU usage.
- Redis executes commands on a single thread per shard, but cluster services (proxy, syncer) also require CPU.
Throughput (ops/sec)
- Indicates app request volume.
Latency
- Key performance indicator.
- Alert on spikes or trends.
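One simple way to flag a latency spike is to compare each sample with the median of the samples just before it. The sketch below does that; the window size and the 3x factor are arbitrary assumptions rather than recommended values, and the function name is hypothetical.

```python
from statistics import median


def find_latency_spikes(samples_ms, window=5, factor=3.0):
    """Return indexes of samples above factor x the median of the prior window."""
    spikes = []
    for i in range(window, len(samples_ms)):
        baseline = median(samples_ms[i - window:i])
        if baseline > 0 and samples_ms[i] > factor * baseline:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    latencies = [2, 2, 3, 2, 2, 2, 15, 2, 3]
    print(find_latency_spikes(latencies))  # prints [6]
```

Using the median rather than the mean keeps a single earlier spike from inflating the baseline.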
Client Connections
- Watch for spikes or near limits.
Users Count
- Tracks the total number of users configured in the cluster.
- Metric: rdse.users_count
- Alert: cluster_users_count_approaches_limit (triggers when usage nears 90% of the 32,000-user ceiling)
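The arithmetic behind that alert can be sketched as follows: 90% of 32,000 users is 28,800. Whether the built-in alert fires at exactly this count is an assumption; the helper names are hypothetical.

```python
USER_LIMIT = 32_000   # user ceiling stated above
ALERT_PERCENT = 90    # alert level stated above


def users_alert_threshold(limit=USER_LIMIT, percent=ALERT_PERCENT):
    """User count at which the cluster reaches the alert percentage."""
    return limit * percent // 100


def users_near_limit(count, limit=USER_LIMIT, percent=ALERT_PERCENT):
    """True once the count reaches the threshold (inclusive, by assumption)."""
    return count >= users_alert_threshold(limit, percent)


if __name__ == "__main__":
    print(users_alert_threshold())  # prints 28800
```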
Evictions/Expiry
- May signal memory exhaustion.
File Storage
- Persistent & ephemeral disk space for logs, backups, configs.
Network I/O
- Ingress/egress throughput per node.
Metric Threshold Examples
| Metric | Prometheus Name | Range |
|---|---|---|
| DB Latency | bdb_avg_latency | 1–10ms |
| Free RAM | node_free_memory | >65% |
| Node CPU Idle | node_cpu_idle | 60–80% |
| Persistent Storage Free | node_persistent_storage_free | >70% |
| Ops/sec | redis_total_commands_processed | 22k–25k |
| Shard Memory | redis_used_memory | 22–25 GB |
| Shard CPU | redis_process_cpu_usage_percent | 60–80% |
| Connections | redis_connected_clients | 6k–9k |
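The example ranges above can be encoded as data and checked in a script. This is a sketch only: it assumes observed values have already been converted to the table's units (ms, percent, ops/sec, GB, client count), treats bounds as inclusive, and uses hypothetical names throughout.

```python
# (low, high) per metric, copied from the table; None means no upper bound.
# Assumption: callers convert raw exporter values into the table's units.
EXAMPLE_RANGES = {
    "bdb_avg_latency": (1.0, 10.0),                    # ms
    "node_free_memory": (65.0, None),                  # percent free
    "node_cpu_idle": (60.0, 80.0),                     # percent
    "node_persistent_storage_free": (70.0, None),      # percent free
    "redis_total_commands_processed": (22_000, 25_000),  # ops/sec
    "redis_used_memory": (22.0, 25.0),                 # GB
    "redis_process_cpu_usage_percent": (60.0, 80.0),   # percent
    "redis_connected_clients": (6_000, 9_000),
}


def outside_example_range(observed):
    """Return {metric: (value, low, high)} for values outside their range."""
    flagged = {}
    for name, value in observed.items():
        if name not in EXAMPLE_RANGES:
            continue  # no example range listed for this metric
        low, high = EXAMPLE_RANGES[name]
        too_low = value < low
        too_high = high is not None and value > high
        if too_low or too_high:
            flagged[name] = (value, low, high)
    return flagged
```

The ranges are examples, so a value outside them is a prompt to investigate rather than proof of a fault.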
CRDB-Specific
| Metric | Prometheus Name | Expected |
|---|---|---|
| CRDB Lag | bdb_crdt_syncer_local_ingress_lag_time | 0–10ms |
| Syncer Status | bdb_crdt_syncer_status | 0 (in-sync) |
| Replica Lag | bdb_replicaof_syncer_local_ingress_lag_time | 0–10ms |
| Replica Status | bdb_replicaof_syncer_status | 0 (in-sync) |
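A small check combining the status and lag expectations above might look like the sketch below. It assumes the lag has already been expressed in milliseconds and that any non-zero status means out of sync; the function name is hypothetical.

```python
def syncer_issues(status, lag_ms, max_lag_ms=10.0):
    """Return human-readable problems for one syncer; empty list if healthy."""
    issues = []
    if status != 0:
        issues.append(f"syncer status is {status}, expected 0 (in-sync)")
    if lag_ms > max_lag_ms:
        issues.append(f"lag is {lag_ms} ms, expected at most {max_lag_ms} ms")
    return issues
```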
Step-by-Step Monitoring Setup
1. Enable Metrics Exporter
- Ensure port 8070 is open.
- For Redis Software 7.8.2+, use the /v2 endpoint (GA as of 8.0.2-17): https://<ip>:8070/v2
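To confirm that port 8070 is reachable from the monitoring host, a plain TCP connection attempt is usually enough. The sketch below uses only the standard library; the function name and timeout are assumptions.

```python
import socket


def is_port_open(host, port=8070, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A successful connection only shows that the port is open; it does not verify TLS or that the exporter returns metrics.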
2. Integrate Monitoring Tools
- Set up Prometheus, Datadog, or other tools using Redis documentation.
- Import Redis dashboards into Grafana or other platforms.
3. Set Up Alerts and Dashboards
- Configure alerts for memory, CPU, shard size, latency.
- Use time-series visualizations for trend analysis.
See Alerts and Events for details on supported alert types, delivery options, and event tracking.
Troubleshooting Common Issues
Memory Usage at Limit
- Symptom: Evictions, degraded throughput.
- Fix: Scale memory, adjust eviction policy, rebalance data.
High CPU
- Symptom: CPU > 80% consistently.
- Fix: Analyze slowlog, optimize queries, reshard or scale.
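"Consistently" needs a concrete definition before it can drive an alert. One possible reading, sketched below, is a run of consecutive samples above 80%; the run length of 5 is an arbitrary assumption and the function name is hypothetical.

```python
def cpu_sustained_high(samples_pct, threshold=80.0, min_consecutive=5):
    """True if at least min_consecutive samples in a row exceed threshold."""
    run = 0
    for value in samples_pct:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False
```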
Latency or Throughput Drop
- Symptom: Slower command response.
- Fix: Check hotkeys, shard balance, network congestion.
Connection Failures
- Symptom: Client timeouts or dropped sessions.
- Fix: Check redis-cli connectivity, endpoint auth, TLS config.
Disk or Resource Pressure
- Symptom: Disk > 90%, RAM or CPU saturated.
- Fix: Clean logs, investigate unbounded growth.
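The disk symptom can be checked directly on a node with the standard library, as in this sketch. The 90% threshold comes from the symptom above; the function names are hypothetical, and the path to check depends on where storage is mounted.

```python
import shutil


def disk_used_percent(path):
    """Percentage of the filesystem containing path that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # avoid dividing by zero on unusual filesystems
    return 100.0 * usage.used / usage.total


def disk_under_pressure(path, threshold_pct=90.0):
    """True when usage exceeds the threshold."""
    return disk_used_percent(path) > threshold_pct
```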
Log Analysis
- Path: /var/opt/redislabs/log/
- Files: event_log.log, cluster_wd.log, dmcproxy.log, supervisord.log
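A first pass over these files can be scripted. The sketch below counts lines containing a keyword in each listed file; the keywords ERROR and CRITICAL are assumptions, since the log line format is not described here, and the function name is hypothetical.

```python
from pathlib import Path

LOG_DIR = Path("/var/opt/redislabs/log")
LOG_FILES = ("event_log.log", "cluster_wd.log", "dmcproxy.log", "supervisord.log")


def count_matching_lines(log_dir=LOG_DIR, filenames=LOG_FILES,
                         keywords=("ERROR", "CRITICAL")):
    """Return {filename: number of lines containing any keyword}."""
    counts = {}
    for name in filenames:
        path = Path(log_dir) / name
        if not path.is_file():
            continue  # skip files that are absent on this node
        with path.open(errors="replace") as handle:
            counts[name] = sum(
                1 for line in handle if any(k in line for k in keywords)
            )
    return counts
```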