Monitoring Redis Deployments – Redis Knowledge Base

Monitoring in Redis helps ensure performance, availability, and system health by exposing real-time metrics at every layer of the deployment. This guide covers Monitoring Methods, including the Redis UI Console, Metrics Exporter, and REST API; options for Monitoring Tools Integration, such as Prometheus, Grafana, Datadog, and others; a detailed list of Key Metrics to Monitor, along with example thresholds; a Step-by-Step Monitoring Setup to enable exporters and dashboards; and guidance for Troubleshooting Common Issues such as memory pressure or latency spikes.

Monitoring Methods

Redis UI Console

View real-time metrics at Cluster, Node, DB, and Shard levels.
Metrics include: ops/sec, latency, CPU/Memory usage, disk space, connections, and network I/O.
Tabs include Cluster, Nodes, DB, and Shards, each with specific stats.
Limitations: No historical data, limited granularity. Best for quick checks.

Metrics Exporter (Port 8070)

Redis exposes detailed metrics over HTTPS on port 8070.
Use the /v2 endpoint, served by the metrics stream engine, on Redis Software 7.8.2 and later; the engine is generally available (GA) from version 8.0.2-17.
For earlier versions (7.4 and below), continue using /metrics or /v1.
Categories include Cluster, Node, DB, Shard, Proxy, and Syncer.
Example metrics: CPU, RAM, latency, ops/sec, eviction, hit ratio, service status.
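To illustrate what the exporter returns, here is a minimal Python sketch that downloads the Prometheus text output and parses it into a dictionary. It assumes the endpoint serves the standard Prometheus text exposition format; the function names fetch_metrics and parse_prometheus_text are hypothetical, and the parser is deliberately simplified (it ignores timestamps and does not handle commas or escaped quotes inside label values). The verify_tls switch exists only because clusters often use self-signed certificates.

```python
import ssl
import urllib.request


def fetch_metrics(host, path="/v2", port=8070, verify_tls=True, timeout=10):
    """Download the exporter's Prometheus text output over HTTPS."""
    url = f"https://{host}:{port}{path}"
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for self-signed test clusters; prefer loading the cluster CA instead.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_prometheus_text(text):
    """Parse simplified Prometheus text format into {(name, labels): value}.

    Comment lines (# HELP / # TYPE) and trailing timestamps are ignored;
    commas or escaped quotes inside label values are NOT handled.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name, rest = line.split("{", 1)
            labels_part, value_part = rest.rsplit("}", 1)
            pairs = (p.split("=", 1) for p in labels_part.split(",") if p.strip())
            labels = tuple(sorted((k.strip(), v.strip().strip('"')) for k, v in pairs))
        else:
            name, value_part = line.split(None, 1)
            labels = ()
        # The first token after the name/labels is the value; a timestamp may follow.
        samples[(name.strip(), labels)] = float(value_part.split()[0])
    return samples
```

For example, parse_prometheus_text(fetch_metrics("10.0.0.1", verify_tls=False)) would return samples keyed by metric name plus sorted label pairs; the host address here is a placeholder.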

REST API

Returns real-time statistics for the cluster, nodes, databases, and shards over HTTPS on port 9443.
Used by integrations such as AppDynamics and Dynatrace.
Supports orchestration and automation workflows.
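A minimal sketch of calling the REST API from Python with HTTP Basic authentication. The path /v1/cluster/stats/last and the credentials shown are assumptions for illustration; check the REST API reference for the exact stats endpoints available in your version.

```python
import base64
import json
import ssl
import urllib.request


def build_stats_request(host, username, password, path="/v1/cluster/stats/last", port=9443):
    """Build a GET request with HTTP Basic auth for a (hypothetical) stats path."""
    url = f"https://{host}:{port}{path}"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def get_stats(request, verify_tls=True, timeout=10):
    """Send the request and decode the JSON body."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Self-signed clusters only; prefer trusting the cluster CA.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes the authentication logic easy to test without a live cluster.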

Monitoring Tools Integration

Prometheus & Grafana

Prometheus scrapes metrics from the exporter, and Grafana visualizes the data.
Redis offers prebuilt dashboards, including Cluster Status and DB Status.
See the Prometheus & Grafana Integration guide.
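As a sketch of the Prometheus side, the helper below renders a scrape job for the exporter as YAML text. The job name, target address, and 30s interval are hypothetical defaults; the keys (job_name, scrape_interval, metrics_path, scheme, tls_config, static_configs) are standard Prometheus scrape configuration fields. Enable skip_tls_verify only for clusters with self-signed certificates.

```python
def render_scrape_config(job_name, targets, metrics_path="/v2",
                         scrape_interval="30s", skip_tls_verify=False):
    """Render a prometheus.yml scrape_configs entry for the Redis exporter."""
    lines = [
        "scrape_configs:",
        f"  - job_name: {job_name}",
        f"    scrape_interval: {scrape_interval}",
        f"    metrics_path: {metrics_path}",
        "    scheme: https",  # the exporter on port 8070 is served over TLS
    ]
    if skip_tls_verify:
        lines += ["    tls_config:", "      insecure_skip_verify: true"]
    lines += ["    static_configs:", "      - targets:"]
    lines += [f'          - "{target}"' for target in targets]
    return "\n".join(lines) + "\n"
```

For example, render_scrape_config("redis-enterprise", ["10.0.0.1:8070"]) produces a job that scrapes /v2 on one node; use metrics_path="/metrics" for clusters on 7.4 or earlier.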

Datadog

Integrates via the Prometheus metrics collector.
Dashboards include Overview, Node, Shard, DB, Proxy, and CRDB.
See the Datadog Integration Guide.

New Relic

Agent-based monitoring with Prometheus integration.
Prebuilt Redis dashboards are available.
See the New Relic Integration guide.

AppDynamics

Pulls stats from the REST API.
Visualizes cluster-level and DB-level metrics.

Dynatrace

Uses the ActiveGate secure proxy to scrape Redis metrics.
Includes dashboard templates and setup scripts.
See the Dynatrace Integration Guide.

Other Tools

Any tool that can scrape the exporter on port 8070 or call the REST API on port 9443 is compatible.
Examples: OpenTelemetry, Telegraf.

Key Metrics to Monitor

Memory

Monitor across Node, DB, and Shard.
Alert if usage exceeds 80%.
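The 80% rule can be expressed as a simple check. This sketch assumes you have already collected used and limit byte values per node, DB, or shard (for example, from the exporter); the entity IDs and function names are hypothetical.

```python
def memory_usage_pct(used_bytes, limit_bytes):
    """Memory usage as a percentage of the configured limit."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    return 100.0 * used_bytes / limit_bytes


def memory_alerts(usage_by_entity, limit_by_entity, threshold_pct=80.0):
    """Return {entity: usage_pct} for entities above the threshold."""
    alerts = {}
    for entity, used in usage_by_entity.items():
        pct = memory_usage_pct(used, limit_by_entity[entity])
        if pct > threshold_pct:
            alerts[entity] = round(pct, 1)
    return alerts
```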

CPU

Monitor node and shard CPU usage.
Each shard executes commands on a single main thread, but supporting services such as the proxy and syncer also consume CPU.

Throughput (ops/sec)

Indicates app request volume.
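Throughput is usually derived from a cumulative counter such as redis_total_commands_processed rather than read directly; in PromQL an expression like rate(redis_total_commands_processed[1m]) does this. The sketch below computes the same idea from raw (timestamp, counter) samples, treating any decrease as a counter reset (for example, after a shard restart). The function name is hypothetical.

```python
def ops_per_sec(samples):
    """Average per-second rate from [(timestamp_seconds, counter_value), ...]."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    total = 0.0
    for (_, v0), (_, v1) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so count only v1.
        total += v1 - v0 if v1 >= v0 else v1
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must increase")
    return total / elapsed
```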

Latency

Key performance indicator.
Alert on spikes or trends.
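One simple way to alert on spikes rather than fixed values is to compare each sample with a rolling baseline. This sketch flags samples more than factor times the median of the preceding window; the window size and factor are arbitrary illustrative choices, not Redis recommendations.

```python
from statistics import median


def latency_spikes(latencies_ms, window=5, factor=3.0):
    """Return indexes whose latency exceeds factor x the median of the prior window."""
    spikes = []
    for i in range(window, len(latencies_ms)):
        baseline = median(latencies_ms[i - window:i])
        if baseline > 0 and latencies_ms[i] > factor * baseline:
            spikes.append(i)
    return spikes
```

Using the median rather than the mean keeps a single earlier spike from inflating the baseline.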

Client Connections

Watch for spikes or counts approaching the configured connection limit.

Users Count

Tracks the total number of users configured in the cluster.
Metric: rdse.users_count
Alert: cluster_users_count_approaches_limit (triggers when usage nears 90% of the 32,000-user ceiling)
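The user-count alert reduces to simple arithmetic: 90% of 32,000 is 28,800 users. This sketch reproduces the check with integer math to avoid floating-point rounding; firing at exactly the threshold (>=) is an assumption. The same hypothetical helper works for client connections against whatever connection limit you configure.

```python
USER_LIMIT = 32_000  # user ceiling stated in this article
ALERT_PERCENT = 90   # alert fires near 90% of the ceiling


def near_limit(count, limit=USER_LIMIT, percent=ALERT_PERCENT):
    """Return (alert, threshold) for a count measured against a limit."""
    threshold = limit * percent // 100  # 32_000 * 90 // 100 == 28_800
    return count >= threshold, threshold
```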

Evictions/Expiry

A rising eviction count may signal memory exhaustion; expirations alone usually reflect keys reaching their TTL.

File Storage

Monitor persistent and ephemeral disk space used for logs, backups, and configuration files.

Network I/O

Ingress/egress throughput per node.

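Metric Threshold Examples

The ranges below are illustrative; appropriate values depend on workload and database sizing. redis_total_commands_processed is a cumulative counter, so derive ops/sec from its rate of change.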

Metric	Prometheus Name	Range
DB Latency	bdb_avg_latency	1–10ms
Free RAM	node_free_memory	>65%
Node CPU Idle	node_cpu_idle	60–80%
Persistent Storage Free	node_persistent_storage_free	>70%
Ops/sec	redis_total_commands_processed	22k–25k
Shard Memory	redis_used_memory	22–25 GB
Shard CPU	redis_process_cpu_usage_percent	60–80%
Connections	redis_connected_clients	6k–9k
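The example ranges in the table above can be encoded as a lookup and checked against a metrics snapshot. This is a sketch: the dictionary keys are hypothetical normalized names that assume values were already converted to the table's units (milliseconds, percent, counts), since raw exporter metrics may be reported in other units such as bytes or seconds.

```python
# Example ranges from the table (illustrative; tune per workload).
EXAMPLE_RANGES = {
    "bdb_avg_latency_ms": (1.0, 10.0),
    "node_cpu_idle_pct": (60.0, 80.0),
    "redis_process_cpu_usage_percent": (60.0, 80.0),
    "redis_connected_clients": (6_000, 9_000),
}
# Metrics that must stay above a floor (the ">" rows of the table).
EXAMPLE_MINIMUMS = {
    "node_free_memory_pct": 65.0,
    "node_persistent_storage_free_pct": 70.0,
}


def out_of_range(snapshot):
    """Return human-readable findings for values outside the example ranges."""
    problems = []
    for name, value in snapshot.items():
        if name in EXAMPLE_RANGES:
            lo, hi = EXAMPLE_RANGES[name]
            if not lo <= value <= hi:
                problems.append(f"{name}={value} outside {lo}-{hi}")
        elif name in EXAMPLE_MINIMUMS and value <= EXAMPLE_MINIMUMS[name]:
            problems.append(f"{name}={value} not above {EXAMPLE_MINIMUMS[name]}")
    return problems
```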

CRDB-Specific

Metric	Prometheus Name	Expected
CRDB Lag	bdb_crdt_syncer_local_ingress_lag_time	0–10ms
Syncer Status	bdb_crdt_syncer_status	0 (in-sync)
Replica Lag	bdb_replicaof_syncer_local_ingress_lag_time	0–10ms
Replica Status	bdb_replicaof_syncer_status	0 (in-sync)
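A sketch of checking the CRDB and Replica Of metrics against the expected values in the table. It assumes the metrics dictionary is keyed by the Prometheus names shown and that lag values have already been converted to milliseconds; the function name is hypothetical.

```python
def crdb_health(metrics, max_lag_ms=10.0):
    """Check syncer status (0 = in-sync) and ingress lag for CRDB and Replica Of."""
    issues = []
    for kind in ("crdt", "replicaof"):
        status = metrics.get(f"bdb_{kind}_syncer_status")
        lag = metrics.get(f"bdb_{kind}_syncer_local_ingress_lag_time")
        if status is not None and status != 0:
            issues.append(f"{kind} syncer status {status} (expected 0)")
        if lag is not None and lag > max_lag_ms:
            issues.append(f"{kind} ingress lag {lag}ms exceeds {max_lag_ms}ms")
    return issues
```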

Step-by-Step Monitoring Setup

1. Enable Metrics Exporter

Ensure port 8070 is reachable from your monitoring server.
On Redis Software 7.8.2 and later, use the /v2 endpoint: https://<ip>:8070/v2
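Before configuring a scraper, a quick TCP reachability test can confirm that port 8070 is open from the monitoring host. This sketch only tests the TCP connection, not the TLS handshake or the /v2 path; the function name is hypothetical.

```python
import socket


def port_open(host, port=8070, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```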

2. Integrate Monitoring Tools

Set up Prometheus, Datadog, or other tools by following the Redis integration documentation.
Import the Redis dashboards into Grafana or other platforms.

3. Set Up Alerts and Dashboards

Configure alerts for memory, CPU, shard size, and latency.
Use time-series visualizations for trend analysis.
See Alerts and Events for details on supported alert types, delivery options, and event tracking.

Troubleshooting Common Issues

Memory Usage at Limit

Symptom: Evictions, degraded throughput.
Fix: Increase the memory limit, adjust the eviction policy, or rebalance data across shards.

High CPU

Symptom: CPU > 80% consistently.
Fix: Analyze the slow log, optimize expensive commands, and reshard or scale out.
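The slow log is read with the SLOWLOG GET command. This sketch aggregates entries by command name to show where time is spent. It assumes each entry is a dict with a 'command' field (bytes or str) and a 'duration' in microseconds, which matches what redis-py's slowlog_get() returns; verify the shape against your client library.

```python
from collections import Counter


def summarize_slowlog(entries, top=3):
    """Return [(command, count, total_ms)] for the slowest command names."""
    total_us = Counter()
    counts = Counter()
    for entry in entries:
        cmd = entry["command"]
        if isinstance(cmd, bytes):
            cmd = cmd.decode("utf-8", "replace")
        name = cmd.split()[0].upper() if cmd.strip() else "<empty>"
        total_us[name] += entry["duration"]  # SLOWLOG durations are microseconds
        counts[name] += 1
    return [(name, counts[name], us / 1000.0) for name, us in total_us.most_common(top)]
```

For example, with redis-py you might call summarize_slowlog(client.slowlog_get(128)), where client is an existing redis.Redis connection.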

Latency or Throughput Drop

Symptom: Slower command response.
Fix: Check for hot keys, shard imbalance, and network congestion.

Connection Failures

Symptom: Client timeouts or dropped sessions.
Fix: Verify connectivity with redis-cli, then check endpoint authentication and TLS configuration.
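When redis-cli is not available on the client host, a raw RESP PING can serve a similar purpose. This minimal sketch sends an optional AUTH followed by PING and expects +PONG. It assumes password-only AUTH (no ACL username) and, when use_tls is set, a server certificate trusted by the system CA store; both function names are hypothetical.

```python
import socket
import ssl


def encode_command(*parts):
    """Encode a command as a RESP array of bulk strings."""
    out = f"*{len(parts)}\r\n"
    for part in parts:
        out += f"${len(part.encode('utf-8'))}\r\n{part}\r\n"
    return out.encode("utf-8")


def redis_ping(host, port, password=None, use_tls=False, timeout=3.0):
    """Return True if the endpoint answers PING with +PONG."""
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        sock = (ssl.create_default_context().wrap_socket(raw, server_hostname=host)
                if use_tls else raw)
    except OSError:  # ssl.SSLError is a subclass of OSError
        raw.close()
        raise
    with sock, sock.makefile("rb") as reader:
        if password is not None:
            sock.sendall(encode_command("AUTH", password))
            if not reader.readline().startswith(b"+OK"):
                return False  # authentication rejected
        sock.sendall(encode_command("PING"))
        return reader.readline().strip() == b"+PONG"
```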

Disk or Resource Pressure

Symptom: Disk > 90%, RAM or CPU saturated.
Fix: Clean up old logs and investigate the source of any unbounded data or file growth.
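A sketch that reports filesystems above the 90% mark using Python's shutil.disk_usage. Which paths to check is up to you: the log directory listed under Log Analysis is one candidate, while persistent and ephemeral storage paths depend on your installation.

```python
import shutil


def disk_used_pct(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_disk(paths, threshold_pct=90.0):
    """Return {path: used_pct} for every path whose filesystem exceeds the threshold."""
    over = {}
    for path in paths:
        pct = disk_used_pct(path)
        if pct > threshold_pct:
            over[path] = round(pct, 1)
    return over
```

For example, check_disk(["/var/opt/redislabs/log"]) returns an empty dict when that filesystem is below 90% used.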

Log Analysis

Path: /var/opt/redislabs/log/
Files: event_log.log, cluster_wd.log, dmcproxy.log, supervisord.log
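A minimal sketch for scanning those files for problem keywords. The keyword list is a hypothetical starting point, since the log line formats are not described here; adjust it to the messages you actually see.

```python
from pathlib import Path

LOG_DIR = Path("/var/opt/redislabs/log")
LOG_FILES = ["event_log.log", "cluster_wd.log", "dmcproxy.log", "supervisord.log"]


def scan_logs(log_dir=LOG_DIR, files=LOG_FILES,
              keywords=("error", "fail", "critical"), max_hits=50):
    """Return [(file, line_number, line)] for lines containing any keyword."""
    hits = []
    for name in files:
        path = Path(log_dir) / name
        if not path.is_file():
            continue  # skip files that do not exist on this node
        with path.open("r", encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                lowered = line.lower()
                if any(k in lowered for k in keywords):
                    hits.append((name, lineno, line.rstrip()))
                    if len(hits) >= max_hits:
                        return hits
    return hits
```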
