Flex v2 on Redis Software uses locally attached, dedicated SSDs (NVMe preferred) to extend database capacity beyond DRAM while keeping hot data in RAM. It supports terabyte-scale datasets with predictable performance at a lower cost than an all-RAM deployment.
This article explains how to size, migrate, operate, and monitor Flex v2 in on-premises or cloud-VM deployments.
At a Glance / Versioning Summary
| Attribute | Description |
|---|---|
| Availability | RS 8.0+ supports Flex; RS 7.4 supports both Flex and Auto-Tiering; RS 7.2 and earlier support Auto-Tiering only. |
| Engine | Requires Speedb storage engine and compatible licensing. |
| Best for | Large datasets with a smaller hot set and predictable locality using local NVMe Flash. |
| Change impact | Ratio changes are online but trigger a rebalance that moves data between tiers. |
| Active-Active (CRDB) | Not supported for Flex deployments. |
Prerequisites
Storage constraints (must enforce)
Locally attached, dedicated SSDs (NVMe Gen 4/5 preferred) — no NAS, SAN, or EBS for Flash.
Flash capacity ≥ total database size (RAM + Flash).
Persistence and Flash must be on separate disks.
For cloud VMs: Flash on instance NVMe; persistence on network storage (for example, EBS).
Version & engine: Flex requires Speedb and Redis 8.
Access & monitoring: Admin UI / CLI + Prometheus / Grafana or Datadog.
Backups & window: Recent backup and a defined maintenance window for configuration changes.
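The storage constraints above can be turned into a pre-deployment check. The following is a minimal sketch, not a Redis tool: `StoragePlan`, its field names, and `check_storage_plan` are hypothetical, and the values are assumed to come from your own inventory data.

```python
from dataclasses import dataclass

# Storage types the prerequisites rule out for the Flash tier.
NETWORK_STORAGE = {"nas", "san", "ebs"}


@dataclass
class StoragePlan:
    """Hypothetical description of one node's storage layout."""
    flash_kind: str            # for example "nvme", "ssd", "ebs"
    flash_device: str          # device backing the Flash tier
    persistence_device: str    # device backing persistence
    flash_capacity_gb: float
    database_size_gb: float    # total database size (RAM + Flash)


def check_storage_plan(plan: StoragePlan) -> list[str]:
    """Return a list of violated storage constraints (empty if none)."""
    problems = []
    if plan.flash_kind.lower() in NETWORK_STORAGE:
        problems.append("Flash must be locally attached, not NAS, SAN, or EBS")
    if plan.flash_capacity_gb < plan.database_size_gb:
        problems.append("Flash capacity is smaller than total database size")
    if plan.flash_device == plan.persistence_device:
        problems.append("Persistence and Flash must be on separate disks")
    return problems
```

An empty list means the plan satisfies the three storage rules; any returned message should block the deployment until it is resolved.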
Quick Fix Table
| Problem | Fast check | Action |
|---|---|---|
| High p99 on cold reads | % Values in RAM; per-node Flash IOPS / queue depth | Increase RAM ratio; move to higher-IOPS NVMe; improve access locality |
| Replica lag after spikes | NIC bandwidth; replica CPU/IO | Add replica RAM; redistribute reads; ensure bandwidth headroom |
| Evictions | Dataset vs memory; TTL churn | Scale memory; adjust eviction policy; audit TTL patterns |
| Throughput dip post-ratio change | Active resize / rebalance | Defer heavy jobs until completion; re-baseline |
When to Use Flex v2
Flex v2 is a good fit when:
Datasets are large but only a small, stable portion is hot.
Latency tolerance differs between hot and warm data.
Values average ~1 KB or smaller.
Typical usage
RAM tier: when p99 latency under 1 ms is critical.
Flex tier: when p99 latency under 10 ms is acceptable.
Deployment Planning
Plan Selection & Sizing
| Principle | Guidance |
|---|---|
| RAM ratio | Keep RAM ≥ 10% of total; target ≈ 20% of values in RAM. |
| Changes | Ratio edits are online but should be scheduled during a maintenance window. |
| IOPS headroom | Validate NVMe latency and sustained/peak IOPS for cold-read bursts. |
| Shard balance | Evenly distribute read/write pressure; avoid per-node hotspots. |
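The RAM ratio guidance can be worked through numerically. The sketch below is illustrative only: `size_tiers` and its return keys are hypothetical names, and the calculation ignores replication, per-key overhead, and fragmentation, which a real sizing exercise would need to include.

```python
MIN_RAM_RATIO = 0.10  # "Keep RAM >= 10% of total" from the table above


def size_tiers(total_dataset_gb: float, ram_ratio: float) -> dict:
    """Split a dataset between RAM and Flash for a given RAM ratio."""
    if total_dataset_gb <= 0:
        raise ValueError("total_dataset_gb must be positive")
    if not MIN_RAM_RATIO <= ram_ratio <= 1.0:
        raise ValueError("ram_ratio must be between 0.10 and 1.0")
    ram_gb = total_dataset_gb * ram_ratio
    return {
        "ram_gb": ram_gb,
        "flash_tier_gb": total_dataset_gb - ram_gb,
        # Prerequisites: Flash capacity >= total database size (RAM + Flash).
        "min_flash_capacity_gb": total_dataset_gb,
    }
```

For example, `size_tiers(1000, 0.2)` splits a 1,000 GB dataset into roughly 200 GB of RAM and 800 GB in the Flash tier, while a ratio of 0.05 is rejected because it falls below the 10% floor.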
Upgrade Behavior: Flex v1 → Flex v2
When you upgrade your database to Redis 8.2 or later, Flex automatically transitions from the v1 architecture (values-only offload) to the v2 architecture (keys + values offload).
| Question | Answer |
|---|---|
| Do I need to migrate manually? | No. The transition occurs automatically during the Redis 8.2 upgrade. |
| Is downtime or data loss expected? | No. Handled within the standard upgrade process. |
| What changes after the upgrade? | The DB uses v2’s enhanced storage layer with better RAM efficiency and key offload. |
| What should I verify post-upgrade? | Normal metrics: latency, RAM hit ratio, Flash IOPS. |
Summary: Upgrading to Redis 8.2 automatically moves Flex v1 databases to Flex v2. No manual migration or extra steps are required.
Migration (non-version upgrades only)
| Method | Description | When to use |
|---|---|---|
| Snapshot cutover | Export RDB → import to Flex v2 DB → warm up → cut over. | Simple and safe for new hardware or layout changes. |
| Live migration | Use ReplicaOf or RDI / RIOT pipelines for live sync → validate → promote. | For minimal downtime between clusters. |
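Both methods include a validation step before cutover or promotion. One simple signal is a key-count comparison between source and target. The sketch below assumes client objects that expose a `dbsize()` method, as redis-py's `redis.Redis` does; `compare_key_counts` and the `tolerance` parameter are hypothetical, and a matching count does not prove the values are identical.

```python
def compare_key_counts(source, target, tolerance: int = 0) -> dict:
    """Compare DBSIZE on two databases.

    `source` and `target` are any objects with a dbsize() method,
    for example redis.Redis clients from redis-py.
    """
    source_keys = source.dbsize()
    target_keys = target.dbsize()
    return {
        "source_keys": source_keys,
        "target_keys": target_keys,
        # A small tolerance allows for writes and expirations during live sync.
        "match": abs(source_keys - target_keys) <= tolerance,
    }
```

During a live sync the counts can drift briefly, so a non-zero `tolerance` may be appropriate until writes to the source are stopped.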
Validation Checklist
Use this checklist to confirm expected performance and stability after configuration or migration.
| Check | Target | How to verify |
|---|---|---|
| Endpoint health | Pass | Automated smoke tests |
| Latency (p99) | Within SLO | APM / Grafana |
| RAM hit ratio | Meets design | Exporter → Grafana |
| % Values in RAM | Meets design | Exporter → Grafana |
| Evictions | No spikes | Metrics + node logs |
| Per-node Flash IOPS | Headroom | Exporter / OS tools |
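The checklist can also be evaluated programmatically once the metrics have been collected. This is a sketch under stated assumptions: the metric and target key names are hypothetical (they are not exporter metric names), "no spikes" is approximated by a ceiling on the eviction rate, and "headroom" by an IOPS budget you choose for your drives.

```python
def evaluate_checklist(metrics: dict, targets: dict) -> dict:
    """Return a pass/fail flag for each row of the validation checklist."""
    return {
        "endpoint_health": bool(metrics["smoke_tests_passed"]),
        "latency_p99": metrics["p99_ms"] <= targets["p99_slo_ms"],
        "ram_hit_ratio": metrics["ram_hit_ratio"] >= targets["min_ram_hit_ratio"],
        "values_in_ram": metrics["values_in_ram_pct"] >= targets["min_values_in_ram_pct"],
        # "No spikes" approximated as staying under a chosen rate ceiling.
        "evictions": metrics["evictions_per_sec"] <= targets["max_evictions_per_sec"],
        # "Headroom" approximated as staying under a chosen per-node budget.
        "flash_iops": metrics["flash_iops"] <= targets["flash_iops_budget"],
    }


def failed_checks(results: dict) -> list:
    """List the checklist rows that did not pass."""
    return [name for name, passed in results.items() if not passed]
```

Running `failed_checks(evaluate_checklist(metrics, targets))` after a ratio change or migration gives a short list of rows to investigate before re-baselining.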
Operating & Monitoring
Admin UI path: Databases → (your database) → Configuration → Memory tiering
Ratio changes are online but move data between tiers. Always schedule a maintenance window and re-baseline after completion.
Monitor
RAM hit ratio, % values in RAM, latency (avg / p99), ops / sec
Evictions and expirations
Per-node Flash IOPS / queue depth
Replica lag
Use Prometheus / Grafana, Datadog, or Redis Insight for visualization.
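If you need to reproduce two of these metrics from raw counters or latency samples, the sketch below shows one way to do it. It assumes the RAM hit ratio is the share of requests served from RAM out of all requests served from RAM or Flash, and it uses the nearest-rank method for p99; both function names and their inputs are hypothetical, and your dashboards may compute these values differently.

```python
def ram_hit_ratio(ram_hits: int, flash_hits: int) -> float:
    """Share of requests served from RAM (assumed definition)."""
    total = ram_hits + flash_hits
    # With no traffic there are no Flash reads, so report a perfect ratio.
    return ram_hits / total if total else 1.0


def p99(samples_ms: list) -> float:
    """99th percentile latency using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("samples_ms must not be empty")
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.99 * n), avoiding floating-point rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

Comparing these values before and after a ratio change gives a simple baseline for the re-baselining step described above.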
Scaling
| Goal | Action |
|---|---|
| More capacity | Increase Flash allocation or lower RAM %. |
| More throughput | Add shards; scale vCPU to match. |
| Better latency | Increase RAM %, monitor p99 and hit ratio. |
Backups
Flex is not a system of record.
Maintain scheduled backups and verify restores regularly.
Troubleshooting Summary Table
| Symptom | Recommended Action |
|---|---|
| Cold-read latency | Increase RAM ratio; use higher-IOPS NVMe; pre-warm hot keys; optimize locality. |
| Evictions | Scale memory or adjust policy; correlate with TTL churn and dataset growth. |
| Resize side effects | Confirm active resize/rebalance; defer heavy jobs until completion. |
| Unexpected RAM growth | Check for keys/values > 4 GB — these remain in RAM and log warnings. |
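To look for oversized keys, one option is to scan the keyspace and check each key's memory footprint. The sketch below assumes a redis-py style client that provides `scan_iter()` and `memory_usage()`; `find_oversized_keys` is a hypothetical helper, the 4 GB threshold is read as GiB, and `MEMORY USAGE` reports the whole key footprint including overhead, so treat the result as an approximation. A full scan is expensive on a terabyte-scale dataset, so verify that the command is permitted on your database and run it off-peak.

```python
FOUR_GB = 4 * 1024**3  # threshold from the table above, read as GiB (assumption)


def find_oversized_keys(client, threshold_bytes: int = FOUR_GB, scan_count: int = 1000) -> list:
    """Return (key, size_in_bytes) pairs for keys larger than the threshold.

    `client` is any object with scan_iter() and memory_usage(),
    for example a redis.Redis client from redis-py.
    """
    oversized = []
    for key in client.scan_iter(count=scan_count):
        size = client.memory_usage(key)  # None if the key vanished mid-scan
        if size is not None and size > threshold_bytes:
            oversized.append((key, size))
    return oversized
```

Any keys returned are candidates to split or restructure, and can be cross-checked against the warnings in the node logs.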
Where to Look
Admin UI and CLI outputs, metrics exporter (Prometheus), and node logs under standard paths.