Redis processes commands on each shard using a single execution thread, so operations that scale with the size of a value can block other requests. This is most noticeable when deleting or modifying very large keys such as hashes with millions of fields, large sets or sorted sets, long lists, streams, or large string values. This article explains why large-key operations can cause latency spikes, how to identify large keys, and how to delete or shrink them safely in production environments. It includes step-by-step instructions and troubleshooting guidance.
Quick Fix
| If you need to | Use this approach |
|---|---|
| Delete a single very large key safely | Use UNLINK key instead of DEL key |
| Stop application access immediately, then delete safely | RENAME key tmp:key:to_delete → UNLINK tmp:key:to_delete |
| Gradually reduce a large hash, set, or sorted set | Use HSCAN / SSCAN / ZSCAN with batched deletes |
| Reduce a large stream size | Use XTRIM in incremental steps |
| Retire a key without immediate deletion | Use EXPIRE key <seconds> |
| Clean up many keys across a database | Use SCAN + UNLINK (see bulk deletion article) |
Deleting many keys and deleting a single large key require different approaches.
If you need to delete large numbers of keys by pattern, see Massive Key Deletion in Redis Without Impacting Performance.
For cluster-wide or Active-Active considerations, see Safe Key Deletion Strategies in Large Redis Clusters.
Prerequisites
Access to the Redis database endpoint
Ability to run Redis CLI commands or equivalent tooling
Visibility into database metrics (latency, CPU, slow log)
A maintenance window or low-traffic period for large operations
Confirmation that the key can be safely modified or removed
Why Large-Key Operations Can Cause Latency Spikes
Commands such as DEL, LREM, and LTRIM, and multi-element calls to HDEL, SREM, and ZREM, take time proportional to the number of elements they touch. For example, DEL on a collection with millions of elements must free every element before the command returns.
Because command execution on a shard is single-threaded, this can block other requests and lead to:
Increased latency for unrelated operations
High CPU utilization on a single shard
Slow log entries for large-key operations
Application timeouts or degraded performance
This behavior is expected and must be accounted for when operating on large keys.
Before You Delete or Shrink a Key
Identify large keys and estimate impact.
Run:
redis-cli --bigkeys
redis-cli --memkeys
MEMORY USAGE key
TYPE key
HLEN key
SCARD key
ZCARD key
LLEN key
XLEN key
These help determine:
Key type
Approximate size or cardinality
Expected cost of deletion or modification
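The checks above can also be scripted. The following is a minimal sketch, assuming a redis-py-style client object is passed in (for example `redis.Redis(decode_responses=True)`); the function name `describe_key` and the returned dictionary layout are illustrative choices, not part of any Redis tooling. Note that MEMORY USAGE samples nested values, so the byte count is an estimate.

```python
from typing import Any, Dict

# Maps the TYPE reply to the client method that returns the element count.
_CARDINALITY_METHODS = {
    "hash": "hlen",
    "set": "scard",
    "zset": "zcard",
    "list": "llen",
    "stream": "xlen",
    "string": "strlen",
}


def describe_key(client: Any, key: str) -> Dict[str, Any]:
    """Return type, estimated memory usage, and element count for one key.

    `client` is assumed to expose redis-py-style methods
    (type, memory_usage, hlen, scard, zcard, llen, xlen, strlen).
    """
    key_type = client.type(key)
    if isinstance(key_type, bytes):  # clients without decode_responses return bytes
        key_type = key_type.decode()

    info = {"key": key, "type": key_type, "bytes": None, "elements": None}
    if key_type == "none":  # TYPE replies "none" when the key does not exist
        return info

    info["bytes"] = client.memory_usage(key)
    method = _CARDINALITY_METHODS.get(key_type)
    if method is not None:
        info["elements"] = getattr(client, method)(key)
    return info
```

A large `elements` value is the main signal that a plain DEL or a single large removal command could block the shard.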
Check database health before proceeding.
Review:
Shard CPU utilization
Database latency metrics
Network throughput
Slow log (SLOWLOG GET)
If the shard is already under load, delay the operation or reduce batch sizes.
Safe Deletion Strategies
Use UNLINK Instead of DEL
UNLINK removes the key reference immediately and frees memory asynchronously, avoiding long blocking operations.
UNLINK large:key
Use this as the default approach for deleting very large keys in both Redis Cloud and Redis Software.
Rename Then Delete
If you need to immediately stop application access:
RENAME large:key tmp:large:key:to_delete
UNLINK tmp:large:key:to_delete
This ensures:
Clients stop accessing the key immediately
Memory cleanup occurs asynchronously
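As a rough illustration of the rename-then-delete pattern, the sketch below assumes a redis-py-style client. The helper name `retire_key` and the temporary naming scheme are hypothetical. Two details are assumptions worth checking for your deployment: the unique suffix avoids overwriting an existing key (RENAME replaces the destination if it exists), and the `{...}` hash tag is intended to keep the temporary name in the same hash slot as the original on clustered databases, which only works as written when the original key contains no braces.

```python
import uuid
from typing import Any


def retire_key(client: Any, key: str) -> str:
    """Rename `key` to a unique temporary name, then UNLINK it.

    Returns the temporary name that was used. `client` is assumed to
    expose redis-py-style rename() and unlink() methods.
    """
    # Hash tag {key} keeps both names in one slot when `key` has no braces.
    tmp_name = "tmp:{%s}:to_delete:%s" % (key, uuid.uuid4().hex)

    # After this call, clients that use the original name no longer find it.
    client.rename(key, tmp_name)

    # Memory for the renamed value is reclaimed asynchronously.
    client.unlink(tmp_name)
    return tmp_name
```

With redis-py, `rename` raises an error if the source key does not exist, so callers may want to handle that case.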
Use Expiration for Delayed Cleanup
If immediate deletion is not required:
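EXPIRE large:key 60
This defers removal until the TTL elapses, so the cleanup can be scheduled for a quieter period. The key is still removed in one operation when it expires; whether its memory is then freed in the background depends on the database's lazy-free expiration setting (lazyfree-lazy-expire in open-source Redis).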
Safe Shrinking Strategies
For large collections, reduce size incrementally instead of performing one large operation.
Hashes
HSCAN large:hash 0 COUNT 1000
HDEL large:hash field1 field2 field3
Sets
SSCAN large:set 0 COUNT 1000
SREM large:set member1 member2 member3
Sorted Sets
ZSCAN large:zset 0 COUNT 1000
ZREM large:zset member1 member2 member3
Lists
LTRIM large:list 0 99999
Streams
XTRIM large:stream MAXLEN ~ 100000
Guidelines for batching:
Start with 500–5,000 elements per batch
Reduce batch size if latency increases
Pause briefly between batches when needed
Monitor CPU and latency continuously
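The batching guidelines can be sketched as a loop for a hash. This is an illustrative example that assumes a redis-py-style client whose `hscan` returns a `(cursor, dict)` pair; the function name `shrink_hash` and its parameters (`batch_size`, `pause_seconds`, `max_fields`) are hypothetical, and the defaults are starting points rather than recommendations.

```python
import time
from typing import Any, Optional


def shrink_hash(
    client: Any,
    key: str,
    batch_size: int = 1000,
    pause_seconds: float = 0.05,
    max_fields: Optional[int] = None,
) -> int:
    """Delete fields from a hash in batches; return how many were removed.

    With max_fields=None the whole hash is emptied. Otherwise the loop
    stops once at least max_fields fields have been removed.
    """
    removed = 0
    cursor = 0
    while True:
        # COUNT is a hint: a reply may hold more or fewer fields than requested.
        cursor, fields = client.hscan(key, cursor=cursor, count=batch_size)
        names = list(fields)
        if max_fields is not None:
            names = names[: max_fields - removed]
        if names:
            removed += client.hdel(key, *names)
        # A cursor of 0 means the scan has completed a full pass.
        if cursor == 0 or (max_fields is not None and removed >= max_fields):
            return removed
        time.sleep(pause_seconds)  # give other commands room between batches
```

The same loop shape applies to sets (`sscan` with `srem`) and sorted sets (`zscan` with `zrem`); with redis-py, `zscan` yields `(member, score)` pairs, so only the member would be passed to `zrem`.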
Step-by-Step Instructions
Delete a Very Large Key Safely
Identify the key type and size.
Check shard CPU and latency metrics.
Run:
UNLINK key
Monitor:
Latency
CPU
Slow log
Confirm application stability.
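One way to script the delete-and-monitor steps is to compare the slow log before and after the UNLINK. The sketch below assumes a redis-py-style client whose `slowlog_get` returns a list of dictionaries with an `id` field; the function name `unlink_and_check_slowlog` is hypothetical. On sharded deployments the slow log may be reported per shard or node, so treat the result as a hint and keep watching the latency and CPU metrics.

```python
from typing import Any, Dict, List, Tuple


def unlink_and_check_slowlog(
    client: Any, key: str, entries: int = 32
) -> Tuple[int, List[Dict[str, Any]]]:
    """UNLINK `key` and return (keys_removed, slow log entries added since)."""
    before = client.slowlog_get(entries)
    newest_id = max((entry["id"] for entry in before), default=-1)

    removed = client.unlink(key)

    after = client.slowlog_get(entries)
    # Slow log ids increase over time, so larger ids are newer entries.
    fresh = [entry for entry in after if entry["id"] > newest_id]
    return removed, fresh
```

An empty list of fresh entries suggests that nothing crossed the slow log threshold while the key was being removed.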
Shrink a Large Data Structure Safely
Identify the data type and size.
Choose a conservative batch size.
Iterate using SCAN-family commands.
Delete elements in batches.
Pause if latency increases.
Continue until the desired size is reached.
Monitor throughout.
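For a list, the steps above might look like the following sketch, which trims from the tail in bounded steps and uses the round-trip time of a PING as a crude latency probe. It assumes a redis-py-style client; the function name `trim_list_gradually`, the parameters, and the thresholds are hypothetical and should be tuned against real metrics. It also assumes the list is not growing faster than it is trimmed.

```python
import time
from typing import Any


def trim_list_gradually(
    client: Any,
    key: str,
    target_len: int,
    step: int = 1000,
    latency_limit_s: float = 0.005,
    backoff_s: float = 0.5,
) -> int:
    """Trim a list to at most target_len elements; return the number of LTRIM calls."""
    if target_len < 1:
        # LTRIM key 0 -1 keeps the whole list, so a target of 0 needs UNLINK instead.
        raise ValueError("target_len must be at least 1")

    calls = 0
    while True:
        length = client.llen(key)
        if length <= target_len:
            return calls

        # Remove at most `step` elements per call to bound the work per command.
        new_len = max(target_len, length - step)
        client.ltrim(key, 0, new_len - 1)  # keep the first new_len elements
        calls += 1

        # Rough probe: if a PING is slow, pause before the next batch.
        started = time.perf_counter()
        client.ping()
        if time.perf_counter() - started > latency_limit_s:
            time.sleep(backoff_s)
```

Keeping each step small limits how long any single LTRIM can occupy the shard, at the cost of more round trips.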
Replace a Large Key Without Impact
Create a replacement key or update application logic.
Rename the original key.
Verify traffic no longer uses it.
Run UNLINK on the renamed key.
Confirm system stability.
Troubleshooting
| Issue | What it means | What to do |
|---|---|---|
| Latency spike during delete | Blocking operation on a large key | Use UNLINK instead of DEL |
| High CPU on one shard | Large-key operation or repeated O(N) commands | Check slow log and switch to batched approach |
| Cleanup causes latency spikes | Batch size too large | Reduce batch size and retry |
| Memory not freed immediately | Asynchronous memory reclaim or fragmentation | Continue monitoring before taking further action |
| Repeated large-key issues | Data model allows oversized keys | Redesign to split data across multiple keys |
Best Practices
Use UNLINK for large-key deletion.
This avoids blocking the shard during memory reclamation.
Shrink large data structures incrementally.
Batching reduces risk and provides control over impact.
Run cleanup during low-traffic periods.
Even safe operations can introduce load at scale.
Monitor slow log and metrics during operations.
Validate that changes are not introducing latency issues.
Avoid large keys through data modeling.
Split large datasets across multiple keys or partitions when possible.
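As one possible illustration of splitting data across keys, a single large hash can be spread over a fixed number of smaller hashes by hashing the field name. The helper `bucket_key`, the `base_key:index` naming scheme, and the default of 64 buckets are assumptions made for this sketch, not a Redis feature.

```python
import zlib


def bucket_key(base_key: str, field: str, buckets: int = 64) -> str:
    """Map a field name to one of `buckets` smaller hashes derived from base_key."""
    if buckets < 1:
        raise ValueError("buckets must be at least 1")
    # CRC32 is stable across processes, unlike Python's built-in hash().
    index = zlib.crc32(field.encode("utf-8")) % buckets
    return f"{base_key}:{index}"
```

Reads and writes then target `bucket_key(base, field)` instead of the single large key, for example `client.hset(bucket_key("sessions", field), field, value)` with a redis-py-style client. The bucket count is difficult to change later without moving data, so it is worth sizing with some headroom.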
Example: Latency spike caused by large-key operation
Despite healthy memory and CPU across nodes, a spike in proxy latency is observed. This pattern is typical of a blocking operation (e.g., DEL on a large key), where a single shard becomes temporarily unresponsive due to an O(N) operation.