Redis Software deployments running on ACRE (Azure Cache for Redis Enterprise) and AMR (Azure Managed Redis) differ in how VM-level connection limits are enforced. This difference directly affects proxy stability, health-check behavior, and SLO attainment under high connection volumes. This article explains the platform differences and gives guidance on quick mitigation, controlling connection counts on ACRE, operating on AMR, and troubleshooting high connection counts.
Quick Fix
| Platform | Behavior and Immediate Action |
|---|---|
| ACRE | No platform-enforced per-VM connection limit. High connection storms (>10K–16K per VM) can overload dmcproxy, causing CPU spikes and restart loops. • Apply rate limiting or per-source connection limits at the load balancer • Tighten idle and keep-alive timeouts • Reduce health-check concurrency • Redeploy the affected VM only after throttling traffic |
| AMR | Platform-enforced per-VM connection caps protect proxy stability. Excess connections are rejected or throttled. • Confirm the current cap with support • Scale horizontally if traffic exceeds per-VM limits • Ensure clients implement retry with backoff |
Prerequisites
Access to VM-level metrics (Connected Clients, proxy CPU, restart events)
Access to load balancer or reverse proxy configuration
Confirmation of whether the deployment is ACRE or AMR
Ability to open a Redis support ticket if needed
Platform Behavior Explained
ACRE
ACRE does not enforce a hard per-VM connection limit. A VM keeps accepting connections until it runs out of resources.
The effective limit is determined by:
VM size (CPU and memory)
OS file descriptor limits
Proxy thread capacity
Network stack configuration
Load balancer settings
Client connection behavior
This means:
More than 16K concurrent connections per VM is possible
Small SKUs (for example, 2-core VMs) can saturate proxy CPU
Connection storms can trigger watchdog restarts of dmcproxy
SLO may degrade even if shards are healthy
On ACRE, connection scaling is your responsibility.
As a simple rule of thumb, start paying attention when small 2-core VMs reach around 10K connections per VM, and treat more than 15K as a high-risk level for proxy instability.
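The rule of thumb above can be turned into a small check for dashboards or alert scripts. This is a minimal sketch: the two thresholds come from this article, while the function name and the labels it returns are illustrative assumptions, not part of any Redis or Azure tooling.

```python
# Thresholds taken from the rule of thumb above (small 2-core VMs).
# Tune them for your own VM size; they are guidance, not platform limits.
WATCH_THRESHOLD = 10_000      # start paying attention
HIGH_RISK_THRESHOLD = 15_000  # high risk of proxy instability


def connection_risk(connected_clients: int) -> str:
    """Classify a per-VM connection count as 'ok', 'watch', or 'high-risk'."""
    if connected_clients < 0:
        raise ValueError("connected_clients must be non-negative")
    if connected_clients > HIGH_RISK_THRESHOLD:
        return "high-risk"
    if connected_clients >= WATCH_THRESHOLD:
        return "watch"
    return "ok"


if __name__ == "__main__":
    # Hypothetical readings of the Connected Clients metric for three VMs.
    for count in (4_200, 11_500, 16_300):
        print(count, connection_risk(count))
```

Feed it the per-VM Connected Clients metric, not the cluster-wide total, since the rule of thumb applies per VM.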
AMR
AMR enforces per-VM connection limits to protect stability.
When limits are exceeded:
New connections may be rejected
Traffic may be throttled
Backpressure occurs before proxy saturation
If you require guaranteed connection ceilings, AMR provides that control.
To confirm or request adjustments, open a support ticket including:
Service name
Region
Current peak connections
Desired cap
Traffic profile and justification
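Because AMR may reject or throttle new connections once the cap is reached, clients should retry with backoff rather than reconnect immediately. The sketch below shows capped exponential backoff with full jitter. The `connect` callable, the default delays, and the attempt count are assumptions for illustration. Many Redis client libraries ship their own retry settings, which are usually preferable to hand-written loops.

```python
import random
import time


def connect_with_backoff(connect, max_attempts=5, base_delay=0.1, max_delay=5.0,
                         sleep=time.sleep, rng=random.random):
    """Call connect() until it succeeds or max_attempts is reached.

    Between attempts, sleep for a random time up to a ceiling that doubles
    on every failure and is capped at max_delay. The last error is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return connect()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * ceiling)  # full jitter spreads retries over time
```

Jitter matters here: if every client retries on the same schedule after a rejection, the retries themselves arrive as a new connection storm.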
Controlling Connection Counts on ACRE
Use a layered control strategy.
1. Application Layer
First line of defense.
Enable connection pooling
Use HTTP keep-alive
Configure max connection limits in application servers
Tune idle and read/write timeouts
Avoid burst-style connection creation
Poor pooling behavior is one of the most common causes of connection storms.
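To illustrate what a bounded pool does, the sketch below caps the number of connections a process can hold and makes callers wait for a free one instead of opening more. It is a simplified model, not a production pool. `BoundedPool` and `factory` are hypothetical names, and in practice you would normally use the pooling built into your Redis client.

```python
import queue
from contextlib import contextmanager


class BoundedPool:
    """Hold at most max_size connections; callers wait for a free slot."""

    def __init__(self, factory, max_size, timeout=5.0):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._factory = factory    # callable that opens one new connection
        self._timeout = timeout    # seconds to wait for a free slot
        self._slots = queue.LifoQueue(maxsize=max_size)
        for _ in range(max_size):
            self._slots.put_nowait(None)  # None marks a slot with no connection yet

    @contextmanager
    def connection(self):
        # Raises queue.Empty if every slot stays busy for the whole timeout.
        conn = self._slots.get(timeout=self._timeout)
        if conn is None:
            try:
                conn = self._factory()  # open lazily, only when a slot needs one
            except BaseException:
                self._slots.put_nowait(None)  # give the empty slot back
                raise
        try:
            yield conn
        finally:
            # A real pool would discard broken connections; this sketch
            # always returns the connection for reuse.
            self._slots.put_nowait(conn)
```

With `max_size=50`, a process never holds more than 50 connections however many requests arrive at once. Excess callers wait, or fail after the timeout, rather than adding load on the proxy.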
2. Load Balancer / Reverse Proxy
Primary enforcement layer.
Set maximum concurrent connections per backend VM
Apply per-source connection limits (connlimit or equivalent)
Enable rate limiting
Use request queuing
Tune idle timeouts
Do not rely solely on backend capacity to absorb spikes.
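Per-source rate limiting is normally configured in the load balancer, not written by hand. The sketch below only shows the mechanism such a limit typically uses: a token bucket per client IP. The class names, the rate, and the burst size are illustrative assumptions.

```python
import time


class TokenBucket:
    """Allow up to `burst` events at once, refilled at `rate` tokens per second."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(burst)
        self._clock = clock
        self._tokens = float(burst)  # start with a full bucket
        self._last = clock()

    def allow(self):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


class PerSourceLimiter:
    """Keep one token bucket per source IP for new-connection attempts."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self._rate, self._burst, self._clock = rate, burst, clock
        self._buckets = {}

    def allow(self, source_ip):
        if source_ip not in self._buckets:
            self._buckets[source_ip] = TokenBucket(self._rate, self._burst, self._clock)
        return self._buckets[source_ip].allow()
```

A misbehaving client exhausts only its own bucket, so well-behaved sources keep connecting while the noisy one is throttled.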
3. OS and Kernel (Linux Examples)
Ensure system limits align with expected load.
Increase ulimit -n and fs.file-max as appropriate
Tune net.core.somaxconn
Monitor backlog queues
Watch TIME_WAIT accumulation
Avoid ephemeral port exhaustion by reusing connections
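Before changing any limit, it helps to record the current values. The sketch below reads the open-file limit and two of the kernel settings mentioned above on Linux. The function names are illustrative. Note that `resource.getrlimit` reports the limit of the process running the script, which may differ from the limit applied to the proxy process.

```python
import resource
from pathlib import Path


def read_sysctl(name):
    """Read an integer sysctl such as 'net.core.somaxconn' from /proc/sys.

    Returns None if the file is missing or unreadable (for example, on a
    non-Linux host or inside a restricted container).
    """
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def connection_limits():
    """Collect the limits that bound how many sockets this host can hold."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)  # same as ulimit -n
    return {
        "nofile_soft": soft,
        "nofile_hard": hard,
        "fs.file-max": read_sysctl("fs.file-max"),
        "net.core.somaxconn": read_sysctl("net.core.somaxconn"),
    }


if __name__ == "__main__":
    for key, value in connection_limits().items():
        print(f"{key}: {value}")
```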
4. Health Check Optimization
Health checks can unintentionally amplify connection volume.
Optimize by:
Increasing probe intervals
Reducing concurrency
Setting appropriate timeouts
Using lightweight endpoints (simple 200 OK)
Reusing connections when possible
Aggressive health-check patterns frequently worsen connection storms.
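A quick estimate shows why probe settings matter. The sketch below assumes each probe opens a new TCP connection and that closed sockets linger for about 60 seconds in TIME_WAIT, which is typical on Linux. The function names and example numbers are illustrative, not measured values.

```python
def probe_connection_rate(prober_count, interval_seconds, probes_per_interval=1):
    """New connections per second that health checks open against one VM,
    assuming every probe uses a fresh TCP connection."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return prober_count * probes_per_interval / interval_seconds


def lingering_sockets(rate_per_second, time_wait_seconds=60.0):
    """Approximate sockets sitting in TIME_WAIT at steady state on the side
    that closes first. The 60 s default is an assumption."""
    return rate_per_second * time_wait_seconds


if __name__ == "__main__":
    # Hypothetical setup: 20 probers checking one VM.
    for interval in (1, 5, 15):
        rate = probe_connection_rate(20, interval)
        print(f"interval={interval}s rate={rate:.1f}/s lingering={lingering_sockets(rate):.0f}")
```

In this model, raising the interval from 1 second to 5 seconds cuts probe-driven connection churn by a factor of five, and reusing connections removes most of it.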
Troubleshooting High Connection Counts
Step 1: Identify Sources
On the VM:
ss -antp
Review:
Remote IP distribution
Connection states (ESTAB, SYN-RECV, TIME-WAIT, CLOSE-WAIT, as printed by ss)
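On a busy VM the ss output is too long to read by eye. The sketch below summarizes it by connection state and by remote IP. The sample text is invented for illustration (the addresses and port 10000 are placeholders), and the parser assumes the default column layout in which the peer address is the fifth field.

```python
from collections import Counter

# Made-up sample in the shape of `ss -ant` output; not real data.
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB      0      0      10.0.0.4:10000      10.0.1.7:51514
ESTAB      0      0      10.0.0.4:10000      10.0.1.7:51516
TIME-WAIT  0      0      10.0.0.4:10000      10.0.1.9:40022
CLOSE-WAIT 1      0      10.0.0.4:10000      10.0.2.3:33810
LISTEN     0      511    0.0.0.0:10000       0.0.0.0:*
"""


def summarize_ss(text):
    """Return (states, peers): counts per TCP state and per remote IP."""
    states, peers = Counter(), Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "State":
            continue  # skip the header and blank or short lines
        state, peer = fields[0], fields[4]
        states[state] += 1
        if state != "LISTEN":
            # Drop the port; rsplit keeps bracketed IPv6 addresses intact.
            peers[peer.rsplit(":", 1)[0]] += 1
    return states, peers


if __name__ == "__main__":
    states, peers = summarize_ss(SAMPLE)
    print(states.most_common())
    print(peers.most_common(3))
```

To run it against a live VM, pass in the output of `subprocess.run(["ss", "-ant"], capture_output=True, text=True, check=True).stdout` instead of `SAMPLE`. Adding `-p` appends a process column at the end and does not move the peer field.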
In logs:
Load balancer request logs
Proxy logs
Restart events
Step 2: Classify the Pattern
Determine whether connections are:
Long-lived and stable
Rapid burst creation
Failed connection attempts
Stuck in abnormal states
Connection storms often show:
Rapid growth across multiple nodes
High failed connection attempts
Proxy CPU spikes
Watchdog restart loops
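One way to separate stable long-lived connections from rapid burst creation is to look at how fast the connection count grows between metric samples. The sketch below is a simple heuristic. The threshold of 1,000 new connections per minute is an assumed placeholder to tune against your own baseline, not a documented limit.

```python
def max_growth_per_minute(samples):
    """Largest growth rate between consecutive samples.

    samples: list of (timestamp_seconds, connected_clients), sorted by time.
    Returns 0.0 when there are fewer than two samples.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("samples must be strictly increasing in time")
        rates.append((c1 - c0) * 60.0 / (t1 - t0))
    return max(rates, default=0.0)


def looks_like_burst(samples, threshold_per_minute=1000.0):
    """True if any interval grew faster than the (assumed) threshold."""
    return max_growth_per_minute(samples) > threshold_per_minute


if __name__ == "__main__":
    # Hypothetical one-minute samples of the Connected Clients metric.
    stable = [(0, 4000), (60, 4050), (120, 4020)]
    storm = [(0, 4000), (60, 4100), (120, 9100)]
    print(looks_like_burst(stable), looks_like_burst(storm))
```

Running the same check on each node helps confirm whether growth is happening across multiple nodes at once, which is typical of a connection storm.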
Step 3: Stabilize the Environment
Short-term mitigation:
Apply aggressive edge throttling
Reduce idle timeouts
Block or rate-limit misbehaving client IPs
Scale horizontally if possible
Redeploy affected VM only after traffic reduction
Redeploying without controlling traffic will not prevent recurrence.
Step 4: Prevent Recurrence
Implement connection pooling standards
Alert on connected client count per VM
Alert on proxy CPU
Alert on restart events
Right-size VM SKU
Consider AMR if enforced caps are required
Frequently Asked Questions
Is there a hard per-VM connection limit on ACRE?
No. ACRE does not enforce a platform-level limit.
Why did I observe more than 16K connections on a VM?
Because ACRE does not cap connections. Limits depend on OS, proxy capacity, and client behavior.
Can connection storms cause SLO degradation?
Yes. High connection counts can overload dmcproxy, causing restart loops and SLO impact.
Are connections always routed only to primary shard nodes?
Expected behavior is routing to primary shard nodes. If traffic appears distributed unexpectedly, investigate load balancer probing and routing logic.
I need guaranteed connection caps. What should I use?
AMR enforces per-VM connection limits.
When to Open a Support Ticket
Open a ticket if:
Connections per VM reach 10K–15K or more
Proxy enters restart loop
SLO degrades without shard failure
Traffic routing appears unexpected
You need AMR cap confirmation or adjustment
Include:
Service name
Region
VM SKU
Peak connection count
Timeline
Recent configuration or traffic changes
Key Takeaways
ACRE does not enforce connection caps.
AMR enforces per-VM connection limits.
Connection storms overload proxies before shards fail.
Health checks and client behavior are common root causes.
Layered throttling and pooling controls are essential on ACRE.
AMR is recommended for environments requiring hard ceilings.