Redis Software cluster nodes must communicate with each other over required TCP ports for cluster management, node health, shard placement, endpoint availability, discovery services, and REST API operations. When traffic is blocked between nodes, Redis Software may report verify_tcp_connectivity failures, CCS ERROR: TIMEOUT, missing endpoints, or shard timeout errors. This article covers Symptoms, Quick Fix, Affected ports, Step-by-step troubleshooting, Expected result after resolution, and When to contact Redis Support.
For Active-Active replication issues between CRDB participants, use Troubleshoot Unreachable Active-Active (CRDB) Participants. For endpoint movement, failover behavior, or client reconnection after topology changes, use Understanding Cluster Convergence and Endpoint Rebinding. Redis Software port requirements are documented in the Redis Software network port configuration documentation. Redis notes that Redis Software deployments span multiple physical or virtual nodes and require several ports to remain open between them. (Redis)
Prerequisites
Access to the Redis Software nodes.
You need shell access to the source and destination nodes shown in the connectivity error.
Permission to review network controls.
You may need access to host firewall rules, cloud security groups, network security groups, subnet ACLs, routing tables, or external firewall policies.
Redis Software admin permissions.
You need permission to run Redis Software health and status commands, including rladmin status extra all and rlcheck.
The Redis Software port list for your deployment.
Before changing firewall rules, compare the failed ports with the Redis Software port requirements. Redis Software port usage is grouped into internal, external, and Active-Active traffic, so the correct remediation depends on which port and traffic type is failing. (Redis)
Symptoms
You may be hitting this issue if you see one or more of the following symptoms.
TCP connectivity checks fail between Redis Software nodes.
You may see an error similar to the following from the rlcheck utility.
verify_tcp_connectivity: connectivity check failed from <source_node_ip> to <destination_node_ip>
for the following ports: <port_list>
Stopping on Error: connectivity check failed from <source_node_ip> to <destination_node_ip>
for the following ports: <port_list>
A node appears in CCS ERROR: TIMEOUT state.
rladmin status nodes indicates that Redis Software cluster services cannot communicate reliably with that node.
Endpoints or shards on the affected node show missing or timeout errors.
For example, rladmin status extra all may show endpoints as missing or shards as ERROR: timed out.
Cluster diagnostics may be incomplete.
If Redis Software cannot reliably communicate with the affected node, cluster-wide support package collection may not include complete logs from that node.
Quick Fix
| If you see | Do this |
|---|---|
| TCP connectivity checks failing between two nodes | Verify network reachability between the source and destination nodes. Confirm the required Redis Software TCP ports are open in both directions. |
| A node in CCS ERROR: TIMEOUT | Treat the issue as an inter-node communication problem first. Do not start with database-level troubleshooting unless node connectivity is already confirmed healthy. |
| Shards or endpoints on one node showing missing or ERROR: timed out | Confirm that the affected node is reachable from every other cluster node and that no host firewall, security group, network ACL, or routing rule is blocking Redis Software traffic. |
| Cluster support package is missing logs from the affected node | Collect a node-level debuginfo package directly from the impacted node. |
Affected ports
The ports shown in the error are the ports Redis Software could not reach from the source node to the destination node.
In one common failure pattern, the failed ports may include:
3333, 3340, 3344, 8001, 8080, 9443
These ports map to different Redis Software traffic types:
| Port or range | Redis Software use |
|---|---|
| 3333-3345 | Internode communication |
| 8001 | Discovery service traffic |
| 8080 | REST API traffic over HTTP by default |
| 9443 | REST API traffic over HTTPS by default |
Redis Software documents 3333-3345, 3350-3354, and 36379 as internal internode communication ports, 8001 as discovery service traffic, and 9443, 8080, and 3346 as REST API traffic used for cluster management and node bootstrap. (Redis)
The exact required ports can vary based on Redis Software version, custom port configuration, cluster topology, and whether Active-Active is in use. By default, Redis Software uses 9443 for secure REST API traffic and 8080 for non-secure REST API traffic. These ports can be changed with rladmin cluster config cnm_https_port and rladmin cluster config cnm_http_port, respectively. (Redis)
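As a rough aid for reading the port list in an error, the mapping above can be sketched as a small shell function. The name classify_port is hypothetical, and the ranges assume the default ports listed in this article. A cluster with custom ports would need the patterns adjusted.

```shell
# classify_port PORT
# Hypothetical helper: prints the traffic type this article associates with
# a default Redis Software port. Custom port configurations are not handled.
classify_port() {
  case "$1" in
    333[3-9]|334[0-5]|335[0-4]|36379) echo "internode" ;;
    8001)                             echo "discovery" ;;
    3346|8080|9443)                   echo "rest-api" ;;
    *)                                echo "unknown" ;;
  esac
}

# Example: classify_port 3340
```

A result of unknown does not mean the port is unused. It only means the port is not in the default list covered here, so compare it against the port documentation for your version.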
Step-by-step troubleshooting
1. Confirm the failure pattern
Run:
rladmin status extra all
Look for the following indicators.
A node in CCS ERROR: TIMEOUT.
This usually points to cluster communication failure with that node.
Endpoints marked missing.
This can happen when Redis Software cannot confirm endpoint state on the affected node.
Shards marked ERROR: timed out.
This can appear when shard status cannot be collected or validated because node communication is failing.
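On a large cluster the status output can be long, so it may help to filter it for the three indicators above. The following is a minimal sketch. The function name scan_status is hypothetical, and the filter only matches the literal strings named in this article without assuming any column layout. The word missing is a broad match and can produce unrelated hits.

```shell
# scan_status
# Hypothetical helper: reads status text on stdin and prints, with line
# numbers, every line containing one of the indicators from this article.
# Returns 0 if at least one indicator is found, non-zero otherwise.
scan_status() {
  grep -n -E 'CCS ERROR: TIMEOUT|ERROR: timed out|missing'
}

# Example: rladmin status extra all | scan_status
```

Because the function returns non-zero when nothing matches, it can also serve as a quick check after a fix: no output means none of the three indicators are present.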
2. Identify the source and destination nodes
Run on each node:
rlcheck
Look for the following indicators.
verify_tcp_connectivity: connectivity check failed...
Use the verify_tcp_connectivity error to identify the direction of the failure.
connectivity check failed from <source_node_ip> to <destination_node_ip>
for the following ports: <port_list>
The source node is where the connectivity check started.
Run connectivity tests from this node first.
The destination node is the node that Redis Software could not reach.
If failures consistently point to the same destination node, focus first on that node’s firewall, security group, subnet ACL, routing path, and host-based security tooling.
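When rlcheck reports several failures, pulling the source, destination, and ports out of each message makes the direction easier to compare. The sketch below assumes the message wording shown in this article and assumes that the port list is a run of numbers separated by commas or spaces. The function name parse_connectivity_error is hypothetical.

```shell
# parse_connectivity_error
# Hypothetical helper: reads rlcheck output on stdin and prints one line per
# distinct failure in the form "source destination port port ...".
# Assumes the message wording quoted in this article.
parse_connectivity_error() {
  awk '
    /connectivity check failed from/ {
      for (i = 1; i < NF; i++) {
        if ($i == "from") src = $(i + 1)
        if ($i == "to")   dst = $(i + 1)
      }
    }
    /for the following ports:/ {
      ports = $0
      sub(/.*for the following ports:/, "", ports)
      gsub(/[^0-9]+/, " ", ports)      # tolerate commas and extra spacing
      sub(/^ +/, "", ports)
      sub(/ +$/, "", ports)
      key = src " " dst " " ports
      if (src != "" && dst != "" && !(key in seen)) {
        seen[key] = 1                  # skip the repeated Stopping on Error copy
        print key
      }
      src = ""
      dst = ""
    }
  '
}

# Example: rlcheck 2>&1 | parse_connectivity_error
```

If the same destination appears on most output lines, that node is the likely place to start the firewall and routing review.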
3. Validate inter-node TCP connectivity
From the source node, test each failed port against the destination node.
nc -vz <destination_node_ip> 3333
nc -vz <destination_node_ip> 3340
nc -vz <destination_node_ip> 3344
nc -vz <destination_node_ip> 8001
nc -vz <destination_node_ip> 8080
nc -vz <destination_node_ip> 9443
A successful result usually looks like:
Connection to <destination_node_ip> <port> port [tcp/*] succeeded!
A failed result may look like:
nc: connect to <destination_node_ip> port <port> (tcp) failed: Connection timed out
If the tests fail, investigate the following network layers.
OS firewall rules.
Check firewalld, iptables, nftables, or local host security tooling.
Cloud firewall rules.
Check AWS security groups, Azure network security groups, Google Cloud firewall rules, or equivalent controls.
Network ACLs.
Confirm subnet-level ACLs allow Redis Software node-to-node traffic.
Routing between subnets or VLANs.
Confirm the source and destination nodes have a valid route to each other.
Host-based packet filtering or endpoint security software.
Some security agents can block ports even when cloud-level rules are correct.
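The individual nc tests in this step can be wrapped in a loop so that every failed port is tested in one pass. This is a minimal sketch, assuming an nc variant that supports the -z and -w options. The function names probe_port and check_ports are hypothetical, and the 5-second timeout is an arbitrary choice.

```shell
# probe_port HOST PORT
# Returns 0 if a TCP connection succeeds within 5 seconds.
# Assumes an nc build that supports -z (scan only) and -w (timeout).
probe_port() {
  nc -z -w 5 "$1" "$2" >/dev/null 2>&1
}

# check_ports HOST PORT...
# Prints one OK or FAIL line per port and returns non-zero if any port failed.
check_ports() {
  host=$1
  shift
  failed=0
  for port in "$@"; do
    if probe_port "$host" "$port"; then
      echo "OK $host:$port"
    else
      echo "FAIL $host:$port"
      failed=1
    fi
  done
  return "$failed"
}

# Example, run on the source node:
#   check_ports <destination_node_ip> 3333 3340 3344 8001 8080 9443
```

Keeping the probe in its own function makes it straightforward to swap in a different tool if nc is not installed on the node.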
4. Validate connectivity in both directions
Redis Software nodes need reliable node-to-node communication. Do not validate only one direction.
From the source node to the destination node:
nc -vz <destination_node_ip> <port>
From the destination node back to the source node:
nc -vz <source_node_ip> <port>
Repeat this for each port listed in the error.
If the cluster has more than two nodes, validate the required Redis Software ports across all cluster node pairs, not only the pair shown in the first error.
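For clusters with several nodes, it is easy to miss a direction when testing by hand. The sketch below only generates a checklist of every ordered node pair and port. It does not open any connections. The function name list_pair_checks is hypothetical, and the node addresses and ports are whatever you pass in.

```shell
# list_pair_checks "NODE_IPS" "PORTS"
# Hypothetical helper: prints one "source destination port" line for every
# ordered pair of distinct nodes, so both directions are covered.
# Both arguments are space-separated lists.
list_pair_checks() {
  nodes=$1
  ports=$2
  for src in $nodes; do
    for dst in $nodes; do
      if [ "$src" != "$dst" ]; then
        for port in $ports; do
          echo "$src $dst $port"
        done
      fi
    done
  done
}

# Example: list_pair_checks "<node1_ip> <node2_ip> <node3_ip>" "3333 8001 9443"
```

Each line describes a test that must be run on the node in the first column against the node in the second column. Three nodes and three ports produce 18 checks.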
5. Review Redis Software logs on the impacted node
When available, review logs on the impacted node for connection failures, refused connections, timeout patterns, or watchdog activity.
Common logs to review include:
/var/opt/redislabs/log/event_log.log
/var/opt/redislabs/log/cluster_wd.log
/var/opt/redislabs/log/dmcproxy.log
Use the logs to confirm whether Redis Software services are unavailable, blocked by network policy, or unable to report status back to the cluster.
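A keyword search is one quick way to find candidate lines in these logs before reading them in full. The sketch below is an illustration only. The function name scan_logs is hypothetical, and the keyword list is an assumption rather than a documented set of Redis Software log messages, so adjust it to what you actually see in your logs.

```shell
# scan_logs FILE...
# Hypothetical helper: prints lines that mention common connection problems,
# prefixed with the file name and line number. The keyword list is an
# assumption, not an official list of Redis Software log messages.
scan_logs() {
  # /dev/null is added so grep always prints the file name, even for one file.
  grep -n -i -E 'timeout|timed out|refused|unreachable' "$@" /dev/null
}

# Example:
#   scan_logs /var/opt/redislabs/log/event_log.log \
#             /var/opt/redislabs/log/cluster_wd.log \
#             /var/opt/redislabs/log/dmcproxy.log
```

Matches are only a starting point. Compare their timestamps with the time the connectivity errors began.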
6. Collect diagnostics correctly
If cluster-wide support package collection works, collect a full support package for the cluster.
If the cluster support package is incomplete or does not include usable logs from the impacted node, collect node-level diagnostics directly from that node:
/opt/redislabs/bin/debuginfo
Include both the cluster support package and the node-level package when contacting Redis Support.
7. Re-check cluster health after the network fix
After firewall, routing, or security policy changes are applied, re-run:
rladmin status
rladmin status extra all
rlcheck
Confirm the following results.
The affected node no longer shows CCS ERROR: TIMEOUT.
The cluster should be able to communicate with the node.
Endpoints are no longer marked missing.
Endpoint status should be visible again.
Shards recover from timeout states.
Shard status should be collected successfully.
Connectivity checks complete successfully.
verify_tcp_connectivity should no longer fail for the affected node pair.
Expected result after resolution
After the network path is restored, the affected node should return to a healthy communication state and should no longer show CCS ERROR: TIMEOUT. Endpoints hosted on the affected node should no longer appear as missing, shard status should stabilize, and rlcheck or related validation checks should complete without TCP connectivity failures.
When to contact Redis Support
Open or update a Redis Support ticket if any of the following apply.
Port tests still fail after firewall and routing changes.
Include the exact source node, destination node, port list, and test output.
One node remains unreachable through rladmin.
Include rladmin status, rladmin status extra all, and any available node-level logs from the affected node.
Node-specific logs cannot be collected.
Mention whether cluster support package collection skipped or failed on the impacted node.
Shards remain unavailable after network connectivity is restored.
This may require Redis Support to review shard state, cluster metadata, and node logs.
The environment uses custom Redis Software ports.
Include the custom port configuration and any recent network or deployment changes.