[FLINK-25417] Too many connections for TM - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 1.13.5, 1.14.2, 1.15.0
Fix Version/s: None
Component/s: Runtime / Network
Labels:
None

Description

Hi masters, when the number of task exceeds 10, some TM has more than 4000 TCP connections.

Reason:

When the task is initialized, the downstream InputChannel will connect to the upstream ResultPartition.

In PartitionRequestClientFactory#createPartitionRequestClient, there is a clients(ConcurrentMap<ConnectionID, CompletableFuture<NettyPartitionRequestClient>> clients). It's a cache to avoid repeated tcp connections. But the ConnectionID has a field is connectionIndex.

The connectionIndex comes from IntermediateResult, which is a random number. When multiple Tasks are running in a TM, other TMs need to establish multiple connections to this TM, and each Task has a NettyPartitionRequestClient.

Assume that the parallelism of the flink job is 100, each TM has 20 Tasks, and the Partition strategy between tasks is rebalance or hash. Then the number of connections for a single TM is (20-1) * 100 * 2 = 3800. If multiple such TMs are running on a single node, there is a risk.

I want to know whether it is risky to change the cache key to connectionID.address? That is: a tcp connection is shared between all Tasks of TM.

I guess it is feasible because:

I have tested it and the task can run normally.

The Message contains the InputChannelID, which is used to distinguish which channel the NettyMessage belongs to.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

image-2021-12-22-19-17-59-486.png
22/Dec/21 11:18
327 kB
Rui Fan
image-2021-12-22-19-18-23-138.png
22/Dec/21 11:18
277 kB
Rui Fan

Issue Links

duplicates

FLINK-22643 Too many TCP connections among TaskManagers for large scale jobs

Closed

is duplicated by

FLINK-22643 Too many TCP connections among TaskManagers for large scale jobs

Closed

Activity

People

Assignee:: Unassigned

Reporter:: Rui Fan

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 22/Dec/21 11:19

Updated:: 29/Dec/21 10:58

Resolved:: 29/Dec/21 10:58