Details
-
Improvement
-
Status: Resolved
-
Minor
-
Resolution: Not A Problem
-
0.7.0-incubating
-
None
-
None
Description
When translating a JobGraph to an ExecutionGraph, edges are assigned connection indexes to ensure deadlock freedom of the network stack. Deadlocks can easily occur when multiplexing multiple logical channels over a single TCP connection, because of the limited network buffers. Each incoming envelope, which has an attached buffer, needs to request a network buffer from the receiver's input buffer pool. If the receiver does not have a buffer available, it unsubscribes from the TCP connection until a new buffer is available. This unsubscribe then affects all other logical channels, which are multiplexed over the same TCP connection. With certain user programs this can result in a deadlock.
One of the problems when running in multiplexed mode is the question of when to close TCP connections in a reliable way without loosing or re-ordering network envelopes (as evident in FLINK-1063). In non-multiplexed mode this is trivial, because every logical channel is mapped to a phyiscal channel and the close of the logical channel translates to a close of the physical channel.
However, this approach results in limited scalability as high degrees of parallelism result in a high number of TCP connections per machine, which are continuously opened and closed. Still, it is useful as a baseline and should work fine with smaller setups.
Attachments
Issue Links
- is related to
-
FLINK-1063 Race condition in NettyConnectionManager
- Resolved
- is superceded by
-
FLINK-986 Add intermediate results to distributed runtime
- Resolved