Flink / FLINK-7282

Credit-based Network Flow Control

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Runtime / Network
    • Labels:
      None

      Description

      This is part of the network stack improvements proposed in [~StephanEwen]'s FLIP.

      Backpressure currently arises naturally from the TCP network connections and the bounded buffering capacity. This has two downsides:

      • All channels multiplexed into the same TCP connection stall together, as soon as one channel has backpressure.
      • Under backpressure, connections cannot transport checkpoint barriers.

      The proposed Flink-managed flow control is similar to TCP's window-based advertisement mechanism. The basic approach is as follows:

      • Each RemoteInputChannel has a fixed number of exclusive buffers that serve as its initial credit, and each SingleInputGate has a fixed-size buffer pool that manages floating buffers for all of its RemoteInputChannels.
      • The RemoteInputChannel, as the receiver, notifies the sender of its currently available credit.
      • Senders must never send buffers without credit. Every buffer sent is therefore guaranteed to be accepted by the receiver, so no buffers accumulate on the network wire.
      • Senders also send their current backlog size, which indicates how many buffers are available on the sender side. Receivers use this information to decide how many floating buffers to request from the fixed buffer pool.
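
      The exchange described above can be sketched as a small single-threaded model. This is only an illustration under simplifying assumptions: every class and method name here (CreditFlowSketch, FloatingPool, Receiver, Sender, drain) is hypothetical and not Flink's actual API, credit is announced once per round rather than asynchronously, and recycled buffers stay with the channel instead of returning to the floating pool.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative model only: all class and method names are hypothetical,
// not Flink's actual classes or wire protocol.
class CreditFlowSketch {

    /** Fixed-size pool of floating buffers shared by the channels of one gate. */
    static final class FloatingPool {
        private int available;
        FloatingPool(int size) { this.available = size; }

        /** Grants up to 'wanted' buffers and returns how many were granted. */
        int request(int wanted) {
            int granted = Math.min(wanted, available);
            available -= granted;
            return granted;
        }
    }

    /** Receiver side of one channel. */
    static final class Receiver {
        private final FloatingPool pool;
        private int freeBuffers; // empty buffers (exclusive + floating) held by this channel
        private int announced;   // credit already told to the sender and not yet used

        Receiver(int exclusiveBuffers, FloatingPool pool) {
            this.pool = pool;
            this.freeBuffers = exclusiveBuffers; // exclusive buffers are the initial credit
        }

        /** Returns the credit that has not been announced to the sender yet. */
        int newCredit() {
            int delta = freeBuffers - announced;
            announced = freeBuffers;
            return delta;
        }

        /** A data buffer arrived together with the sender's remaining backlog. */
        void onBuffer(int senderBacklog) {
            freeBuffers--; // the data now occupies one buffer
            announced--;   // and the sender has spent one credit
            int shortfall = senderBacklog - freeBuffers;
            if (shortfall > 0) {
                // Backlog exceeds free buffers: ask the shared pool for floating buffers.
                freeBuffers += pool.request(shortfall);
            }
        }

        /** The task consumed one buffer; simplified: the buffer stays with this channel. */
        void recycleOne() { freeBuffers++; }
    }

    /** Sender side of one channel. */
    static final class Sender {
        private final Deque<String> queue = new ArrayDeque<>();
        private int credit;

        void enqueue(String record) { queue.add(record); }
        void addCredit(int n) { credit += n; }
        int backlog() { return queue.size(); }

        /** Returns the next buffer, or null if there is no credit or no data. */
        String poll() {
            if (credit == 0 || queue.isEmpty()) return null;
            credit--;
            return queue.poll();
        }
    }

    /** One round: announce new credit, then send until credit or data runs out. */
    static int drain(Sender s, Receiver r) {
        s.addCredit(r.newCredit());
        int sent = 0;
        while (s.poll() != null) {
            r.onBuffer(s.backlog());
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        Receiver receiver = new Receiver(2, new FloatingPool(3));
        Sender sender = new Sender();
        for (int i = 0; i < 6; i++) sender.enqueue("record-" + i);
        for (int round = 1; round <= 3; round++) {
            System.out.println("round " + round + " sent " + drain(sender, receiver)
                    + ", backlog " + sender.backlog());
        }
    }
}
```

      In this model the sender stalls as soon as its credit is used up, even though data is still queued, and it resumes only after the receiver frees a buffer and announces new credit. The total number of buffers in flight can never exceed the receiver's exclusive buffers plus the floating buffers it was granted.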

      To keep in-progress commits from affecting the master branch, this work will be implemented in a separate feature branch.

        Sub-Tasks

        1. Create a fix size (non rebalancing) buffer pool type for the floating buffers | Sub-task | Closed | Zhijiang
        2. Manage exclusive buffers in RemoteInputChannel | Sub-task | Closed | Zhijiang
        3. Add credit field in PartitionRequest message | Sub-task | Closed | Zhijiang
        4. Define the BufferListener interface to replace EventListener in BufferProvider | Sub-task | Closed | Zhijiang
        5. Implement Netty receiver incoming pipeline for credit-based | Sub-task | Closed | Zhijiang
        6. Implement Netty receiver outgoing pipeline for credit-based | Sub-task | Closed | Zhijiang
        7. Implement sender backlog logic for credit-based | Sub-task | Closed | Zhijiang
        8. Implement Netty sender incoming pipeline for credit-based | Sub-task | Closed | Zhijiang
        9. Add the switch for keeping both the old mode and the new credit-based mode | Sub-task | Closed | Zhijiang
        10. Stop assigning floating buffers for blocked input channels in exactly-once mode | Sub-task | Closed | Zhijiang
        11. Implement CheckpointBarrierHandler not to spill data for exactly-once based on credit-based flow control | Sub-task | Closed | Zhijiang
        12. The tag of waiting for floating buffers in RemoteInputChannel should be updated properly | Sub-task | Closed | Zhijiang
        13. Adapt BackPressureStatsTracker to work with credit-based flow control | Sub-task | Closed | Piotr Nowojski
        14. Make credit-based floating buffers optional | Sub-task | Closed | Nico Kruber
        15. Lower the minimum number of buffers for incoming channels to 1 | Sub-task | Closed | boshu Zheng
        16. Make buffer count per InputGate always #channels*buffersPerChannel + ExclusiveBuffers | Sub-task | In Progress | Nico Kruber
        17. Initial credit should be configured in a separate parameter | Sub-task | Closed | Zhijiang
        18. Avoid recursion stack overflow during releasing SingleInputGate | Sub-task | Closed | Zhijiang
        19. Introduce another greedy mechanism for distributing floating buffers | Sub-task | Closed | Zhijiang
        20. Fix the calculation of backlog in PipelinedSubpartition | Sub-task | Closed | Zhijiang
        21. Construct special test/benchmark to verify the backlog effect | Sub-task | Closed | Zhijiang
        22. Remove non credit based network code | Sub-task | Closed | Piotr Nowojski
        23. Remove the setting of netty channel watermark in NettyServer | Sub-task | Closed | Zhijiang
        24. Solve the potential deadlock problem when reducing exclusive buffers to zero | Sub-task | Closed | Unassigned

            People

            • Assignee: Zhijiang (zjwang)
            • Reporter: Zhijiang (zjwang)
            • Votes: 1
            • Watchers: 19

                Time Tracking

                • Estimated: Not Specified
                • Remaining: 0h
                • Logged: 1h 40m