Flink / FLINK-7282 Credit-based Network Flow Control / FLINK-16403

Solve the potential deadlock problem when reducing exclusive buffers to zero

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Runtime / Network
    • Labels: None

    Description

      One motivation for this issue is to reduce the amount of in-flight data under back pressure and thereby speed up checkpoints. The current default is 2 exclusive buffers per channel. If we reduce that to 0 and increase the number of floating buffers to compensate, a deadlock becomes possible: all floating buffers might be taken by input channels that are already blocked, and those buffers would not be recycled until barrier alignment completes.

      To avoid this deadlock, we can change the logic on both the sender and the receiver side.

      • Sender side: after sending a checkpoint barrier, the sender should revoke the credit it previously received, i.e. it would not send any subsequent buffers until it receives new credits.
      • Receiver side: after processing the barrier from a channel and marking that channel as blocked, the receiver should release the floating buffers held for the blocked channel, and resume requesting floating buffers for it only after barrier alignment. That means the receiver would announce new credits to the sender only after barrier alignment.
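
      The two changes above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Flink's implementation: the class and method names (CreditRevocationSketch, FloatingPool, Sender, InputChannel, creditsForOtherChannel) are hypothetical, buffers are modelled as plain counters, and barriers are assumed to be sent without consuming credit. It compares how many floating buffers a second channel can obtain while the first channel is blocked, with and without the proposed behaviour.

```java
/** Toy model of the proposal; all names are hypothetical, not Flink classes. */
final class CreditRevocationSketch {

    /** Receiver-side pool of floating buffers shared by all input channels. */
    static final class FloatingPool {
        private int available;
        FloatingPool(int size) { available = size; }
        boolean request() {
            if (available == 0) return false;
            available--;
            return true;
        }
        void recycle(int count) { available += count; }
        int available() { return available; }
    }

    /** Sender-side view of one sub-partition: it may only send data while it has credit. */
    static final class Sender {
        private int credit;
        void addCredit(int count) { credit += count; }
        boolean sendBuffer() {
            if (credit == 0) return false;
            credit--;
            return true;
        }
        /** Assumption: the barrier itself needs no credit. Optionally revoke what is left. */
        void sendBarrier(boolean revokeCredit) {
            if (revokeCredit) credit = 0;
        }
        int credit() { return credit; }
    }

    /** Receiver-side input channel with zero exclusive buffers. */
    static final class InputChannel {
        private final FloatingPool pool;
        private final boolean releaseOnBarrier;
        private int held;        // floating buffers currently held by this channel
        private boolean blocked; // true between barrier arrival and alignment

        InputChannel(FloatingPool pool, boolean releaseOnBarrier) {
            this.pool = pool;
            this.releaseOnBarrier = releaseOnBarrier;
        }

        /** Returns the number of buffers granted, which would be announced as credit. */
        int requestFloating(int wanted) {
            if (blocked && releaseOnBarrier) return 0; // no requests while blocked
            int granted = 0;
            while (granted < wanted && pool.request()) granted++;
            held += granted;
            return granted;
        }

        void onBarrier() {
            blocked = true;
            if (releaseOnBarrier) { // proposed: hand floating buffers back to the pool
                pool.recycle(held);
                held = 0;
            }
        }

        void onAlignment() { blocked = false; }
        int held() { return held; }
    }

    /** Buffers channel B can obtain while channel A is blocked on its barrier. */
    static int creditsForOtherChannel(boolean proposed) {
        FloatingPool pool = new FloatingPool(2);
        InputChannel a = new InputChannel(pool, proposed);
        InputChannel b = new InputChannel(pool, proposed);
        Sender senderA = new Sender();

        senderA.addCredit(a.requestFloating(2)); // A takes every floating buffer
        senderA.sendBarrier(proposed);           // proposed: unused credit is revoked
        a.onBarrier();                           // A is now blocked
        return b.requestFloating(2);
    }

    public static void main(String[] args) {
        System.out.println(creditsForOtherChannel(false)); // prints 0: B is starved
        System.out.println(creditsForOtherChannel(true));  // prints 2: B can progress
    }
}
```

      In this model the baseline leaves channel B with no buffers until alignment, which is the deadlock scenario described above; with revocation and release, B gets the whole pool while A is blocked.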

      Another possible benefit is that the floating buffers might be put to better use before barrier alignment. We can verify the performance impact with the existing micro-benchmarks.

          People

            Assignee: Unassigned
            Reporter: Zhijiang (zjwang)