FLINK-14872

(Partial fix) Potential deadlock for task reading from blocking ResultPartition.

Details

    Description

      Currently, the buffer pool of an InputGate that reads from a blocking ResultPartition is unbounded. Such an InputGate can take too many buffers, so the ResultPartition of the same task may be unable to acquire enough core buffers, which can finally lead to a deadlock.

      Consider the following case:

      1. Core buffers are reserved for the InputGate and the ResultPartition of a task.
      2. The InputGate consumes a lot of buffers (not including the buffers reserved for the ResultPartition).
      3. Other tasks acquire exclusive buffers for their InputGates and trigger a redistribution of buffers. The buffers already taken by the first InputGate cannot be released.
      4. The first task, whose InputGate holds a lot of buffers, begins to emit records but cannot acquire enough core buffers. (Some operators may not emit records immediately, or there may be nothing to emit until then.)
      5. Deadlock.
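
      To make the sequence above concrete, here is a minimal toy model of it. It is only a sketch under assumptions: `ToySharedPool` and `DeadlockScenario` are hypothetical names, not Flink classes, and the model treats a reservation as a promise that only turns into a real buffer when the buffer is requested. It does not reproduce Flink's actual redistribution logic.

      ```java
      import java.util.HashMap;
      import java.util.Map;

      // Hypothetical toy model; none of these names exist in Flink.
      final class DeadlockScenario {

          /** Returns true when task A's ResultPartition cannot get its core buffers. */
          static boolean outputStarved(int totalBuffers, int inputDemand,
                                       int otherTasksDemand, int outputCore) {
              ToySharedPool pool = new ToySharedPool(totalBuffers);
              // Task A's unbounded InputGate takes as many buffers as it asks for.
              pool.request("A.inputGate", inputDemand);
              // Other tasks then take exclusive buffers for their own InputGates.
              pool.request("B.inputGate", otherTasksDemand);
              // Task A starts emitting and needs its core output buffers only now.
              int granted = pool.request("A.resultPartition", outputCore);
              // A's input buffers are recycled only after A emits, and A cannot emit
              // without output buffers, so a shortfall here is a circular wait.
              return granted < outputCore;
          }

          public static void main(String[] args) {
              System.out.println(outputStarved(100, 90, 10, 4)); // prints "true"
          }
      }

      /** One shared pool of buffers with no per-owner upper bound. */
      final class ToySharedPool {
          private final int total;
          private final Map<String, Integer> held = new HashMap<>();
          private int used;

          ToySharedPool(int total) {
              this.total = total;
          }

          /** Grants up to `wanted` buffers to `owner`; returns how many were granted. */
          int request(String owner, int wanted) {
              int granted = Math.min(wanted, total - used);
              used += granted;
              held.merge(owner, granted, Integer::sum);
              return granted;
          }

          int heldBy(String owner) {
              return held.getOrDefault(owner, 0);
          }
      }
      ```

      With 100 buffers in total, an input demand of 90 and 10 more taken by other tasks, nothing is left for the 4 core output buffers. With an input demand of 50 the same call returns false.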

       

      I think we can fix this problem by limiting the number of buffers that can be allocated by an InputGate that reads from a blocking ResultPartition.
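
      A rough sketch of what such a limit could look like is given here. The names `GlobalPool`, `CappedGatePool` and `maxBuffers` are assumptions made up for illustration and are not Flink's API; how the bound would actually be chosen and enforced is left open by this issue.

      ```java
      // Hypothetical sketch of a per-gate upper bound; not Flink's implementation.
      final class BoundedGateSketch {

          /** Stand-in for the shared pool of network buffers. */
          static final class GlobalPool {
              private int free;

              GlobalPool(int total) {
                  this.free = total;
              }

              boolean take() {
                  if (free == 0) {
                      return false;
                  }
                  free--;
                  return true;
              }

              void give() {
                  free++;
              }

              int available() {
                  return free;
              }
          }

          /** Local pool for one InputGate that never holds more than maxBuffers. */
          static final class CappedGatePool {
              private final GlobalPool global;
              private final int maxBuffers; // assumed upper bound for this gate
              private int inUse;

              CappedGatePool(GlobalPool global, int maxBuffers) {
                  if (maxBuffers < 1) {
                      throw new IllegalArgumentException("maxBuffers must be >= 1");
                  }
                  this.global = global;
                  this.maxBuffers = maxBuffers;
              }

              /** Takes one buffer unless the cap is reached or the global pool is empty. */
              boolean tryRequest() {
                  if (inUse >= maxBuffers || !global.take()) {
                      return false;
                  }
                  inUse++;
                  return true;
              }

              /** Returns one buffer to the global pool. */
              void recycle() {
                  if (inUse == 0) {
                      throw new IllegalStateException("nothing to recycle");
                  }
                  inUse--;
                  global.give();
              }

              int inUse() {
                  return inUse;
              }
          }

          /** A gate asks for `demand` buffers; returns what remains in the global pool. */
          static int freeAfterGreedyGate(int total, int cap, int demand) {
              GlobalPool global = new GlobalPool(total);
              CappedGatePool gate = new CappedGatePool(global, cap);
              for (int i = 0; i < demand; i++) {
                  gate.tryRequest();
              }
              return global.available();
          }

          public static void main(String[] args) {
              System.out.println(freeAfterGreedyGate(100, 16, 90)); // prints "84"
          }
      }
      ```

      With a cap of 16, a gate that asks for 90 of 100 buffers still leaves 84 for other users, including the ResultPartition of the same task. The cap only bounds what one gate can hold; it does not by itself guarantee that the remaining buffers reach the ResultPartition.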

            People

              kevin.cyj Yingjie Cao

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m