Flink / FLINK-14551 Unaligned checkpoints / FLINK-14472

Implement back-pressure monitor with non-blocking outputs


    Details

      Description

      The back-pressure monitor currently relies on detecting task threads that are stuck in `requestBufferBuilderBlocking`. At the moment, two cases cause back-pressure:

      • No buffers are available in `LocalBufferPool`, and its entire quota from the global pool is already in use. The task must then wait for buffers to be recycled to `LocalBufferPool`.
      • No buffers are available in `LocalBufferPool`, but its quota is not used up. The request to the global pool then blocks because the global pool has no available buffers either. The task must then wait for buffers to be recycled to the global pool.
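
The two cases above can be told apart in a small toy model. The sketch below is illustrative only: `TwoLevelPoolSketch`, `ToyGlobalPool`, `ToyLocalPool`, and `Outcome` are hypothetical names, not Flink classes, and the request is non-blocking so that the reason for waiting is returned instead of parking the thread.

```java
/** Toy model only; these are not Flink's buffer pool classes. */
class TwoLevelPoolSketch {

    /** Result of a request: granted, or the reason the caller would have to wait. */
    enum Outcome { GRANTED, WAIT_FOR_LOCAL_RECYCLE, WAIT_FOR_GLOBAL_RECYCLE }

    /** Stand-in for the global pool: a fixed number of free buffers. */
    static final class ToyGlobalPool {
        private int free;

        ToyGlobalPool(int free) { this.free = free; }

        boolean tryTake() {
            if (free == 0) {
                return false;
            }
            free--;
            return true;
        }
    }

    /** Stand-in for a local pool with a quota of buffers it may take from the global pool. */
    static final class ToyLocalPool {
        private final ToyGlobalPool global;
        private final int quota;       // maximum buffers this pool may hold
        private int takenFromGlobal;   // buffers counted against the quota
        private int availableLocally;  // recycled buffers ready for reuse

        ToyLocalPool(ToyGlobalPool global, int quota) {
            this.global = global;
            this.quota = quota;
        }

        Outcome tryRequest() {
            if (availableLocally > 0) {
                availableLocally--;
                return Outcome.GRANTED;
            }
            if (takenFromGlobal >= quota) {
                // Case 1: quota exhausted, wait for recycling to the local pool.
                return Outcome.WAIT_FOR_LOCAL_RECYCLE;
            }
            if (!global.tryTake()) {
                // Case 2: quota left, but the global pool is empty.
                return Outcome.WAIT_FOR_GLOBAL_RECYCLE;
            }
            takenFromGlobal++;
            return Outcome.GRANTED;
        }

        /** Returns one in-use buffer to this local pool. */
        void recycle() {
            if (availableLocally < takenFromGlobal) {
                availableLocally++;
            }
        }
    }
}
```

In this model, a local pool with quota 1 reports `WAIT_FOR_LOCAL_RECYCLE` on its second request, while a local pool with quota 2 on top of a one-buffer global pool reports `WAIT_FOR_GLOBAL_RECYCLE`.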

      We are implementing non-blocking network output in FLINK-14396, so the back-pressure monitor should be adjusted accordingly once the non-blocking output is used in practice.

      Specifically, we want to stop relying on the current monitoring approach of analyzing the task thread stack, which has some drawbacks discussed before:

      • If `requestBuffer` is not triggered by the task thread, the current monitor does not work in practice.
      • The current monitor is heavyweight and fragile because it needs to understand details of the `LocalBufferPool` implementation.
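
For illustration, stack-based detection can be sketched with plain `Thread.getStackTrace()`. This is a simplified stand-in, not the monitor's actual code: `StackSamplingSketch`, `looksBackPressured`, and `demo` are hypothetical names, and the local `requestBufferBuilderBlocking` method only mimics a blocked request with a latch.

```java
import java.util.concurrent.CountDownLatch;

/** Simplified stand-in for stack-based detection; not Flink's monitor code. */
class StackSamplingSketch {

    // The sampler is coupled to this method name, which is what makes it fragile.
    static final String BLOCKING_METHOD = "requestBufferBuilderBlocking";

    /** Returns true if the sampled thread currently has the blocking method on its stack. */
    static boolean looksBackPressured(Thread thread) {
        for (StackTraceElement frame : thread.getStackTrace()) {
            if (frame.getMethodName().equals(BLOCKING_METHOD)) {
                return true;
            }
        }
        return false;
    }

    /** Hypothetical blocking request: parks the caller until a buffer is "released". */
    static void requestBufferBuilderBlocking(CountDownLatch entered, CountDownLatch release) {
        entered.countDown();
        try {
            release.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Samples a toy task thread while it is blocked and again after it has finished. */
    static boolean[] demo() {
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread task = new Thread(() -> requestBufferBuilderBlocking(entered, release), "toy-task");
        task.start();
        try {
            entered.await();
            boolean whileBlocked = looksBackPressured(task);
            release.countDown();
            task.join();
            boolean afterFinish = looksBackPressured(task);
            return new boolean[] {whileBlocked, afterFinish};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Both drawbacks show up in this sketch: sampling any thread other than the one that issued the request finds nothing, and renaming the blocking method silently breaks detection.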

      We could provide a transparent method that lets the monitor caller get the back-pressure result directly, hiding the implementation details in `LocalBufferPool`.
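
One possible shape for such a method is sketched below. It is only an assumption about how the idea could look, not the API that was eventually implemented: `BackPressureAware`, `isBackPressured`, `ToyBufferPool`, and `backPressuredFraction` are hypothetical names.

```java
import java.util.List;

/** Hypothetical sketch of a direct back-pressure query; not Flink's actual API. */
class BackPressureQuerySketch {

    /** The only thing a monitor needs to know about a pool. */
    interface BackPressureAware {
        boolean isBackPressured();
    }

    /** Toy pool with a non-blocking request; how availability is tracked stays private. */
    static final class ToyBufferPool implements BackPressureAware {
        private final int capacity;
        private int inUse;

        ToyBufferPool(int capacity) { this.capacity = capacity; }

        /** Returns false instead of blocking when no buffer is available. */
        synchronized boolean tryRequestBuffer() {
            if (inUse == capacity) {
                return false;
            }
            inUse++;
            return true;
        }

        synchronized void recycleBuffer() {
            if (inUse > 0) {
                inUse--;
            }
        }

        @Override
        public synchronized boolean isBackPressured() {
            return inUse == capacity;
        }
    }

    /** Monitor side: no stack traces, only the fraction of sources reporting back-pressure. */
    static double backPressuredFraction(List<? extends BackPressureAware> sources) {
        if (sources.isEmpty()) {
            return 0.0;
        }
        int backPressured = 0;
        for (BackPressureAware source : sources) {
            if (source.isBackPressured()) {
                backPressured++;
            }
        }
        return backPressured / (double) sources.size();
    }
}
```

With this shape the monitor does not depend on which thread requests buffers or on any method name inside the pool.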

            People

            • Assignee: Yingjie Cao (kevin.cyj)
            • Reporter: Zhijiang (zjwang)

            Time Tracking

            • Estimated: Not Specified
            • Remaining: 0h
            • Logged: 20m