Flink / FLINK-29298

LocalBufferPool request buffer from NetworkBufferPool hanging

Details

    Description

      Under heavy buffer contention, a task occasionally hangs. The thread dump shows that the task thread is blocked in requestMemorySegmentBlocking forever. After investigating the heap dump, I found that the NetworkBufferPool actually has many buffers, yet the LocalBufferPool remains unavailable and never obtains one.

      Looking at the code, I am sure this is a race condition: when the task thread polls the last buffer out of the LocalBufferPool and triggers the onGlobalPoolAvailable callback itself, the callback skips the notification (because the LocalBufferPool is still marked available at that point). As a result, the LocalBufferPool eventually becomes unavailable and never registers a callback with the NetworkBufferPool again.
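
      The interleaving described above can be sketched as a small single-threaded model. This is not Flink's actual code: GlobalPool, LocalPool, raceWindowHook and both scenario methods are hypothetical names introduced for illustration, and the hook simply stands in for another thread recycling a buffer at the critical moment. The only real API it relies on is CompletableFuture.thenRun, which runs the action in the calling thread when the future is already complete.

```java
import java.util.concurrent.CompletableFuture;

public class LostNotificationSketch {

    /** Hypothetical stand-in for NetworkBufferPool: a counter plus an availability future. */
    static class GlobalPool {
        int free = 0;
        CompletableFuture<Void> availableFuture = new CompletableFuture<>();

        boolean tryTake() {
            if (free == 0) {
                return false;
            }
            free--;
            if (free == 0) {
                availableFuture = new CompletableFuture<>(); // unavailable again
            }
            return true;
        }

        void recycle() {
            free++;
            availableFuture.complete(null); // runs callbacks registered on this future
        }
    }

    /** Hypothetical stand-in for LocalBufferPool, reduced to the flags involved in the race. */
    static class LocalPool {
        final GlobalPool global;
        Runnable raceWindowHook = () -> {}; // simulates another thread acting in the race window
        int localFree;
        boolean available = true;
        boolean callbackRegistered = false;

        LocalPool(GlobalPool global, int localFree) {
            this.global = global;
            this.localFree = localFree;
        }

        /** Task thread takes one buffer, then re-evaluates availability. */
        void takeBuffer() {
            localFree--;
            if (!checkAvailability()) {
                available = false; // reset only AFTER the callback may already have run
            }
        }

        boolean checkAvailability() {
            if (localFree > 0) {
                return true;
            }
            if (global.tryTake()) {
                localFree++;
                return true;
            }
            raceWindowHook.run(); // window between the failed request and the registration
            if (!callbackRegistered) {
                callbackRegistered = true;
                // If the future is already complete, the callback runs right here,
                // in the calling (task) thread.
                global.availableFuture.thenRun(this::onGlobalPoolAvailable);
            }
            return false;
        }

        void onGlobalPoolAvailable() {
            callbackRegistered = false;
            if (available) {
                return; // notification skipped: the pool still looks available
            }
            if (checkAvailability()) {
                available = true;
            }
        }
    }

    /** A buffer is recycled inside the race window, so the callback fires too early. */
    static LocalPool raceScenario() {
        GlobalPool global = new GlobalPool();
        LocalPool local = new LocalPool(global, 1);
        local.raceWindowHook = global::recycle;
        local.takeBuffer();
        return local;
    }

    /** A buffer is recycled after the pool was marked unavailable, so the callback works. */
    static LocalPool controlScenario() {
        GlobalPool global = new GlobalPool();
        LocalPool local = new LocalPool(global, 1);
        local.takeBuffer();
        global.recycle();
        return local;
    }

    public static void main(String[] args) {
        for (LocalPool p : new LocalPool[] {raceScenario(), controlScenario()}) {
            System.out.println("available=" + p.available
                    + " callbackRegistered=" + p.callbackRegistered
                    + " globalFree=" + p.global.free);
        }
    }
}
```

      In this model, raceScenario ends with the local pool unavailable, no callback registered, and one buffer sitting in the global pool, which matches the hang described above. controlScenario recovers because the callback runs only after the pool has been marked unavailable.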

      The conditions for triggering the problem are fairly strict, but I have found a reliable way to reproduce it. I will try to fix and verify this problem.

      Attachments

        1. image-2022-09-14-10-52-15-259.png
          2.22 MB
          Weijie Guo
        2. image-2022-09-14-10-58-45-987.png
          1.22 MB
          Weijie Guo
        3. image-2022-09-14-11-00-47-309.png
          146 kB
          Weijie Guo

            People

              Weijie Guo