Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-23466

UnalignedCheckpointITCase hangs on Azure

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=20813&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7&l=16016

      The problem is the buffer listener will be removed from the listener queue when notified and then it will be added to the listener queue again if it needs more buffers. However, if some buffers are recycled meanwhile, the buffer listener will not be notified of the available buffers. For example:

      1. Thread 1 calls LocalBufferPool#recycle().
      2. Thread 1 reaches LocalBufferPool#fireBufferAvailableNotification() and listener.notifyBufferAvailable() is invoked, but Thread 1 sleeps before acquiring the lock to registeredListeners.add(listener).
      3. Thread 2 is being woken up as a result of notifyBufferAvailable() call. It takes the buffer, but it needs more buffers.
      4. Other threads, return all buffers, including this one that has been recycled. None are taken. Are all in the LocalBufferPool.
      5. Thread 1 wakes up, and continues fireBufferAvailableNotification() invocation.
      6. Thread 1 re-adds listener that's waiting for more buffer registeredListeners.add(listener).
      7. Thread 1 exits loop LocalBufferPool#recycle(MemorySegment, int) inside, as the original memory segment has been used.

      At the end we have a state where all buffers are in the LocalBufferPool, so no new recycle() calls will happen, but there is still one listener waiting for a buffer (despite buffers being available).

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            dwysakowicz Dawid Wysakowicz
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment