Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-7949

AsyncWaitOperator is not restarting when queue is full

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • 1.3.2
    • 1.4.1, 1.5.0
    • None
    • None

    Description

      Issue was describe here:
      http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoint-was-declined-tasks-not-ready-td16066.html
      Issue - AsyncWaitOperator can't restart properly after failure (thread is waiting forever)

      Scenario to reproduce this issue:
      1. The queue is full (let's assume that its capacity is N elements)
      2. There is some pending element waiting, so the
      pendingStreamElementQueueEntry field in AsyncWaitOperator is not null and
      while-loop in addAsyncBufferEntry method is trying to add this element to
      the queue (but element is not added because queue is full)
      3. Now the snapshot is taken - the whole queue of N elements is being
      written into the ListState in snapshotState method and also (what is more
      important) this pendingStreamElementQueueEntry is written to this list too.
      4. The process is being restarted, so it tries to recover all the elements
      and put them again into the queue, but the list of recovered elements hold
      N+1 element and our queue capacity is only N. Process is not started yet, so
      it can not process any element and this one element is waiting endlessly.
      But it's never added and the process will never process anything. Deadlock.
      5. Trigger is fired and indeed discarded because the process is not running
      yet.

      Attachments

        Issue Links

          Activity

            People

              bartektartanus Bartłomiej Tartanus
              bartektartanus Bartłomiej Tartanus
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - 0.25h
                  0.25h
                  Remaining:
                  Remaining Estimate - 0.25h
                  0.25h
                  Logged:
                  Time Spent - Not Specified
                  Not Specified