FLINK-5660: Not properly cleaning PendingCheckpoints up

    Details

      Description

      When a PendingCheckpoint is cleaned up, some of its state handles are sometimes not discarded. The reason is that the discard operation runs asynchronously, while the synchronous part of the discard call already clears the task state collection, so the asynchronous task may find the collection empty before it has discarded every handle.
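The race can be illustrated with a minimal sketch. All names below (`PendingCheckpoint`, `StateHandle`, `DeferredExecutor`, `discardedAfterDispose`) are hypothetical simplifications, not Flink's actual classes or API. To make the interleaving deterministic, the sketch assumes an executor that only queues the discard task and runs it after `dispose` has returned, which is one possible ordering with a real thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

public class BuggyDisposeSketch {

    /** Hypothetical stand-in for a state handle; counts how often it is discarded. */
    static final class StateHandle {
        private final AtomicInteger discardCounter;

        StateHandle(AtomicInteger discardCounter) {
            this.discardCounter = discardCounter;
        }

        void discardState() {
            discardCounter.incrementAndGet();
        }
    }

    /** Hypothetical executor that queues tasks; runAll() runs them later, fixing the interleaving. */
    static final class DeferredExecutor implements Executor {
        private final List<Runnable> queue = new ArrayList<>();

        @Override
        public void execute(Runnable task) {
            queue.add(task);
        }

        void runAll() {
            List<Runnable> pending = new ArrayList<>(queue);
            queue.clear();
            for (Runnable task : pending) {
                task.run();
            }
        }
    }

    /** Simplified pending checkpoint with the problematic dispose() ordering. */
    static final class PendingCheckpoint {
        final Map<String, StateHandle> taskStates = new ConcurrentHashMap<>();
        private final Executor executor;

        PendingCheckpoint(Executor executor) {
            this.executor = executor;
        }

        void dispose() {
            // The discard work is handed to another thread ...
            executor.execute(() -> {
                for (StateHandle handle : taskStates.values()) {
                    handle.discardState();
                }
            });
            // ... but the collection is cleared synchronously, possibly before that work runs.
            taskStates.clear();
        }
    }

    /** Registers numHandles handles, disposes the checkpoint and returns how many were discarded. */
    static int discardedAfterDispose(int numHandles) {
        AtomicInteger discarded = new AtomicInteger();
        DeferredExecutor executor = new DeferredExecutor();
        PendingCheckpoint checkpoint = new PendingCheckpoint(executor);
        for (int i = 0; i < numHandles; i++) {
            checkpoint.taskStates.put("task-" + i, new StateHandle(discarded));
        }
        checkpoint.dispose();
        executor.runAll(); // the asynchronous task only runs now, after clear()
        return discarded.get();
    }

    public static void main(String[] args) {
        System.out.println("discarded " + discardedAfterDispose(3) + " of 3 handles");
    }
}
```

Running `main` prints `discarded 0 of 3 handles`: by the time the queued task iterates over `taskStates`, the synchronous `clear()` has already emptied it, so none of the handles is discarded.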


          Activity

          ASF GitHub Bot added a comment:

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3220

          Till Rohrmann added a comment:

          1.2.0: f7e23548e9285fdf91366a259044139b4be0b095
          1.3.0: 009da6f6e1c532a12299ee5590bb46fcecb47c32

          ASF GitHub Bot added a comment:

          Github user tillrohrmann closed the pull request at:

          https://github.com/apache/flink/pull/3221

          ASF GitHub Bot added a comment:

          Github user tillrohrmann commented on the issue:

          https://github.com/apache/flink/pull/3221

          Thanks for the tests @rmetzger. Merging the PR.

          ASF GitHub Bot added a comment:

          Github user tillrohrmann commented on the issue:

          https://github.com/apache/flink/pull/3220

          Thanks for the review @uce. Merging this PR.

          ASF GitHub Bot added a comment:

          Github user uce commented on the issue:

          https://github.com/apache/flink/pull/3220

          +1 to merge this PR and #3221.

          ASF GitHub Bot added a comment:

          Github user rmetzger commented on the issue:

          https://github.com/apache/flink/pull/3221

          I've tested this fix again with my testing job on a cluster, and it works!

          Till Rohrmann added a comment:

          Yes, I also thought that this problem felt familiar. Maybe we forgot to port the fix to master back then.

          ASF GitHub Bot added a comment:

          GitHub user tillrohrmann opened a pull request:

          https://github.com/apache/flink/pull/3221

          [backport] FLINK-5660 [state] Fix state cleanup of PendingCheckpoint

          This is a backport of #3220 onto `release-1.2` branch.

          When calling PendingCheckpoint.dispose, the state contained in a pending checkpoint
          is discarded by an asynchronous task. Since this task accesses the taskStates field,
          we must not clear it in PendingCheckpoint.dispose. Instead, we clear it from within
          the asynchronous task once all state objects have been discarded.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/tillrohrmann/flink pendingCheckpointFixCleanupBackport

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3221.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3221

          ASF GitHub Bot added a comment:

          GitHub user tillrohrmann opened a pull request:

          https://github.com/apache/flink/pull/3220

          FLINK-5660 [state] Fix state cleanup of PendingCheckpoint

          When calling PendingCheckpoint.dispose, the state contained in a pending checkpoint
          is discarded by an asynchronous task. Since this task accesses the taskStates field,
          we must not clear it in PendingCheckpoint.dispose. Instead, we clear it from within
          the asynchronous task once all state objects have been discarded.
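The approach described in this pull request can be sketched as follows. The types are hypothetical simplifications (not Flink's classes), and the best-effort handling of failing handles is an assumption added for illustration: `dispose` no longer touches `taskStates`, and the asynchronous task discards every handle and clears the collection in a `finally` block afterwards.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedDisposeSketch {

    /** Hypothetical state handle whose discard may fail, like real I/O-backed state. */
    static final class StateHandle {
        private final AtomicInteger discardCounter;

        StateHandle(AtomicInteger discardCounter) {
            this.discardCounter = discardCounter;
        }

        void discardState() throws Exception {
            discardCounter.incrementAndGet();
        }
    }

    /** Simplified pending checkpoint: only the asynchronous task clears taskStates. */
    static final class PendingCheckpoint {
        final Map<String, StateHandle> taskStates = new ConcurrentHashMap<>();
        private final Executor executor;

        PendingCheckpoint(Executor executor) {
            this.executor = executor;
        }

        void dispose() {
            // Do NOT clear taskStates here; the asynchronous task still needs it.
            executor.execute(() -> {
                try {
                    for (StateHandle handle : taskStates.values()) {
                        try {
                            handle.discardState();
                        } catch (Exception e) {
                            // Best effort (assumption): one failing handle must not stop the others.
                        }
                    }
                } finally {
                    // Clear only after every handle has been visited.
                    taskStates.clear();
                }
            });
        }
    }

    /** Returns {number of discarded handles, entries left in taskStates}. */
    static int[] runDispose(int numHandles) {
        AtomicInteger discarded = new AtomicInteger();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        PendingCheckpoint checkpoint = new PendingCheckpoint(executor);
        for (int i = 0; i < numHandles; i++) {
            checkpoint.taskStates.put("task-" + i, new StateHandle(discarded));
        }
        checkpoint.dispose();
        executor.shutdown(); // already-submitted tasks still run to completion
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("discard task did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {discarded.get(), checkpoint.taskStates.size()};
    }

    public static void main(String[] args) {
        int[] result = runDispose(3);
        System.out.println("discarded " + result[0] + " of 3 handles, " + result[1] + " entries left");
    }
}
```

Running `main` prints `discarded 3 of 3 handles, 0 entries left`. Because clearing happens inside the same task that iterates over the collection, no interleaving of threads can empty `taskStates` before the discard loop has seen it.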

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/tillrohrmann/flink pendingCheckpointFixCleanup

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3220.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3220

          Ufuk Celebi added a comment:

          We did fix that for 1.1.4, right?

            People

            • Assignee: Till Rohrmann
            • Reporter: Till Rohrmann
            • Votes: 0
            • Watchers: 4