FLINK-5928

Externalized checkpoints overwriting each other

    Details

      Description

      I noticed that PR #3346 accidentally broke externalized checkpoints by using a fixed metadata file name. We should restore the old behaviour of creating randomly named files and double-check why no test caught this.

      This will likely be superseded by upcoming changes from Stephan Ewen to use metadata streams on the JobManager.

          Activity

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3424

          StephanEwen Stephan Ewen added a comment -

          Fixed via c477d87c68f2da4340c8d469e1b4331e6a660ef0

          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3424

          Looks good, merging this...

          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3424

          Thanks, good and critical fix!
          Looking at this...

          githubbot ASF GitHub Bot added a comment -

          GitHub user uce opened a pull request:

          https://github.com/apache/flink/pull/3424

          FLINK-5928 [checkpoints] Use custom metadata file for externalized checkpoints

          • Adds a checkpoint coordinator test for externalized checkpoints. This was covered only in unit tests for the involved checkpoint components like PendingCheckpoint etc. This would have caught the issue.
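          • The fix is to stop using a fixed `_metadata` file for externalized checkpoints and to write a randomly named `checkpoint_metadata-:randomSuffix` file instead, because externalized checkpoints are not unique per configured directory: several of them can share one directory, so a fixed name lets each new checkpoint overwrite the previous one. Hopefully, we can get rid of this soon with @StephanEwen's refactorings.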
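
The difference between the two naming schemes can be illustrated with a minimal, standalone Java sketch. This is not the code from the pull request: the class `ExternalizedMetadataSketch`, both method names, and the use of a UUID as the random suffix are assumptions made only for illustration, and it writes to the local file system through `java.nio.file` rather than through Flink's own abstractions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.UUID;
import java.util.stream.Stream;

/** Hypothetical illustration only; none of these names exist in Flink. */
final class ExternalizedMetadataSketch {

    static final String FIXED_NAME = "_metadata";
    static final String RANDOM_PREFIX = "checkpoint_metadata-";

    /** Fixed name: every checkpoint written to the same directory targets the same file. */
    static Path writeWithFixedName(Path dir, byte[] metadata) throws IOException {
        // Default options are CREATE, TRUNCATE_EXISTING, WRITE, so an existing file is replaced.
        return Files.write(dir.resolve(FIXED_NAME), metadata);
    }

    /** Random suffix (assumed here to be a UUID): each checkpoint gets its own file. */
    static Path writeWithRandomSuffix(Path dir, byte[] metadata) throws IOException {
        Path target = dir.resolve(RANDOM_PREFIX + UUID.randomUUID());
        // CREATE_NEW fails instead of overwriting if the name already exists.
        return Files.write(target, metadata, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }

    /** Counts the entries in a directory, closing the directory stream afterwards. */
    static long countFiles(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path fixedDir = Files.createTempDirectory("ext-chk-fixed");
        writeWithFixedName(fixedDir, "checkpoint 1".getBytes());
        writeWithFixedName(fixedDir, "checkpoint 2".getBytes());
        System.out.println("fixed name, files kept: " + countFiles(fixedDir));

        Path randomDir = Files.createTempDirectory("ext-chk-random");
        writeWithRandomSuffix(randomDir, "checkpoint 1".getBytes());
        writeWithRandomSuffix(randomDir, "checkpoint 2".getBytes());
        System.out.println("random suffix, files kept: " + countFiles(randomDir));
    }
}
```

With the fixed name, the second write replaces the first, so only the latest checkpoint's metadata survives in the directory. With a random suffix, both files remain and each checkpoint can still be located by its own path.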

          This is based on top of #3411, which is already good to merge.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/uce/flink 5928-ext_chk_metadata

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3424.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3424


          commit a3d2405f690e08d3de4f641428887ab04ba2ca2d
          Author: Stephan Ewen <sewen@apache.org>
          Date: 2017-02-17T16:51:00Z

          FLINK-5822 [state backends] Make JobManager / Checkpoint Coordinator aware of the root state backend

          commit b2f3bc41bc991e8deb22fb89822f28c75d94c8f7
          Author: Stephan Ewen <sewen@apache.org>
          Date: 2017-02-22T21:18:50Z

          FLINK-5897 [checkpoints] Make checkpoint externalization not depend strictly on FileSystems

          That is the first step towards checkpoints that can be externalized to other stores as well,
          like k/v stores and databases, if supported by the state backend.

          commit 537be203dab0614383645e859bbafb6ebfeb3161
          Author: Ufuk Celebi <uce@apache.org>
          Date: 2017-02-27T15:12:37Z

          FLINK-5928 [checkpoints] Add CheckpointCoordinatorExternalizedCheckpointsTest

          Problem: there were only unit tests for the checkpoint instances available
          that don't test the behaviour of the checkpoint coordinator with respect
          to externalized checkpoints.

          commit 88e4700cce630f8ae869abff22acfd46ab999aa0
          Author: Ufuk Celebi <uce@apache.org>
          Date: 2017-02-27T15:58:14Z

          FLINK-5928 [checkpoints] Use custom metadata file for externalized checkpoints



            People

            • Assignee:
              uce Ufuk Celebi
              Reporter:
              uce Ufuk Celebi