Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-5628

CheckpointStatsTracker implements Serializable but isn't

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 1.3.0
    • Fix Version/s: 1.3.0, 1.2.1
    • Component/s: Local Runtime
    • Labels:
      None

      Description

      The CheckpointStatsTracker implements the Serializable interface, even though it no longer is serializable as it contains a List<ExecutionJobVertex>.

      This was introduced in 579bc96446d598a2cfe8237b4ebd62d8c9df3483 which reworked the checkpoint stats tracking,

      This does not affect 1.2 or 1.3 in any way (since these objects aren't serialized there), but it blocks the implementation of the HistoryServer.

      The AccessExecution*/ArchivedExecution* classes, which are supposed to be a serializable form of the ExecutionGraph, are thus broken as they also make use of the CheckpointStatsTracker.

      (Note: * = Graph/ExecutionJobVertex/ExecutionVertex/Execution) .

      This wasn't catched in tests since no ExecutionJobVertices were given to the CheckpointStatsTracker in ArchivedExecutionGraphTest#setExecutionGraph.

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3215

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3215
          Hide
          uce Ufuk Celebi added a comment -

          3e2e49f (release-1.2), dcfa3fb (master).

          Show
          uce Ufuk Celebi added a comment - 3e2e49f (release-1.2), dcfa3fb (master).
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user uce commented on the issue:

          https://github.com/apache/flink/pull/3215

          Thanks for review. I'm going to merge this to `release-1.2` as well.

          Show
          githubbot ASF GitHub Bot added a comment - Github user uce commented on the issue: https://github.com/apache/flink/pull/3215 Thanks for review. I'm going to merge this to `release-1.2` as well.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on the issue:

          https://github.com/apache/flink/pull/3215

          @uce +1 to merge.

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on the issue: https://github.com/apache/flink/pull/3215 @uce +1 to merge.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user uce commented on the issue:

          https://github.com/apache/flink/pull/3215

          Thanks for spotting these. I'll adress your points.

          Show
          githubbot ASF GitHub Bot added a comment - Github user uce commented on the issue: https://github.com/apache/flink/pull/3215 Thanks for spotting these. I'll adress your points.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on the issue:

          https://github.com/apache/flink/pull/3215

          add `SubtaskStateStats` to the list.

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on the issue: https://github.com/apache/flink/pull/3215 add `SubtaskStateStats` to the list.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on the issue:

          https://github.com/apache/flink/pull/3215

          as well as the and the `AbstractCheckpointStats` class.

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on the issue: https://github.com/apache/flink/pull/3215 as well as the and the `AbstractCheckpointStats` class.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on the issue:

          https://github.com/apache/flink/pull/3215

          I think the `CompletedCheckpointStats` class must also implement `Serializable`.

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on the issue: https://github.com/apache/flink/pull/3215 I think the `CompletedCheckpointStats` class must also implement `Serializable`.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3215#discussion_r97799282

          — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointStatsTracker.java —
          @@ -85,6 +79,9 @@
          /** History of checkpoints. */
          private final CheckpointStatsHistory history;

          + /** The job vertices taking part in the checkpoints. */
          + private final transient List<ExecutionJobVertex> jobVertices;
          — End diff –

          doesn't have to be transient anymore.

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/3215#discussion_r97799282 — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointStatsTracker.java — @@ -85,6 +79,9 @@ /** History of checkpoints. */ private final CheckpointStatsHistory history; + /** The job vertices taking part in the checkpoints. */ + private final transient List<ExecutionJobVertex> jobVertices; — End diff – doesn't have to be transient anymore.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3215#discussion_r97799769

          — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ArchivedExecutionGraph.java —
          @@ -105,10 +110,12 @@ public ArchivedExecutionGraph(
          this.serializedUserAccumulators = serializedUserAccumulators;
          this.archivedExecutionConfig = executionConfig;
          this.isStoppable = isStoppable;

          • this.tracker = tracker;
            + this.jobSnapshottingSettings = checkNotNull(jobSnapshottingSettings);
            + this.checkpointStatsSnapshot = checkNotNull(checkpointStatsSnapshot);
              • End diff –

          so these are not null even when checkpointing is disabled?

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/3215#discussion_r97799769 — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ArchivedExecutionGraph.java — @@ -105,10 +110,12 @@ public ArchivedExecutionGraph( this.serializedUserAccumulators = serializedUserAccumulators; this.archivedExecutionConfig = executionConfig; this.isStoppable = isStoppable; this.tracker = tracker; + this.jobSnapshottingSettings = checkNotNull(jobSnapshottingSettings); + this.checkpointStatsSnapshot = checkNotNull(checkpointStatsSnapshot); End diff – so these are not null even when checkpointing is disabled?
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user uce opened a pull request:

          https://github.com/apache/flink/pull/3215

          FLINK-5628 [webfrontend] Fix serialization of stats tracker

          Instead of exposing the stats tracker in `AccessExecutionGraph`, we expose the checkpoint stats snapshot. That way, we don't have to worry about serializibility in the tracker itself. The archived execution graph simply gets the latest snapshot before archival. With this change, we also prevent illegal use of the tracker after it has been archived.

          Not strictly necessary for 1.2 but nice to have in order to have an easier time when backporting stuff from 1.3 later.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/uce/flink serialize

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3215.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3215


          commit 4edc2169d70a90cc8e50462a684c5593ae0b0eb3
          Author: Ufuk Celebi <uce@apache.org>
          Date: 2017-01-25T14:42:24Z

          FLINK-5628 [webfrontend] Return snapshot instead of tracker

          commit fc99a5ed096848c9794e8be9928ac5ad3f8027f0
          Author: Ufuk Celebi <uce@apache.org>
          Date: 2017-01-25T14:43:18Z

          FLINK-5628 [webfrontend] Fix serializability of checkpoint stats tracker


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user uce opened a pull request: https://github.com/apache/flink/pull/3215 FLINK-5628 [webfrontend] Fix serialization of stats tracker Instead of exposing the stats tracker in `AccessExecutionGraph`, we expose the checkpoint stats snapshot. That way, we don't have to worry about serializibility in the tracker itself. The archived execution graph simply gets the latest snapshot before archival. With this change, we also prevent illegal use of the tracker after it has been archived. Not strictly necessary for 1.2 but nice to have in order to have an easier time when backporting stuff from 1.3 later. You can merge this pull request into a Git repository by running: $ git pull https://github.com/uce/flink serialize Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3215.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3215 commit 4edc2169d70a90cc8e50462a684c5593ae0b0eb3 Author: Ufuk Celebi <uce@apache.org> Date: 2017-01-25T14:42:24Z FLINK-5628 [webfrontend] Return snapshot instead of tracker commit fc99a5ed096848c9794e8be9928ac5ad3f8027f0 Author: Ufuk Celebi <uce@apache.org> Date: 2017-01-25T14:43:18Z FLINK-5628 [webfrontend] Fix serializability of checkpoint stats tracker
          Hide
          Zentol Chesnay Schepler added a comment -

          A simple solution to this would be to store the CheckpintStatsSnapshot in the archived classes instead.

          Show
          Zentol Chesnay Schepler added a comment - A simple solution to this would be to store the CheckpintStatsSnapshot in the archived classes instead.

            People

            • Assignee:
              uce Ufuk Celebi
              Reporter:
              Zentol Chesnay Schepler
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development