Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-6830

Add ITTests for savepoint migration from 1.3

    Details

      Description

      Already with FLINK-6763 and FLINK-6764 we'll need to change the serialization formats between 1.3.0 and 1.3.x.

      We probably should add the stateful job migration ITCases for restoring from Flink 1.3.x now.

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user tzulitai opened a pull request:

          https://github.com/apache/flink/pull/4059

          FLINK-6830 Add StatefulJobSavepointFrom13MigrationITCase

          With FLINK-6763(https://issues.apache.org/jira/browse/FLINK-6763) and FLINK-6764(https://issues.apache.org/jira/browse/FLINK-6764) we'll need to change the serialization formats between 1.3.0 and 1.3.x.

          This PR adds the ITCase for the savepoint migration so that those format changes will be properly guarded for backwards compatibility. The savepoint binary files of this PR were taken using the current `release-1.3` branch.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/tzulitai/flink FLINK-6830

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/4059.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #4059


          commit fb2b1ddbe52b160a8c45052ca6abd9c0f5477d7d
          Author: Tzu-Li (Gordon) Tai <tzulitai@apache.org>
          Date: 2017-06-02T15:18:51Z

          FLINK-6830 Add StatefulJobSavepointFrom13MigrationITCase


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user tzulitai opened a pull request: https://github.com/apache/flink/pull/4059 FLINK-6830 Add StatefulJobSavepointFrom13MigrationITCase With FLINK-6763 ( https://issues.apache.org/jira/browse/FLINK-6763 ) and FLINK-6764 ( https://issues.apache.org/jira/browse/FLINK-6764 ) we'll need to change the serialization formats between 1.3.0 and 1.3.x. This PR adds the ITCase for the savepoint migration so that those format changes will be properly guarded for backwards compatibility. The savepoint binary files of this PR were taken using the current `release-1.3` branch. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tzulitai/flink FLINK-6830 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/4059.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4059 commit fb2b1ddbe52b160a8c45052ca6abd9c0f5477d7d Author: Tzu-Li (Gordon) Tai <tzulitai@apache.org> Date: 2017-06-02T15:18:51Z FLINK-6830 Add StatefulJobSavepointFrom13MigrationITCase
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on the issue:

          https://github.com/apache/flink/pull/4059

          Can we also include the tests for the restore with topology changes in flink-tests?

          org.apache.flink.test.start.operator.restore *

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on the issue: https://github.com/apache/flink/pull/4059 Can we also include the tests for the restore with topology changes in flink-tests? org.apache.flink.test.start.operator.restore *
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on the issue:

          https://github.com/apache/flink/pull/4059

          The changes look good! @zentol's suggestion also seems wise. 👌

          Show
          githubbot ASF GitHub Bot added a comment - Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/4059 The changes look good! @zentol's suggestion also seems wise. 👌
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/4059

          Thanks for the reviews @zentol @aljoscha! I'll add the topology change ITTests also and then merge this.

          Show
          githubbot ASF GitHub Bot added a comment - Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/4059 Thanks for the reviews @zentol @aljoscha! I'll add the topology change ITTests also and then merge this.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on the issue:

          https://github.com/apache/flink/pull/4059

          @tzulitai While you're on it, what would you think about also porting the `*From12MigrationTest` tests?

          Show
          githubbot ASF GitHub Bot added a comment - Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/4059 @tzulitai While you're on it, what would you think about also porting the `*From12MigrationTest` tests?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/4059

          @aljoscha +1 to that, I'm actually working on those also already :-D

          Show
          githubbot ASF GitHub Bot added a comment - Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/4059 @aljoscha +1 to that, I'm actually working on those also already :-D
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/4059

          f228081 is for porting the topology change on restore migration tests.

          The rest 233c617 to e619b98 is for ports of each individual migration test with the pattern `*From12MigrationTest`. They're also refactored to deduplicate code across 1.1, 1.2, 1.3 migration tests.

          Note that for the CEP migration tests in 233c617, some tests for migration from 1.3 is currently ignored because they do not pass. May have bumped into a bug for the CEP library that does not allow it to restore from 1.3.x to 1.3.x 😅

          Show
          githubbot ASF GitHub Bot added a comment - Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/4059 f228081 is for porting the topology change on restore migration tests. The rest 233c617 to e619b98 is for ports of each individual migration test with the pattern `*From12MigrationTest`. They're also refactored to deduplicate code across 1.1, 1.2, 1.3 migration tests. Note that for the CEP migration tests in 233c617, some tests for migration from 1.3 is currently ignored because they do not pass. May have bumped into a bug for the CEP library that does not allow it to restore from 1.3.x to 1.3.x 😅
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on a diff in the pull request:

          https://github.com/apache/flink/pull/4059#discussion_r120303440

          — Diff: flink-tests/src/test/java/org/apache/flink/test/state/operator/restore/unkeyed/ChainBreakTest.java —
          @@ -31,8 +36,22 @@
          /**

          • Verifies that the state of all operators is restored if a topology change breaks up a chain.
            */
            +@RunWith(Parameterized.class)
            public class ChainBreakTest extends AbstractNonKeyedOperatorRestoreTestBase {

          + private final String savepointPath;
          +
          + @Parameterized.Parameters(name = "Migrate Savepoint:

          {0}

          ")
          + public static Collection<String> parameters () {
          — End diff –

          is it not possible to move this into the abstract super class?

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/4059#discussion_r120303440 — Diff: flink-tests/src/test/java/org/apache/flink/test/state/operator/restore/unkeyed/ChainBreakTest.java — @@ -31,8 +36,22 @@ /** Verifies that the state of all operators is restored if a topology change breaks up a chain. */ +@RunWith(Parameterized.class) public class ChainBreakTest extends AbstractNonKeyedOperatorRestoreTestBase { + private final String savepointPath; + + @Parameterized.Parameters(name = "Migrate Savepoint: {0} ") + public static Collection<String> parameters () { — End diff – is it not possible to move this into the abstract super class?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on the issue:

          https://github.com/apache/flink/pull/4059

          topology tests look good.

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on the issue: https://github.com/apache/flink/pull/4059 topology tests look good.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on a diff in the pull request:

          https://github.com/apache/flink/pull/4059#discussion_r120305797

          — Diff: flink-tests/src/test/java/org/apache/flink/test/state/operator/restore/unkeyed/ChainBreakTest.java —
          @@ -31,8 +36,22 @@
          /**

          • Verifies that the state of all operators is restored if a topology change breaks up a chain.
            */
            +@RunWith(Parameterized.class)
            public class ChainBreakTest extends AbstractNonKeyedOperatorRestoreTestBase {

          + private final String savepointPath;
          +
          + @Parameterized.Parameters(name = "Migrate Savepoint:

          {0}

          ")
          + public static Collection<String> parameters () {
          — End diff –

          Will change!

          Show
          githubbot ASF GitHub Bot added a comment - Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/4059#discussion_r120305797 — Diff: flink-tests/src/test/java/org/apache/flink/test/state/operator/restore/unkeyed/ChainBreakTest.java — @@ -31,8 +36,22 @@ /** Verifies that the state of all operators is restored if a topology change breaks up a chain. */ +@RunWith(Parameterized.class) public class ChainBreakTest extends AbstractNonKeyedOperatorRestoreTestBase { + private final String savepointPath; + + @Parameterized.Parameters(name = "Migrate Savepoint: {0} ") + public static Collection<String> parameters () { — End diff – Will change!
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on the issue:

          https://github.com/apache/flink/pull/4059

          Why are the CEP "from12" snapshots also changed?

          Show
          githubbot ASF GitHub Bot added a comment - Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/4059 Why are the CEP "from12" snapshots also changed?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on the issue:

          https://github.com/apache/flink/pull/4059

          @kl0u Could you please look into the CEP tests that failing to restore from 1.3 to 1.3?

          Show
          githubbot ASF GitHub Bot added a comment - Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/4059 @kl0u Could you please look into the CEP tests that failing to restore from 1.3 to 1.3?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kl0u commented on the issue:

          https://github.com/apache/flink/pull/4059

          @aljoscha yes we have already discussed offline with @tzulitai .

          Show
          githubbot ASF GitHub Bot added a comment - Github user kl0u commented on the issue: https://github.com/apache/flink/pull/4059 @aljoscha yes we have already discussed offline with @tzulitai .
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on a diff in the pull request:

          https://github.com/apache/flink/pull/4059#discussion_r120317328

          — Diff: flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/WindowOperatorMigrationTest.java —
          @@ -149,9 +169,16 @@ public void testRestoreSessionWindowsWithCountTrigger() throws Exception {
          new KeyedOneInputStreamOperatorTestHarness<>(operator, new TupleKeySelector(), BasicTypeInfo.STRING_TYPE_INFO);

          testHarness.setup();

          • testHarness.initializeState(
          • OperatorSnapshotUtil.readStateHandle(
          • OperatorSnapshotUtil.getResourceFilename("win-op-migration-test-session-with-stateful-trigger-flink1.2-snapshot")));
            +
            + String savepointFile = "win-op-migration-test-session-with-stateful-trigger-flink" + testMigrateVersion + "-snapshot";
            + if (testMigrateVersion.equals("1.1")) {
              • End diff –

          I think you could refactor this piece of code into a method, something like:
          ```
          restoreFromSnapshot(TestHarness, SnapshotPath, MigrationVersion)
          ```

          What do you think?

          Show
          githubbot ASF GitHub Bot added a comment - Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/4059#discussion_r120317328 — Diff: flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/WindowOperatorMigrationTest.java — @@ -149,9 +169,16 @@ public void testRestoreSessionWindowsWithCountTrigger() throws Exception { new KeyedOneInputStreamOperatorTestHarness<>(operator, new TupleKeySelector(), BasicTypeInfo.STRING_TYPE_INFO); testHarness.setup(); testHarness.initializeState( OperatorSnapshotUtil.readStateHandle( OperatorSnapshotUtil.getResourceFilename("win-op-migration-test-session-with-stateful-trigger-flink1.2-snapshot"))); + + String savepointFile = "win-op-migration-test-session-with-stateful-trigger-flink" + testMigrateVersion + "-snapshot"; + if (testMigrateVersion.equals("1.1")) { End diff – I think you could refactor this piece of code into a method, something like: ``` restoreFromSnapshot(TestHarness, SnapshotPath, MigrationVersion) ``` What do you think?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user aljoscha commented on the issue:

          https://github.com/apache/flink/pull/4059

          I like these changes a lot! Especially the refactoring of using the same test for all versions. (I'm sure @zentol will also like that 😉 )

          Had some inline comments.

          Show
          githubbot ASF GitHub Bot added a comment - Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/4059 I like these changes a lot! Especially the refactoring of using the same test for all versions. (I'm sure @zentol will also like that 😉 ) Had some inline comments.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kl0u commented on the issue:

          https://github.com/apache/flink/pull/4059

          Hi @tzulitai ! Thanks for testing also this.

          The problem with the `CEP` failing test for `1.3` while passing for `1.2` is that the semantics of the `followedBy()` changed between `1.2` and `1.3`, so the expected results cannot be the same.

          In `1.2`, `followedBy()` implied non-deterministic relaxed contiguity while in `1.3` it is just relaxed contiguity. So for `1.3`, either change the expected result to contain only one match with the `foo1` as middle element, or change the pattern to have `followedByAny(middle)` (and in this case you have to take the savepoint again).

          Show
          githubbot ASF GitHub Bot added a comment - Github user kl0u commented on the issue: https://github.com/apache/flink/pull/4059 Hi @tzulitai ! Thanks for testing also this. The problem with the `CEP` failing test for `1.3` while passing for `1.2` is that the semantics of the `followedBy()` changed between `1.2` and `1.3`, so the expected results cannot be the same. In `1.2`, `followedBy()` implied non-deterministic relaxed contiguity while in `1.3` it is just relaxed contiguity. So for `1.3`, either change the expected result to contain only one match with the `foo1` as middle element, or change the pattern to have `followedByAny(middle)` (and in this case you have to take the savepoint again).
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on a diff in the pull request:

          https://github.com/apache/flink/pull/4059#discussion_r120365906

          — Diff: flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/WindowOperatorMigrationTest.java —
          @@ -149,9 +169,16 @@ public void testRestoreSessionWindowsWithCountTrigger() throws Exception {
          new KeyedOneInputStreamOperatorTestHarness<>(operator, new TupleKeySelector(), BasicTypeInfo.STRING_TYPE_INFO);

          testHarness.setup();

          • testHarness.initializeState(
          • OperatorSnapshotUtil.readStateHandle(
          • OperatorSnapshotUtil.getResourceFilename("win-op-migration-test-session-with-stateful-trigger-flink1.2-snapshot")));
            +
            + String savepointFile = "win-op-migration-test-session-with-stateful-trigger-flink" + testMigrateVersion + "-snapshot";
            + if (testMigrateVersion.equals("1.1")) {
              • End diff –

          That makes sense! I'll add the method to some `MigrationTestUtil` perhaps.

          Show
          githubbot ASF GitHub Bot added a comment - Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/4059#discussion_r120365906 — Diff: flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/WindowOperatorMigrationTest.java — @@ -149,9 +169,16 @@ public void testRestoreSessionWindowsWithCountTrigger() throws Exception { new KeyedOneInputStreamOperatorTestHarness<>(operator, new TupleKeySelector(), BasicTypeInfo.STRING_TYPE_INFO); testHarness.setup(); testHarness.initializeState( OperatorSnapshotUtil.readStateHandle( OperatorSnapshotUtil.getResourceFilename("win-op-migration-test-session-with-stateful-trigger-flink1.2-snapshot"))); + + String savepointFile = "win-op-migration-test-session-with-stateful-trigger-flink" + testMigrateVersion + "-snapshot"; + if (testMigrateVersion.equals("1.1")) { End diff – That makes sense! I'll add the method to some `MigrationTestUtil` perhaps.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/4059

          Thanks @kl0u, your suggestions fixes the problem. Would it also be possible to include document the contiguity semantic changes in #4041?

          Show
          githubbot ASF GitHub Bot added a comment - Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/4059 Thanks @kl0u, your suggestions fixes the problem. Would it also be possible to include document the contiguity semantic changes in #4041?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kl0u commented on the issue:

          https://github.com/apache/flink/pull/4059

          @tzulitai yes I will

          Show
          githubbot ASF GitHub Bot added a comment - Github user kl0u commented on the issue: https://github.com/apache/flink/pull/4059 @tzulitai yes I will
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/4059

          Addressed @aljoscha @kl0u @zentol's comments.

          Also rebased on `master` for another Travis run. If green, will merge by the end of today to `master` and `release-1.3` if there's no other objections!

          Show
          githubbot ASF GitHub Bot added a comment - Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/4059 Addressed @aljoscha @kl0u @zentol's comments. Also rebased on `master` for another Travis run. If green, will merge by the end of today to `master` and `release-1.3` if there's no other objections!
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/4059

          Merging ...

          Show
          githubbot ASF GitHub Bot added a comment - Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/4059 Merging ...
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/4059

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/4059
          Hide
          tzulitai Tzu-Li (Gordon) Tai added a comment -

          Fixed for master via 3792be4b5f80826d5dbf51c0517e8c00847472ca.
          Fixed for 1.3.1 via d4a646a035366918a100f64428c471464870b8d0.

          Show
          tzulitai Tzu-Li (Gordon) Tai added a comment - Fixed for master via 3792be4b5f80826d5dbf51c0517e8c00847472ca. Fixed for 1.3.1 via d4a646a035366918a100f64428c471464870b8d0.

            People

            • Assignee:
              tzulitai Tzu-Li (Gordon) Tai
              Reporter:
              tzulitai Tzu-Li (Gordon) Tai
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development