FLINK-6565

Improve error messages for state restore failures

    Details

      Description

      The error messages thrown when a state restore fails need to be more explicit and clear about the actual reason.

      At least 2 cases we've seen so far:

      1. When restoring operator state or memory-backed keyed state, the previous serializer must exist. If it doesn't, only a vague NPE is currently thrown, with no clear message about the actual reason.

      2. If the restore failure is due to an incompatible version of a serializer's config snapshot, the error should report something more informative than: "Incompatible version: found 1, required 1."
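For the second case, a minimal sketch of what a more informative version check could look like. This is not Flink's actual implementation: `RestoreVersionCheck`, `checkVersion`, and the parameter names are hypothetical and only illustrate a message that names the affected snapshot class, the version that was found, and the versions that can be read.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Flink: illustrates a version check whose
// failure message names the snapshot, the found version, and the readable versions.
final class RestoreVersionCheck {

    static void checkVersion(String snapshotClassName, int foundVersion, int[] readableVersions) {
        for (int readable : readableVersions) {
            if (readable == foundVersion) {
                return; // the found version can be read, nothing to report
            }
        }
        throw new IllegalStateException(
                "Cannot restore state: serializer config snapshot " + snapshotClassName
                        + " was written with version " + foundVersion
                        + ", but only versions " + Arrays.toString(readableVersions)
                        + " can be read.");
    }

    public static void main(String[] args) {
        try {
            // Assumed example values: a snapshot written with version 1,
            // restored by code that only reads versions 2 and 3.
            checkVersion("com.example.MySerializerConfigSnapshot", 1, new int[] {2, 3});
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that the message carries enough context to tell which snapshot failed and why.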

          Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user tzulitai opened a pull request:

          https://github.com/apache/flink/pull/3882

          FLINK-6565 Fail memory-backed state restores with meaningful message if previous serializer is unavailable

          Currently, without eager state registration, if the previous state serializer cannot be loaded when restoring memory-backed state (`DefaultOperatorStateBackend`, `HeapKeyedStateBackend`), perhaps because its implementation changed or it was removed from the classpath, the only option is to fail the job, because there is no serializer to read the previous state.

          Prior to this PR, the job failed correctly, but without a meaningful message (only an NPE).
          This PR adds a more meaningful message to the failure. It also adds tests for the memory-backed backends that verify the failure is as expected.
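As a rough illustration of the idea (not the code in this PR), the check could fail fast with an explicit message instead of letting a `null` serializer surface later as an NPE. `SerializerPresenceCheck`, `requireSerializer`, and the state name used below are hypothetical.

```java
import java.io.IOException;

// Hypothetical helper, not the PR's code: turns a missing previous serializer
// into an explicit restore failure instead of a later NullPointerException.
final class SerializerPresenceCheck {

    static <S> S requireSerializer(S previousSerializer, String stateName) throws IOException {
        if (previousSerializer == null) {
            throw new IOException(
                    "Unable to restore state [" + stateName + "]: the previous serializer "
                            + "of the state must be present, but it could not be loaded. "
                            + "Its implementation may have changed or it may no longer "
                            + "be on the classpath.");
        }
        return previousSerializer;
    }

    public static void main(String[] args) {
        try {
            // Assumed example: the serializer for "my-state" failed to load, so it is null.
            requireSerializer(null, "my-state");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```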

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/tzulitai/flink FLINK-6565

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3882.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3882


          commit 8f032dae560713af7d5d77dd4b85fb367332fa63
          Author: Tzu-Li (Gordon) Tai <tzulitai@apache.org>
          Date: 2017-05-12T11:11:25Z

          FLINK-6565 Fail memory-backed state restores with meaningful message if previous serializer is unavailable


          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on the issue:

          https://github.com/apache/flink/pull/3882

          LGTM +1

          githubbot ASF GitHub Bot added a comment -

          Github user tzulitai commented on the issue:

          https://github.com/apache/flink/pull/3882

          Thanks for the review! Merging to 1.3 / master ..

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3882

          tzulitai Tzu-Li (Gordon) Tai added a comment -

          Fixed for master via c594af09767e2ef1e74dd8db187985460761b724.
          Fixed for 1.3 via 7de221224ebf179581228ae2db7bd685468189da.


            People

            • Assignee:
              tzulitai Tzu-Li (Gordon) Tai
              Reporter:
              tzulitai Tzu-Li (Gordon) Tai