Cancel with savepoint fails with a NPE if savepoint target directory not set

    Details

      Description

      When canceling a job with a savepoint while no savepoint directory is configured, the command fails with the following exception:

      java.lang.Exception: Canceling the job with ID 663f9769f0f3565b8ebc2acf0091431a failed.
      	at org.apache.flink.client.CliFrontend.cancel(CliFrontend.java:633)
      	at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1082)
      	at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1123)
      	at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
      	at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
      	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
      	at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1120)
      Caused by: java.lang.Exception: Failed to cancel job 663f9769f0f3565b8ebc2acf0091431a with savepoint.
      	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:634)
      	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
      	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
      	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
      	at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
      	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
      	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
      	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
      	at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
      	at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
      	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
      	at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
      	at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
      	at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:118)
      	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
      	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
      	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
      	at akka.dispatch.Mailbox.run(Mailbox.scala:220)
      	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
      	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
      	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
      	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
      	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
      Caused by: java.lang.NullPointerException: Savepoint target directory
      	at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:75)
      	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.triggerSavepoint(CheckpointCoordinator.java:296)
      	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:598)
      	... 22 more
      

      I think we could return a more meaningful exception than the NPE to the user.

          Activity

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3263

          uce Ufuk Celebi added a comment -

          Fixed in b452c8b (master), 1864b4e (release-1.2).

          githubbot ASF GitHub Bot added a comment -

          Github user uce commented on the issue:

          https://github.com/apache/flink/pull/3263

          Build failures are unrelated and the change is self-contained. Merging this to `master` and `release-1.2`.

          githubbot ASF GitHub Bot added a comment -

          GitHub user uce opened a pull request:

          https://github.com/apache/flink/pull/3263

          FLINK-5699 [savepoints] Check target dir when cancelling with savepoint

          When cancelling a job with a savepoint and no savepoint directory is configured, triggering the savepoint fails with an NPE. This is then returned to the user as the root cause.

          Instead of simply forwarding the argument (which is possibly null), we check it for null and return an IllegalStateException with a meaningful message.
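The null check described here can be sketched roughly as follows. This is only an illustration of the idea, not the code from the pull request: the class name `SavepointTargetCheck`, the method name `requireTargetDirectory`, the exception message, and the example path are all hypothetical, and the actual change may sit elsewhere (the stack trace suggests the JobManager's message handling).

```java
/**
 * Minimal sketch of validating a savepoint target directory before
 * triggering a savepoint. All names here are hypothetical and chosen
 * for illustration; they are not taken from the Flink code base.
 */
final class SavepointTargetCheck {

    private SavepointTargetCheck() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Returns the given target directory unchanged, or throws an
     * IllegalStateException with an explanatory message if it is null.
     */
    static String requireTargetDirectory(String targetDirectory) {
        if (targetDirectory == null) {
            // Fail with a message the user can act on, rather than letting
            // a bare NullPointerException surface as the root cause.
            throw new IllegalStateException(
                "No savepoint directory configured. Specify a target directory "
                    + "when cancelling with a savepoint, or configure a default "
                    + "savepoint directory.");
        }
        return targetDirectory;
    }

    public static void main(String[] args) {
        // A configured directory is passed through unchanged (example path only).
        System.out.println(requireTargetDirectory("hdfs:///savepoints"));

        // A missing directory produces a descriptive error.
        try {
            requireTargetDirectory(null);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking at the point where the cancel request is handled, before the value reaches a generic precondition, is what lets the error name the missing configuration instead of only the name of the null argument.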

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/uce/flink 5699-cancel_with_savepoint_directory

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3263.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3263


          commit 5d06fd1c9b3a9ac1b404618b5bc843596e89e0ba
          Author: Ufuk Celebi <uce@apache.org>
          Date: 2017-02-03T16:28:27Z

          FLINK-5699 [savepoints] Check target dir when cancelling with savepoint

          Problem: when cancelling a job with a savepoint and no savepoint directory
          is configured, triggering the savepoint fails with an NPE. This is then
          returned to the user as the root cause.

          Solution: Instead of simply forwarding the argument (which is possibly
          null), we check it for null and return an IllegalStateException with
          a meaningful message.



            People

            • Assignee:
              uce Ufuk Celebi
              Reporter:
              till.rohrmann Till Rohrmann