java.lang.IllegalArgumentException: Negative number of in progress checkpoints

    Details

      Description

      Recently I found the following error in my JobManager log:

      2018-10-02 17:44:50,090 [flink-akka.actor.default-dispatcher-4117] ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  - Implementation error: Unhandled exception.
       java.lang.IllegalArgumentException: Negative number of in progress checkpoints
               at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:139)
               at org.apache.flink.runtime.checkpoint.CheckpointStatsCounts.<init>(CheckpointStatsCounts.java:72)
               at org.apache.flink.runtime.checkpoint.CheckpointStatsCounts.createSnapshot(CheckpointStatsCounts.java:177)
               at org.apache.flink.runtime.checkpoint.CheckpointStatsTracker.createSnapshot(CheckpointStatsTracker.java:166)
               at org.apache.flink.runtime.executiongraph.ExecutionGraph.getCheckpointStatsSnapshot(ExecutionGraph.java:553)
               at org.apache.flink.runtime.executiongraph.ArchivedExecutionGraph.createFrom(ArchivedExecutionGraph.java:340)
               at org.apache.flink.runtime.jobmaster.JobMaster.requestJob(JobMaster.java:923)
               at sun.reflect.GeneratedMethodAccessor101.invoke(Unknown Source)                   
               at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
               at java.lang.reflect.Method.invoke(Method.java:498)                                 
               at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)                                                                                          
               at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
               at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
               at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)                                                                                                    
               at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)                                                                                         
               at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)                                                                                                     
               at akka.actor.Actor$class.aroundReceive(Actor.scala:502)                                                                                                                              
               at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)                                                                                                                       
               at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)                   
               at akka.actor.ActorCell.invoke(ActorCell.scala:495)             
               at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)                                                                                                                            
               at akka.dispatch.Mailbox.run(Mailbox.scala:224)    
               at akka.dispatch.Mailbox.exec(Mailbox.scala:234)                                                                                                                                      
               at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)                                                                                                               
               at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)                                                                                                   
               at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)       
               at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
      

      Related: the job details page does not render. The screen shows only the skeleton, with no information (such as the pipeline or subtasks).

      One possible cause: the job was failing because of an uncaught exception in our code and, during one of its restarts, I issued a "flink cancel <jobid>". The job was cancelled, but the JobManager interface took a very long time to show the slots as available again.
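
To illustrate the kind of failure the stack trace points at, here is a minimal, self-contained sketch of an in-progress counter whose snapshot constructor rejects negative values. This is not Flink's actual code: the class and method names (`InProgressCounterSketch`, `checkpointTriggered`, `checkpointEnded`, `simulateDoubleDecrement`) are hypothetical, and the double decrement is only an assumed way the counter could go negative. The real cause is not established in this report.

```java
// Hypothetical sketch; NOT the Flink implementation.
public class InProgressCounterSketch {

    /** Immutable snapshot that validates its argument, similar in spirit to the failing check. */
    static final class CountsSnapshot {
        final long inProgress;

        CountsSnapshot(long inProgress) {
            if (inProgress < 0) {
                throw new IllegalArgumentException("Negative number of in progress checkpoints");
            }
            this.inProgress = inProgress;
        }
    }

    private long inProgress;

    void checkpointTriggered() {
        inProgress++;
    }

    void checkpointEnded() {
        inProgress--;
    }

    CountsSnapshot createSnapshot() {
        return new CountsSnapshot(inProgress);
    }

    /** Returns the exception message if the snapshot fails, or null if it succeeds. */
    static String simulateDoubleDecrement() {
        InProgressCounterSketch counter = new InProgressCounterSketch();
        counter.checkpointTriggered();
        counter.checkpointEnded(); // first report that the checkpoint ended
        counter.checkpointEnded(); // ASSUMPTION: a second report for the same checkpoint
        try {
            counter.createSnapshot();
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // prints "Negative number of in progress checkpoints"
        System.out.println(simulateDoubleDecrement());
    }
}
```

In this sketch the counter itself never throws. The error only surfaces later, when something (here `createSnapshot`, in the report a REST handler requesting the job details) builds a snapshot from the already inconsistent value. That would be consistent with the job details page failing to render on each request.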

              People

              • Assignee:
                azagrebin Andrey Zagrebin
                Reporter:
                JBiason Julio Biason
