Flink / FLINK-10482

java.lang.IllegalArgumentException: Negative number of in progress checkpoints



    Description

      Recently I found the following entry in my JobManager log:

      2018-10-02 17:44:50,090 [flink-akka.actor.default-dispatcher-4117] ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  - Implementation error: Unhandled exception.
      java.lang.IllegalArgumentException: Negative number of in progress checkpoints
          at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:139)
          at org.apache.flink.runtime.checkpoint.CheckpointStatsCounts.<init>(CheckpointStatsCounts.java:72)
          at org.apache.flink.runtime.checkpoint.CheckpointStatsCounts.createSnapshot(CheckpointStatsCounts.java:177)
          at org.apache.flink.runtime.checkpoint.CheckpointStatsTracker.createSnapshot(CheckpointStatsTracker.java:166)
          at org.apache.flink.runtime.executiongraph.ExecutionGraph.getCheckpointStatsSnapshot(ExecutionGraph.java:553)
          at org.apache.flink.runtime.executiongraph.ArchivedExecutionGraph.createFrom(ArchivedExecutionGraph.java:340)
          at org.apache.flink.runtime.jobmaster.JobMaster.requestJob(JobMaster.java:923)
          at sun.reflect.GeneratedMethodAccessor101.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
          at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
          at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
          at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
          at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
          at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
          at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
          at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
          at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
          at akka.actor.ActorCell.invoke(ActorCell.scala:495)
          at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
          at akka.dispatch.Mailbox.run(Mailbox.scala:224)
          at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
          at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
          at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
          at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
          at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

      Related: the job details page doesn't load; it shows only the page skeleton, with no information (pipeline, subtasks, etc.).

      One thing that may have caused this: the job was failing (an uncaught exception in our code), and during one of its restarts I issued a "flink cancel <jobid>". The job was cancelled, but the JobManager interface took a very long time to mark the slots as available again.
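
The stack trace shows the exception being thrown by Preconditions.checkArgument inside the CheckpointStatsCounts constructor, while CheckpointStatsTracker.createSnapshot builds a snapshot for the REST JobDetailsHandler. In other words, the in-progress counter was already below zero when the job details were requested, which is consistent with the details page staying empty. The sketch below is a simplified, hypothetical model in plain Java, not Flink's actual classes: the names Counts, CountsSnapshot and the checkpoint* methods are illustrative assumptions. It shows one way such a state could arise. If the same pending checkpoint is reported as failed twice (for example, once by a restart and again by the cancel), an unguarded decrement drives the count negative, and the error only surfaces later, when a snapshot is created.

```java
// Hypothetical, simplified model; NOT Flink's CheckpointStatsCounts/CheckpointStatsTracker.
public class CheckpointCountsSketch {

    // Local stand-in in the style of Flink's Preconditions.checkArgument:
    // throws IllegalArgumentException with the given message if the condition is false.
    static void checkArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    // Immutable snapshot whose constructor validates its inputs, mirroring the
    // CheckpointStatsCounts.<init> frame in the stack trace.
    static final class CountsSnapshot {
        final long numInProgress;
        final long numCompleted;
        final long numFailed;

        CountsSnapshot(long numInProgress, long numCompleted, long numFailed) {
            checkArgument(numInProgress >= 0, "Negative number of in progress checkpoints");
            checkArgument(numCompleted >= 0, "Negative number of completed checkpoints");
            checkArgument(numFailed >= 0, "Negative number of failed checkpoints");
            this.numInProgress = numInProgress;
            this.numCompleted = numCompleted;
            this.numFailed = numFailed;
        }
    }

    // Mutable counters updated by a (hypothetical) stats tracker.
    static final class Counts {
        long numInProgress;
        long numCompleted;
        long numFailed;

        void checkpointTriggered() {
            numInProgress++;
        }

        void checkpointCompleted() {
            numInProgress--;
            numCompleted++;
        }

        // Unguarded: reporting the same checkpoint as failed twice
        // decrements numInProgress twice.
        void checkpointFailed() {
            numInProgress--;
            numFailed++;
        }

        // Validation happens here, not at the faulty decrement.
        CountsSnapshot createSnapshot() {
            return new CountsSnapshot(numInProgress, numCompleted, numFailed);
        }
    }

    public static void main(String[] args) {
        Counts counts = new Counts();
        counts.checkpointTriggered(); // in progress: 1
        counts.checkpointFailed();    // failed by the restart: 0
        counts.checkpointFailed();    // same checkpoint failed again by the cancel: -1
        try {
            counts.createSnapshot();  // e.g. triggered by a job details request
        } catch (IllegalArgumentException e) {
            System.out.println("Snapshot failed: " + e.getMessage());
        }
    }
}
```

Running main prints "Snapshot failed: Negative number of in progress checkpoints". Whether Flink's tracker actually double-counts in this way when a job is cancelled during a restart is an assumption of this sketch; it only illustrates why a negative counter is reported at snapshot time, far from the place where it was decremented.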

    People

      azagrebin Andrey Zagrebin
      JBiason Julio Biason
      Votes: 0
      Watchers: 8