Flink / FLINK-25904

NullArgumentException when accessing checkpoint stats on standby JobManager

Details

    Description

      We have a job running on one node.
      After increasing the number of nodes (e.g. to 3), the job starts failing on the new nodes with:

      ERROR Unhandled exception. (org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler:260)
       org.apache.commons.math3.exception.NullArgumentException: input array
               at org.apache.commons.math3.util.MathArrays.verifyValues(MathArrays.java:1650) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
               at org.apache.commons.math3.stat.descriptive.AbstractUnivariateStatistic.test(AbstractUnivariateStatistic.java:158) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
               at org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:272) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
               at org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:241) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
               at org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics$CommonMetricsSnapshot.getPercentile(DescriptiveStatisticsHistogramStatistics.java:158) >
               at org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics.getQuantile(DescriptiveStatisticsHistogramStatistics.java:52) ~[flink-dist_2.12-1.14.3.>
               at org.apache.flink.runtime.checkpoint.StatsSummarySnapshot.getQuantile(StatsSummarySnapshot.java:108) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
               at org.apache.flink.runtime.rest.messages.checkpoints.StatsSummaryDto.valueOf(StatsSummaryDto.java:81) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
               at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.createCheckpointingStatistics(CheckpointingStatisticsHandler.java:129) ~[fli>
               at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleRequest(CheckpointingStatisticsHandler.java:84) ~[flink-dist_2.12-1.14>
               at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleRequest(CheckpointingStatisticsHandler.java:58) ~[flink-dist_2.12-1.14>
               at org.apache.flink.runtime.rest.handler.job.AbstractAccessExecutionGraphHandler.handleRequest(AbstractAccessExecutionGraphHandler.java:68) ~[flink-dist_2.12-1.14.3>
               at org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$0(AbstractExecutionGraphHandler.java:87) ~[flink-dist_2.12-1.14.3.ja>
               at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) [?:?]
               at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
               at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
               at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
               at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
               at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
               at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
               at java.lang.Thread.run(Thread.java:829) [?:?]
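
The trace shows `Percentile.evaluate` rejecting a null input array, reached through `getPercentile` on a statistics snapshot. One possible explanation, assumed here and not confirmed by this report, is that the snapshot's backing array does not survive Java serialization when the statistics are handed to the JobManager that serves the REST request. The sketch below reproduces that mechanism with a `transient` field and shows a null guard. `TransientSnapshotSketch`, `Snapshot`, `roundTrip` and `hasData` are hypothetical names used for illustration, not Flink's classes, and the nearest-rank percentile stands in for the commons-math3 implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class TransientSnapshotSketch {

    /** Hypothetical stand-in for a statistics snapshot; not Flink's actual class. */
    static class Snapshot implements Serializable {
        private static final long serialVersionUID = 1L;

        // Assumption: the backing array is transient, so Java serialization
        // skips it and the field is null after deserialization.
        private transient double[] data;

        Snapshot(double[] data) {
            this.data = data.clone();
        }

        boolean hasData() {
            return data != null;
        }

        /** Nearest-rank percentile for p in (0, 100]; NaN when no data is available. */
        double getPercentile(double p) {
            if (data == null || data.length == 0) {
                // Guard: without it, a null array would be passed on to the
                // percentile computation and fail there.
                return Double.NaN;
            }
            double[] sorted = data.clone();
            Arrays.sort(sorted);
            int rank = (int) Math.ceil(p / 100.0 * sorted.length);
            return sorted[Math.max(rank, 1) - 1];
        }
    }

    /** Serializes and deserializes the snapshot in memory, imitating a transfer between processes. */
    static Snapshot roundTrip(Snapshot snapshot) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(snapshot);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Snapshot) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Snapshot local = new Snapshot(new double[] {10, 20, 30, 40});
        Snapshot remote = roundTrip(local);

        System.out.println(local.getPercentile(50));  // 20.0
        System.out.println(remote.hasData());         // false: transient field was dropped
        System.out.println(remote.getPercentile(50)); // NaN instead of an exception
    }
}
```

Returning `NaN` is only one way to handle the missing array. Making the field non-transient, so that the data travels with the snapshot, would preserve the real quantiles on the receiving side.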
      

            People

              Chesnay Schepler (chesnay)
              Sergey Nuyanzin