Flink / FLINK-22688

Root Exception can not be shown on Web UI in Flink 1.13.0

Details

    Description

      Hi,
       
      We have upgraded our Flink applications to 1.13.0 and found that the Root Exception can no longer be shown on the Web UI; the page returns an internal server error instead. After opening the browser's developer console and tracing the failing request, we found the following exception in the job manager log:
       
      2021-05-12 13:30:45,589 ERROR org.apache.flink.runtime.rest.handler.job.JobExceptionsHandler [] - Unhandled exception.
      java.lang.IllegalArgumentException: The location must not be null for a non-global failure.
          at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:138) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
          at org.apache.flink.runtime.rest.handler.job.JobExceptionsHandler.assertLocalExceptionInfo(JobExceptionsHandler.java:218) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
          at org.apache.flink.runtime.rest.handler.job.JobExceptionsHandler.createRootExceptionInfo(JobExceptionsHandler.java:191) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
          at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) ~[?:?]
          at java.util.stream.SliceOps$1$1.accept(SliceOps.java:199) ~[?:?]
          at java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1632) ~[?:?]
          at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:127) ~[?:?]
          at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:502) ~[?:?]
          at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:488) ~[?:?]
          at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
          at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?]
          at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
          at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
          at org.apache.flink.runtime.rest.handler.job.JobExceptionsHandler.createJobExceptionHistory(JobExceptionsHandler.java:169) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
          at org.apache.flink.runtime.rest.handler.job.JobExceptionsHandler.createJobExceptionsInfo(JobExceptionsHandler.java:154) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
          at org.apache.flink.runtime.rest.handler.job.JobExceptionsHandler.handleRequest(JobExceptionsHandler.java:101) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
          at org.apache.flink.runtime.rest.handler.job.JobExceptionsHandler.handleRequest(JobExceptionsHandler.java:63) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
          at org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$0(AbstractExecutionGraphHandler.java:87) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
          at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) [?:?]
          at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
          at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
          at java.lang.Thread.run(Thread.java:834) [?:?]
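
      To illustrate how one entry without a location could turn the whole request into an internal server error, here is a minimal Java sketch. It is not Flink's code: the `Entry` class and the `render` and `renderHistory` methods are hypothetical names, the sample values are made up, and treating a null task name as a "global" failure is an assumption made only for this sketch. The exception message is the one from the trace above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class RootExceptionSketch {

    /** Hypothetical, simplified stand-in for one entry of the exception history. */
    static final class Entry {
        final String failure;
        final String taskName; // assumption: null marks a global failure
        final String location; // assumption: task manager address, null if unknown

        Entry(String failure, String taskName, String location) {
            this.failure = failure;
            this.taskName = taskName;
            this.location = location;
        }
    }

    /** Throws IllegalArgumentException with the given message if the condition is false. */
    static void checkArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    /** Renders one entry; task-level failures are required to carry a location. */
    static String render(Entry entry) {
        if (entry.taskName == null) {
            return entry.failure + " (global)";
        }
        checkArgument(
                entry.location != null,
                "The location must not be null for a non-global failure.");
        return entry.failure + " @ " + entry.taskName + " (" + entry.location + ")";
    }

    /** Renders at most maxEntries entries; one invalid entry aborts the whole call. */
    static List<String> renderHistory(List<Entry> history, int maxEntries) {
        return history.stream()
                .limit(maxEntries)
                .map(RootExceptionSketch::render)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data: the second entry has a task name but no location.
        List<Entry> history = Arrays.asList(
                new Entry("FirstFailure", "task (1/2)", "host-a:1234"),
                new Entry("SecondFailure", "task (2/2)", null));

        try {
            System.out.println(renderHistory(history, 10));
        } catch (IllegalArgumentException e) {
            // The valid first entry is lost too, because the whole call fails.
            System.out.println("request failed: " + e.getMessage());
        }
    }
}
```

      In this sketch, the check fails while the list is being built, so even the valid entries are never returned. That would be consistent with the UI showing nothing but an error, although the actual cause in Flink may differ.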
       
      I also see exceptions like the following in the task managers. As far as I remember, this kind of exception was shown in the UI in version 1.12.1:
       
      2021-05-18 00:50:30,261 WARN org.apache.flink.runtime.taskmanager.Task [] - xxx (23/90)#13 (c345fb009b5d93628b5a6d890c8f4226) switched from RUNNING to FAILED with failure cause: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager '10.194.65.3/10.194.65.3:44273'. This might indicate that the remote task manager was lost.
          at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelInactive(CreditBasedPartitionRequestClientHandler.java:160)
          at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
          at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
          at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
          at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
          at org.apache.flink.runtime.io.network.netty.NettyMessageClientDecoderDelegate.channelInactive(NettyMessageClientDecoderDelegate.java:94)
          at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
          at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
          at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
          at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
          at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
          at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
          at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
          at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818)
          at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
          at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
          at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
          at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
          at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
          at java.base/java.lang.Thread.run(Thread.java:834)
       
       
       
      This issue has been reported on the flink-user mailing list before: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Root-Exception-can-not-be-shown-on-Web-UI-in-Flink-1-13-0-td43673.html

      Attachments

        1. jobmanager_log_v1.txt.zip
          516 kB
          Gary Wu
        2. taskmanager_log_v1.txt
          898 kB
          Gary Wu

            People

              mapohl Matthias Pohl
              gary.wu Gary Wu