Flink / FLINK-16018

Improve error reporting when submitting batch job (instead of AskTimeoutException)


Details

    Description

      While debugging the "Shaded Hadoop S3A end-to-end test (minio)" pre-commit test, I noticed that the job submission does not produce helpful error messages.

      Environment:

      • A simple batch wordcount job
      • An unavailable MinIO S3 filesystem service

      What happens from a user's perspective:

      • The job submission fails after 10 seconds with an AskTimeoutException:
        2020-02-07T11:38:27.1189393Z akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-939201095]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
        2020-02-07T11:38:27.1189538Z 	at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
        2020-02-07T11:38:27.1189616Z 	at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
        2020-02-07T11:38:27.1189713Z 	at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
        2020-02-07T11:38:27.1189789Z 	at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
        2020-02-07T11:38:27.1189883Z 	at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        2020-02-07T11:38:27.1189973Z 	at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        2020-02-07T11:38:27.1190067Z 	at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        2020-02-07T11:38:27.1190159Z 	at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
        2020-02-07T11:38:27.1190267Z 	at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
        2020-02-07T11:38:27.1190358Z 	at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
        2020-02-07T11:38:27.1190465Z 	at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
        2020-02-07T11:38:27.1190540Z 	at java.lang.Thread.run(Thread.java:748)
        

      What a user would expect:

      • An error message indicating why the job submission failed.
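
      To illustrate the difference between the observed and the expected behavior, here is a minimal sketch in plain Java. It does not use Flink or Akka code; the class `SubmissionErrorSketch`, its methods, and the error message text are hypothetical names made up for this illustration. The first submission never replies, so the caller only sees a timeout. The second completes its future exceptionally, so the caller can report the underlying cause.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustration only: hypothetical names, not Flink's actual submission path.
public class SubmissionErrorSketch {

    // Observed behavior: the failure is swallowed and no reply is ever sent.
    static CompletableFuture<String> submitWithoutReply() {
        return new CompletableFuture<>(); // never completed
    }

    // Expected behavior: the failure is sent back as the reply.
    static CompletableFuture<String> submitWithFailureReply() {
        CompletableFuture<String> reply = new CompletableFuture<>();
        reply.completeExceptionally(
                new IOException("S3 endpoint unreachable (hypothetical message)"));
        return reply;
    }

    // Waits for the reply and turns the outcome into a user-facing message.
    static String describeOutcome(CompletableFuture<String> reply, long timeoutMillis) {
        try {
            return "OK: " + reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // All the caller knows is that nothing came back in time.
            return "TIMEOUT: no reply after " + timeoutMillis + " ms";
        } catch (ExecutionException e) {
            // Walk the cause chain to report the root cause of the failure.
            Throwable root = e.getCause();
            while (root.getCause() != null) {
                root = root.getCause();
            }
            return "FAILED: " + root.getClass().getSimpleName() + ": " + root.getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        }
    }

    public static void main(String[] args) {
        System.out.println(describeOutcome(submitWithoutReply(), 100));
        System.out.println(describeOutcome(submitWithFailureReply(), 100));
    }
}
```

      In this sketch the first call yields only a timeout message, while the second names the `IOException` and its message, which is the kind of information a user would need to diagnose an unavailable filesystem.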

            People

              trohrmann Till Rohrmann
              rmetzger Robert Metzger
              Votes: 1
              Watchers: 11

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m