Details
- Type: Improvement
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- 1.9.2, 1.10.0
Description
While debugging the "Shaded Hadoop S3A end-to-end test (minio)" pre-commit test, I noticed that job submission does not produce helpful error messages.
Environment:
- A simple batch wordcount job
- An unavailable minio S3 filesystem service
What happens from a user's perspective:
- The job submission fails after 10 seconds with an AskTimeoutException:
2020-02-07T11:38:27.1189393Z akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-939201095]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
2020-02-07T11:38:27.1189538Z at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
2020-02-07T11:38:27.1189616Z at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
2020-02-07T11:38:27.1189713Z at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
2020-02-07T11:38:27.1189789Z at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
2020-02-07T11:38:27.1189883Z at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
2020-02-07T11:38:27.1189973Z at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
2020-02-07T11:38:27.1190067Z at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
2020-02-07T11:38:27.1190159Z at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
2020-02-07T11:38:27.1190267Z at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
2020-02-07T11:38:27.1190358Z at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
2020-02-07T11:38:27.1190465Z at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
2020-02-07T11:38:27.1190540Z at java.lang.Thread.run(Thread.java:748)
What a user would expect:
- An error message indicating why the job submission failed.
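The gap between the observed and the expected behavior can be illustrated with a small, generic sketch. It is not Flink code and does not use Akka; the names `ask`, `blocked_on_unreachable_service`, and `fails_fast`, as well as the error message text, are hypothetical. It only shows why a caller that waits with a timeout on a recipient stuck on an unreachable service sees a bare timeout with no cause attached, whereas a recipient that fails fast lets the real reason reach the caller.

```python
import concurrent.futures
import threading


def ask(work, timeout_s):
    """Run `work` on a worker thread and wait at most `timeout_s` seconds.

    Returns the exception the caller would observe, or None on success.
    """
    release = threading.Event()
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(work, release)
        try:
            future.result(timeout=timeout_s)
            return None
        except Exception as err:
            return err
        finally:
            # Unblock a hanging worker so the pool can shut down cleanly.
            release.set()


def blocked_on_unreachable_service(release):
    # Hypothetical recipient: hangs on a service that never answers.
    release.wait()


def fails_fast(release):
    # Hypothetical recipient: reports the underlying problem immediately.
    raise ConnectionError("cannot reach s3 endpoint (hypothetical message)")


timeout_err = ask(blocked_on_unreachable_service, 0.1)
fast_err = ask(fails_fast, 0.1)

# The timeout says only that no reply arrived; it has no root cause attached.
print(type(timeout_err).__name__, "cause:", timeout_err.__cause__)
# The fail-fast path surfaces the actual reason to the caller.
print(type(fast_err).__name__, "message:", fast_err)
```

In the first call the caller learns only that no reply arrived in time, which mirrors the symptom described above; in the second, the underlying reason is visible in the error itself.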
Issue Links
- is duplicated by
  - FLINK-11143 AskTimeoutException is thrown during job submission and completion (Closed)
- relates to
  - FLINK-16429 failed to restore flink job from checkpoints due to unhandled exceptions (Closed)
  - FLINK-16867 Simplify default timeout configuration (Open)
  - FLINK-16866 Make job submission non-blocking (Closed)
- links to