[SPARK-13112] CoarsedExecutorBackend register to driver should wait Executor was ready - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.6.0
Fix Version/s: 2.0.0
Component/s: Spark Core
Labels: None
Target Version/s: 2.0.0

Description

When the disks on a host are busy, an executor on that host can fail with a TimeoutException while trying to register with the shuffle server on that host. The executor then calls System.exit(1) when a task is launched on a null executor.

If the YARN cluster is also short on resources, YARN considers that host idle and prefers to allocate the replacement executor on the same host, so a single task can fail 4 times on the same host.

Currently, CoarseGrainedExecutorBackend registers with the driver first and initializes the Executor only after registration succeeds. If an exception occurs during Executor initialization, the driver is not told about it and still launches tasks on that executor, which then calls System.exit(1).

 override def receive: PartialFunction[Any, Unit] = {
   case RegisteredExecutor(hostname) =>
     logInfo("Successfully registered with driver")
     executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
 ......
   case LaunchTask(data) =>
     if (executor == null) {
       logError("Received LaunchTask command but executor was null")
       System.exit(1)

It would be more reasonable to register with the driver only after the Executor is ready, and to make registerTimeout configurable.
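The difference between the two orderings can be illustrated with a small toy model. This is only a sketch and not Spark code: ToyDriver, ToyExecutor, RegistrationOrder and their methods are hypothetical names made up for the illustration, and the failure is simulated with a flag rather than a real shuffle server timeout.

```scala
import java.util.concurrent.TimeoutException
import scala.collection.mutable
import scala.util.Try

// Hypothetical stand-in for the driver: it only tracks registered executor ids.
final class ToyDriver {
  private val registered = mutable.LinkedHashSet.empty[String]
  def register(executorId: String): Unit = registered += executorId
  // The driver schedules tasks only on executors it believes are registered.
  def canLaunchOn(executorId: String): Boolean = registered.contains(executorId)
}

// Hypothetical stand-in for Executor construction, which may throw
// (simulating a timeout while contacting the shuffle server on a busy host).
final class ToyExecutor(val id: String, initFails: Boolean) {
  if (initFails) throw new TimeoutException(s"initialization of $id timed out")
}

object RegistrationOrder {
  // Ordering described in this issue: register first, then build the executor.
  // A failed initialization leaves the driver believing the executor is usable.
  def registerThenInit(driver: ToyDriver, id: String, initFails: Boolean): Option[ToyExecutor] = {
    driver.register(id)
    Try(new ToyExecutor(id, initFails)).toOption
  }

  // Ordering proposed in this issue: build the executor first and register
  // only if that succeeded, so the driver never sees a half-initialized executor.
  def initThenRegister(driver: ToyDriver, id: String, initFails: Boolean): Option[ToyExecutor] = {
    val executor = Try(new ToyExecutor(id, initFails)).toOption
    executor.foreach(e => driver.register(e.id))
    executor
  }
}
```

With registerThenInit and a failing initialization, canLaunchOn still returns true for an executor that does not exist, which corresponds to the "executor was null" path in the excerpt. With initThenRegister the same failure leaves the driver unaware of the executor, so no task is sent to it.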

Issue Links

is duplicated by

SPARK-18820 Driver may send "LaunchTask" before executor receive "RegisteredExecutor" (Resolved)

SPARK-16230 Executors self-killing after being assigned tasks while still in init (Resolved)

SPARK-13060 CoarsedExecutorBackend register to driver should wait Executor was ready? (Resolved)

links to

[Github] Pull Request #12078 (viper-kun)

[Github] Pull Request #12211 (zsxwing)

People

Assignee: Shixiong Zhu

Reporter: SuYan

Votes: 0

Watchers: 6

Dates

Created: 01/Feb/16 12:29

Updated: 20/Dec/16 19:13

Resolved: 06/Apr/16 23:18