Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
I saw a failure of the test "caching in memory, serialized, replicated" in org.apache.spark.DistributedSuite. Its log shows that the Spark master asked the worker to launch an executor before the worker had received the response to its registration request. So the master knew that the worker had been registered, but the worker did not yet know that it had been registered.
16/09/30 14:53:53.681 dispatcher-event-loop-0 INFO Master: Registering worker localhost:38262 with 1 cores, 1024.0 MB RAM
16/09/30 14:53:53.681 dispatcher-event-loop-0 INFO Master: Launching executor app-20160930145353-0000/1 on worker worker-20160930145353-localhost-38262
16/09/30 14:53:53.682 dispatcher-event-loop-3 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20160930145353-0000/1 on worker-20160930145353-localhost-38262 (localhost:38262) with 1 cores
16/09/30 14:53:53.683 dispatcher-event-loop-3 INFO StandaloneSchedulerBackend: Granted executor ID app-20160930145353-0000/1 on hostPort localhost:38262 with 1 cores, 1024.0 MB RAM
16/09/30 14:53:53.683 dispatcher-event-loop-0 WARN Worker: Invalid Master (spark://localhost:46460) attempted to launch executor.
16/09/30 14:53:53.687 worker-register-master-threadpool-0 INFO Worker: Successfully registered with master spark://localhost:46460
As a result, it seems the worker rejected the launch request (see the WARN line above) and never launched any executor.
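The ordering problem described above can be illustrated with a small, deterministic toy model. This is only a sketch under stated assumptions, not Spark's actual code: `ToyWorker`, `on_registered`, `on_launch_executor`, and `deliver` are hypothetical names, and the model assumes the worker only accepts launch requests from a master whose registration response it has already processed. The master URL, executor ID, and warning text are taken from the log above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyWorker:
    """Hypothetical stand-in for a worker; not Spark's real Worker class."""
    # Unset until the registration response has been processed (assumption).
    active_master_url: Optional[str] = None
    launched: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    def on_registered(self, master_url: str) -> None:
        # The worker learns that it is registered only when this arrives.
        self.active_master_url = master_url

    def on_launch_executor(self, master_url: str, executor_id: str) -> None:
        # A request from a master the worker does not yet recognize is dropped.
        if master_url != self.active_master_url:
            self.warnings.append(
                f"Invalid Master ({master_url}) attempted to launch executor."
            )
            return
        self.launched.append(executor_id)


def deliver(events: List[Tuple[str, ...]]) -> ToyWorker:
    """Deliver (handler_name, *args) events to a fresh worker, in order."""
    worker = ToyWorker()
    for name, *args in events:
        getattr(worker, name)(*args)
    return worker


MASTER = "spark://localhost:46460"
EXECUTOR = "app-20160930145353-0000/1"

# Order seen in the failing log: the launch request overtakes the
# registration response, so the worker rejects it.
racy = deliver([
    ("on_launch_executor", MASTER, EXECUTOR),
    ("on_registered", MASTER),
])

# Expected order: the registration response is processed first.
ordered = deliver([
    ("on_registered", MASTER),
    ("on_launch_executor", MASTER, EXECUTOR),
])

print("racy:", racy.launched, racy.warnings)
print("ordered:", ordered.launched, ordered.warnings)
```

In the racy ordering the launch request is dropped and nothing retries it in this model, which matches the observation that no executor was started even though registration succeeded a few milliseconds later.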
Issue Links
- incorporates SPARK-10651 Flaky test: BroadcastSuite (Resolved)
- links to