SPARK-17755

Master may ask a worker to launch an executor before the worker has received the registration response

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0
    • Component/s: Spark Core
    • Labels: None

    Description

      I saw a failure of the test org.apache.spark.DistributedSuite "caching in memory, serialized, replicated". Its log shows that the Spark master asked the worker to launch an executor before the worker had received the registration response. The master already considered the worker registered, but the worker did not yet know that it had been registered.

      16/09/30 14:53:53.681 dispatcher-event-loop-0 INFO Master: Registering worker localhost:38262 with 1 cores, 1024.0 MB RAM
      16/09/30 14:53:53.681 dispatcher-event-loop-0 INFO Master: Launching executor app-20160930145353-0000/1 on worker worker-20160930145353-localhost-38262
      16/09/30 14:53:53.682 dispatcher-event-loop-3 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20160930145353-0000/1 on worker-20160930145353-localhost-38262 (localhost:38262) with 1 cores
      16/09/30 14:53:53.683 dispatcher-event-loop-3 INFO StandaloneSchedulerBackend: Granted executor ID app-20160930145353-0000/1 on hostPort localhost:38262 with 1 cores, 1024.0 MB RAM
      16/09/30 14:53:53.683 dispatcher-event-loop-0 WARN Worker: Invalid Master (spark://localhost:46460) attempted to launch executor.
      16/09/30 14:53:53.687 worker-register-master-threadpool-0 INFO Worker: Successfully registered with master spark://localhost:46460
      

      As a result, it seems the worker did not launch any executor.
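
The ordering problem in the log can be replayed with a small model. This is only a sketch, not Spark's actual Worker code: the Worker class, its method names, and the deliver helper are hypothetical. It assumes the worker drops a launch request whose master URL does not match the master it believes it is registered with, which is what the "Invalid Master" warning suggests. The master URL and executor ID are copied from the log.

```python
# Hypothetical, simplified model of the worker side of the race.
# It replays message orderings deterministically instead of using threads.
MASTER_URL = "spark://localhost:46460"      # taken from the log above
EXECUTOR_ID = "app-20160930145353-0000/1"   # taken from the log above


class Worker:
    def __init__(self):
        self.active_master_url = None  # unknown until registration is confirmed
        self.executors = []
        self.log = []

    def on_registered(self, master_url):
        # The registration response tells the worker which master is active.
        self.active_master_url = master_url
        self.log.append(f"Successfully registered with master {master_url}")

    def on_launch_executor(self, master_url, executor_id):
        # Assumption: launch requests from an unknown master are dropped.
        if master_url != self.active_master_url:
            self.log.append(
                f"Invalid Master ({master_url}) attempted to launch executor."
            )
            return
        self.executors.append(executor_id)


def deliver(messages):
    """Feed messages to a fresh worker in the given order and return it."""
    worker = Worker()
    for kind, payload in messages:
        if kind == "registered":
            worker.on_registered(payload)
        elif kind == "launch":
            worker.on_launch_executor(MASTER_URL, payload)
    return worker


# Expected ordering: registration response first, then the launch request.
in_order = deliver([("registered", MASTER_URL), ("launch", EXECUTOR_ID)])

# Ordering seen in the failed test: the launch request overtakes the response.
raced = deliver([("launch", EXECUTOR_ID), ("registered", MASTER_URL)])
```

In this model, in_order ends up with one executor, while raced ends up with none and its log holds the same two Worker messages, in the same order, as the log above. Nothing in the model retries the dropped launch request, which would be consistent with the executor never being launched.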

            People

              Assignee: Shixiong Zhu (zsxwing)
              Reporter: Yin Huai (yhuai)