Spark / SPARK-12267

Standalone master keeps references to disassociated workers until they sent no heartbeats

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.0
    • Component/s: Spark Core
    • Labels: None

    Description

      While toying with Spark Standalone I've noticed the following messages
      in the logs of the master:

      INFO Master: Registering worker 192.168.1.6:59919 with 2 cores, 2.0 GB RAM
      INFO Master: localhost:59920 got disassociated, removing it.
      ...
      WARN Master: Removing worker-20151210090708-192.168.1.6-59919 because we got no heartbeat in 60 seconds
      INFO Master: Removing worker worker-20151210090708-192.168.1.6-59919 on 192.168.1.6:59919

      Why does the master log "WARN Master: Removing
      worker-20151210090708-192.168.1.6-59919 because we got no heartbeat in
      60 seconds" when the worker should already have been removed, as the
      earlier "INFO Master: localhost:59920 got disassociated, removing it."
      message indicates?

      Could it be that the two addresses differ (192.168.1.6:59919 vs
      localhost:59920), so the disassociation event does not match the
      registered worker?
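
      The hypothesis above can be illustrated with a toy model. This is a
      minimal sketch, not Spark's Master code: ToyMaster, its methods, and
      the address-keyed dictionary are hypothetical names invented for
      illustration, and the ticket text does not confirm that this is the
      actual cause. It only shows that if a worker is registered under one
      address and the disassociation event carries another, an address
      lookup misses and the worker stays registered until the heartbeat
      timeout.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Hypothetical stand-in for a master's worker bookkeeping (not Spark code)."""

    heartbeat_timeout_s: float = 60.0
    workers_by_address: dict = field(default_factory=dict)  # "host:port" -> worker id
    last_heartbeat: dict = field(default_factory=dict)      # worker id -> seconds

    def register(self, worker_id, address, now):
        self.workers_by_address[address] = worker_id
        self.last_heartbeat[worker_id] = now

    def on_disassociated(self, address):
        """Remove the worker registered under `address`; return its id, or None on a miss."""
        worker_id = self.workers_by_address.pop(address, None)
        if worker_id is not None:
            del self.last_heartbeat[worker_id]
        return worker_id

    def expire_stale(self, now):
        """Remove and return the ids of workers whose last heartbeat exceeds the timeout."""
        stale = [w for w, t in self.last_heartbeat.items()
                 if now - t > self.heartbeat_timeout_s]
        for w in stale:
            del self.last_heartbeat[w]
            for addr in [a for a, wid in self.workers_by_address.items() if wid == w]:
                del self.workers_by_address[addr]
        return stale


master = ToyMaster()
# Address and worker id taken from the "Registering worker" log line.
master.register("worker-20151210090708-192.168.1.6-59919", "192.168.1.6:59919", now=0.0)

# Address taken from the "got disassociated" log line: it matches no registered key.
missed = master.on_disassociated("localhost:59920")
still_registered = list(master.last_heartbeat)

# Only the heartbeat timeout finally removes the worker.
expired = master.expire_stale(now=61.0)
```

      In this model the disassociation is a silent no-op, which would be
      consistent with the later "no heartbeat in 60 seconds" warning in
      the log above.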

      I started the master with ./sbin/start-master.sh -h localhost and the
      workers with ./sbin/start-slave.sh spark://localhost:7077.

          People

            Assignee: Shixiong Zhu (zsxwing)
            Reporter: Jacek Laskowski (jlaskowski)
            Votes: 0
            Watchers: 7
