Spark / SPARK-12267

Standalone master keeps references to disassociated workers until their heartbeats time out


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.0
    • Component/s: Spark Core
    • Labels:
      None

      Description

      While toying with Spark Standalone, I noticed the following messages
      in the master's logs:

      INFO Master: Registering worker 192.168.1.6:59919 with 2 cores, 2.0 GB RAM
      INFO Master: localhost:59920 got disassociated, removing it.
      ...
      WARN Master: Removing worker-20151210090708-192.168.1.6-59919 because we got no heartbeat in 60 seconds
      INFO Master: Removing worker worker-20151210090708-192.168.1.6-59919 on 192.168.1.6:59919

      Why does the message "WARN Master: Removing
      worker-20151210090708-192.168.1.6-59919 because we got no heartbeat in
      60 seconds" appear when the worker should already have been removed (as
      indicated by "INFO Master: localhost:59920 got disassociated,
      removing it.")?

      Could it be that the IDs differ: 192.168.1.6:59919 vs. localhost:59920?
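      The hypothesis above can be illustrated with a toy model. This is a minimal sketch, not Spark's implementation: it assumes the master indexes workers by the address they registered with and that the disassociation handler looks workers up by the address reported in the disconnect event. The class and method names (Master, WorkerInfo, on_disassociated, check_timeouts) are hypothetical; only the 60-second timeout and the two addresses come from the logs above. If the reported address (localhost:59920) does not match the registered one (192.168.1.6:59919), the disassociation removes nothing and the worker survives until the heartbeat check drops it.

```python
from dataclasses import dataclass, field

# Toy model of the master's worker bookkeeping. All names are hypothetical
# and do not correspond to Spark's actual classes or methods.
WORKER_TIMEOUT_SECONDS = 60  # matches "no heartbeat in 60 seconds" in the log


@dataclass
class WorkerInfo:
    worker_id: str
    address: str           # address the worker registered with
    last_heartbeat: float  # time of the last heartbeat, in seconds


@dataclass
class Master:
    workers: dict = field(default_factory=dict)  # registered address -> WorkerInfo
    log: list = field(default_factory=list)

    def register(self, worker_id, address, now):
        self.workers[address] = WorkerInfo(worker_id, address, now)
        self.log.append(f"INFO Master: Registering worker {address}")

    def on_disassociated(self, address):
        self.log.append(f"INFO Master: {address} got disassociated, removing it.")
        # Assumed behavior: lookup by the address reported in the disconnect
        # event. If it differs from the registered address, nothing is removed.
        worker = self.workers.pop(address, None)
        if worker is not None:
            self.log.append(f"INFO Master: Removing worker {worker.worker_id}")

    def check_timeouts(self, now):
        # Drop every worker whose last heartbeat is older than the timeout.
        for address, worker in list(self.workers.items()):
            if now - worker.last_heartbeat > WORKER_TIMEOUT_SECONDS:
                self.log.append(
                    f"WARN Master: Removing {worker.worker_id} because we got "
                    f"no heartbeat in {WORKER_TIMEOUT_SECONDS} seconds")
                del self.workers[address]


# Usage: reproduce the sequence from the logs.
master = Master()
master.register("worker-20151210090708-192.168.1.6-59919",
                "192.168.1.6:59919", now=0.0)
master.on_disassociated("localhost:59920")  # different address: no match
print(len(master.workers))  # 1, the worker is still registered
master.check_timeouts(now=61.0)
print(len(master.workers))  # 0, removed only by the heartbeat timeout
```

      In this model the fix would be to make the disconnect event carry the same address the worker registered with, so that the lookup in on_disassociated succeeds.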

      I started the master using ./sbin/start-master.sh -h localhost and the
      workers using ./sbin/start-slave.sh spark://localhost:7077.

            People

            • Assignee:
              zsxwing Shixiong Zhu
              Reporter:
              jlaskowski Jacek Laskowski
            • Votes:
              0
              Watchers:
              7

              Dates

              • Created:
                Updated:
                Resolved: