Spark / SPARK-15544

Bouncing a ZooKeeper node causes the active Spark master to exit


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.6.1
    • Fix Version/s: None
    • Component/s: Spark Core
    • Environment: Ubuntu 14.04, ZooKeeper 3.4.6 with a 3-node quorum

    Description

      Shutting down a single ZooKeeper node caused the Spark master to exit. The master should have reconnected to one of the two remaining ZooKeeper nodes.
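For context: a ZooKeeper ensemble stays available as long as a majority of its voting servers is up, so the 3-node quorum described here should survive the loss of one node. The majority arithmetic, as a small shell sketch (the function names are illustrative only, not part of Spark or ZooKeeper):

```shell
#!/usr/bin/env bash
# Majority quorum for a ZooKeeper ensemble of N voting servers:
# floor(N/2) + 1 servers must be up; the remainder may be down.
quorum_size()        { echo $(( $1 / 2 + 1 )); }
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }
```

For the ensemble in this report, `quorum_size 3` prints 2 and `tolerated_failures 3` prints 1, so stopping one node leaves the service with a working majority.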

      Log output:
      16/05/25 18:21:28 INFO master.Master: Launching executor app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
      16/05/25 18:21:28 INFO master.Master: Launching executor app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
      16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x154dfc0426b0054, likely server has closed socket, closing socket connection and attempting reconnect
      16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x254c701f28d0053, likely server has closed socket, closing socket connection and attempting reconnect
      16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
      16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
      16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost leadership
      16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master shutting down.
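The log shows the connection to one server closing, the connection state changing to SUSPENDED, and the master giving up leadership in the same second, with no reconnect logged in between. A minimal sketch for spotting that pattern in other master logs, assuming the log format shown above (date and time as the first two whitespace-separated fields); `find_suspend_then_revoke` is a hypothetical helper, not part of Spark:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): read a Spark master log on stdin
# and report each SUSPENDED connection-state change that is followed by a
# leadership revocation, i.e. the sequence shown in the log above.
find_suspend_then_revoke() {
  awk '
    /ConnectionStateManager: State change: SUSPENDED/ {
      # Remember the date and time of the most recent SUSPENDED event.
      suspended = $1 " " $2
    }
    /Leadership has been revoked/ && suspended != "" {
      # Report the pair, then reset so each revocation is reported once.
      print "SUSPENDED at " suspended " -> leadership revoked at " $1 " " $2
      suspended = ""
    }
  '
}
```

For example, `find_suspend_then_revoke < "$SPARK_LOG_DIR/<master log>"`, where the actual log file name depends on the installation. An empty result means the log does not contain this sequence.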
      

      spark-env.sh:
      export SPARK_LOCAL_DIRS=/ephemeral/spark/local
      export SPARK_WORKER_DIR=/ephemeral/spark/work
      export SPARK_LOG_DIR=/var/log/spark
      export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
      
      export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
      
      export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
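The connect string in `spark.deploy.zookeeper.url` above lists all three quorum members, so the ZooKeeper client has other servers to fail over to. A minimal sketch for pulling those servers out of `SPARK_DAEMON_JAVA_OPTS` so each one can be checked; `list_zk_servers` is a hypothetical helper (not part of Spark) and assumes the option is written as a single whitespace-free `-D` argument, as it is here:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): print one host:port per line from
# the -Dspark.deploy.zookeeper.url=... option in a SPARK_DAEMON_JAVA_OPTS string.
list_zk_servers() {
  local -a opts
  local opt
  # Split the option string on whitespace without glob expansion.
  read -r -a opts <<< "$1"
  for opt in "${opts[@]}"; do
    case "$opt" in
      -Dspark.deploy.zookeeper.url=*)
        # Drop the property name, then put each comma-separated server on its own line.
        printf '%s\n' "${opt#-Dspark.deploy.zookeeper.url=}" | tr ',' '\n'
        ;;
    esac
  done
}
```

After sourcing spark-env.sh, `list_zk_servers "$SPARK_DAEMON_JAVA_OPTS"` prints the three `gn5456-zookeeper-0N:2181` entries; each could then be probed, for example with ZooKeeper's `ruok` four-letter command, to confirm which servers were up when the master exited.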
      

    People

      Assignee: Unassigned
      Reporter: Steven Lowenthal (slowenthal)
      Votes: 7
      Watchers: 15
