Bigtop / BIGTOP-2490

Spark in HA when Zookeeper is available

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.2.0
    • Component/s: spark

      Description

      Spark can run in high-availability (HA) mode whenever a ZooKeeper deployment is available. This improvement adds an option to the Spark configuration for setting the ZooKeeper ensemble (connection) string. The string is pushed down into the Spark configuration, which makes Spark run in HA mode.

      In HA mode, ZooKeeper is used to elect the leading Spark master, and the workers must contact all potential masters to find the current leader.
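      As a rough illustration of what pushing the ensemble string down into the Spark configuration can mean, the sketch below turns a ZooKeeper connection string into a SPARK_DAEMON_JAVA_OPTS value using Spark's standard standalone HA properties, spark.deploy.recoveryMode and spark.deploy.zookeeper.url. It is a hypothetical Python helper, not the Puppet code from this patch; the function name and host names are assumptions, and the exact options the patch writes may differ.

```python
from typing import Optional


def spark_daemon_java_opts(zk_connection_string: Optional[str]) -> str:
    """Map a ZooKeeper ensemble string to a SPARK_DAEMON_JAVA_OPTS value.

    Hypothetical helper: an empty or missing string means "no HA", so no
    recovery-related system properties are emitted.
    """
    if zk_connection_string is None or not zk_connection_string.strip():
        return ""
    # Standard Spark standalone HA properties, passed to the Spark
    # daemons as Java system properties (-Dkey=value).
    props = {
        "spark.deploy.recoveryMode": "ZOOKEEPER",
        "spark.deploy.zookeeper.url": zk_connection_string.strip(),
    }
    return " ".join(f"-D{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Example ensemble of three ZooKeeper servers (hypothetical host names).
    print(spark_daemon_java_opts("zk1:2181,zk2:2181,zk3:2181"))
```

      Running it prints -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181. An empty string yields no options at all, so the daemons keep their default non-HA behavior.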

      You can follow the progress of this work at: https://github.com/juju-solutions/bigtop/tree/BIGTOP-2490


          Activity

          ASF GitHub Bot added a comment:

          GitHub user ktsakalozos opened a pull request:

          https://github.com/apache/bigtop/pull/139

          BIGTOP-2490: Spark in HA when Zookeeper is available

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/juju-solutions/bigtop BIGTOP-2490

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/bigtop/pull/139.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #139


          commit 404d45ca7974f093cc3bfc5dde17c33483c886a5
          Author: Konstantinos Tsakalozos <konstantinos.tsakalozos@canonical.com>
          Date: 2016-07-05T17:25:27Z

          BIGTOP-2490: Spark in HA when Zookeeper is available


          Konstantinos Tsakalozos added a comment:

          To build a Spark HA cluster, the spark-worker service must be started with the master URL(s) as a parameter, in the form "spark://master1:7077,master2:7077,...".
          This patch changes the spark-worker startup script so that it reads SPARK_MASTER_URL from spark-env.sh. The Puppet scripts have been updated to set SPARK_MASTER_URL, and they also make sure that SPARK_DAEMON_JAVA_OPTS reflects the intent to build an HA cluster whenever a ZooKeeper connection string is provided.
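          A minimal Python sketch of the values described above, assuming hypothetical host names, hypothetical helper names (spark_master_url, render_spark_env), and the default standalone master port 7077: it joins every potential master into one spark:// URL and renders the spark-env.sh lines that a worker startup script could source. It is not the actual Bigtop startup script or Puppet template.

```python
from typing import Sequence


def spark_master_url(masters: Sequence[str], port: int = 7077) -> str:
    """Build a standalone master URL that lists every potential leader.

    A worker given this URL tries to register with each listed master, so
    it can follow whichever one ZooKeeper has elected as leader.
    """
    if not masters:
        raise ValueError("at least one Spark master host is required")
    return "spark://" + ",".join(f"{host}:{port}" for host in masters)


def render_spark_env(masters: Sequence[str], daemon_java_opts: str = "") -> str:
    """Render spark-env.sh lines that a worker startup script could source."""
    lines = [f'export SPARK_MASTER_URL="{spark_master_url(masters)}"']
    # Only emit daemon options when an HA (ZooKeeper-backed) setup was requested.
    if daemon_java_opts:
        lines.append(f'export SPARK_DAEMON_JAVA_OPTS="{daemon_java_opts}"')
    return "\n".join(lines)


if __name__ == "__main__":
    # Two hypothetical master hosts.
    print(render_spark_env(["master1", "master2"]))
```

          For the two hypothetical masters, the script prints export SPARK_MASTER_URL="spark://master1:7077,master2:7077". Keeping the URL in spark-env.sh means the same worker startup script serves both single-master and HA deployments; only the generated value changes.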

          Evans Ye added a comment:

          I'm wondering how zookeeper_connection_string is provided.
          It looks like the only place that variable appears is under Kafka:

          hieradata/bigtop/cluster.yaml:kafka::server::zookeeper_connection_string: "%{hiera('bigtop::hadoop_head_node')}:2181"
          

          It doesn't make sense to me so far. Could you elaborate on your thinking here?

          Konstantinos Tsakalozos added a comment:

          To provide the ZooKeeper connection string used by Spark, set it like this:

          spark::common::zookeeper_connection_string: "<the ZK connection string>"

          This connection string (spark::common::zookeeper_connection_string) is independent of the connection string used by Kafka, although you are free to use the same ZK instances to host both Spark and Kafka metadata. Should I rename the variable to avoid confusion?

          By default, Spark is not deployed in HA mode. However, as soon as you provide a ZK connection string, you are effectively requesting a Spark HA deployment.
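          The rule described above (no connection string means no HA) can be sketched in Python as follows. The dictionaries stand in for hiera data, and the function name wants_spark_ha is hypothetical; the real decision is made in the Bigtop Puppet manifests, which this sketch does not reproduce.

```python
from typing import Mapping, Optional

# Key named in this thread; the dicts below stand in for hiera lookups (assumption).
ZK_KEY = "spark::common::zookeeper_connection_string"


def wants_spark_ha(hiera_data: Mapping[str, Optional[str]]) -> bool:
    """Return True only when a non-empty ZooKeeper connection string is set.

    A missing key, an empty string, or a YAML null all mean a non-HA deployment.
    """
    value = hiera_data.get(ZK_KEY) or ""
    return bool(value.strip())


if __name__ == "__main__":
    default_cluster = {ZK_KEY: ""}  # empty default: non-HA
    ha_cluster = {ZK_KEY: "zk1:2181,zk2:2181,zk3:2181"}  # hypothetical hosts
    print(wants_spark_ha(default_cluster))  # False
    print(wants_spark_ha(ha_cluster))  # True
    print(wants_spark_ha({}))  # False
```

          Treating an absent key and an empty value the same way keeps non-HA as the safe default while still letting the key be listed in a cluster configuration file for discoverability.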

          Evans Ye added a comment:

          We're supposed to put the configuration key spark::common::zookeeper_connection_string in hieradata/bigtop/cluster.yaml so that people know what to configure. If we'd like the default deploy mode to be non-HA, then we can set its default value to empty in hieradata/bigtop/cluster.yaml. Does this make sense to you?

          Konstantinos Tsakalozos added a comment:

          I found some time to update the PR with the suggested changes to cluster.yaml.
          Evans Ye, thank you for taking the time to look at this patch.

          ASF GitHub Bot added a comment:

          Github user asfgit closed the pull request at:

          https://github.com/apache/bigtop/pull/139

          Kevin W Monroe added a comment:

          I've tested this with and without zookeeper, in standalone and yarn-client modes. Looks good. Thanks Konstantinos Tsakalozos!

          Evans Ye added a comment:

          Thanks for testing it, Kevin.
          The patch looks good to me as well.
          Thank you for the patch.


            People

            • Assignee: Konstantinos Tsakalozos
            • Reporter: Konstantinos Tsakalozos
            • Votes: 0
            • Watchers: 4
