Bigtop / BIGTOP-2766

[Puppet] Spark worker startup failed due to default master_url is yarn

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.2.1
    • Component/s: deployment, spark
    • Labels:
      None

      Description

      When one simply specifies `spark` in `cluster_components`, the Spark standalone deployment is chosen. However, the master URL defaults to `yarn`, which causes the Spark worker startup to fail.
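
      For context, a minimal site.yaml sketch that reproduces this (the hostname is illustrative, not taken from the report):

      ```
      # Hedged sketch: selecting only the spark component deploys the
      # standalone spark-master/spark-worker roles, yet spark.master
      # still defaults to yarn, so the worker fails to start.
      bigtop::hadoop_head_node: head.node.example.com
      hadoop_cluster_node::cluster_components:
        - spark
      bigtop::roles_enabled: false
      ```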


          Activity

          evans_ye Evans Ye added a comment -

          Committed to the master.

          githubbot ASF GitHub Bot added a comment -

          Github user evans-ye commented on the issue:

          https://github.com/apache/bigtop/pull/208

          Seems there's been no objection for two days. I need to get things rolling to fix the Spark deployment in the provisioner, so I'll commit now. Thanks for your high-quality review @kwmonroe.

          evans_ye Evans Ye added a comment -

          Thanks Kevin W Monroe. Patch updated.

          githubbot ASF GitHub Bot added a comment -

          Github user evans-ye commented on a diff in the pull request:

          https://github.com/apache/bigtop/pull/208#discussion_r116349526

          — Diff: bigtop-deploy/puppet/modules/spark/templates/spark-defaults.conf —
          @@ -13,7 +13,15 @@

          # See the License for the specific language governing permissions and
          # limitations under the License.

          +<% if @master_url -%>
          spark.master <%= @master_url %>
          +<% else -%>
          +<% if scope['deploy::roles'].include? 'spark-standalone' -%>
          — End diff –

          Yep, you're right. I was confused as well.
          +1 to your suggestion. That way we keep the default-to-yarn strategy consistent down to the config level.

          githubbot ASF GitHub Bot added a comment -

          Github user kwmonroe commented on a diff in the pull request:

          https://github.com/apache/bigtop/pull/208#discussion_r116296932

          — Diff: bigtop-deploy/puppet/modules/spark/templates/spark-defaults.conf —
          @@ -13,7 +13,15 @@

          # See the License for the specific language governing permissions and
          # limitations under the License.

          +<% if @master_url -%>
          spark.master <%= @master_url %>
          +<% else -%>
          +<% if scope['deploy::roles'].include? 'spark-standalone' -%>
          — End diff –

          @evans-ye sorry i didn't catch this earlier (i still get confused with roles vs components sometimes). However, I think this condition isn't quite right. The *component* is called `spark-standalone`, while the *role* that should be handled here is `spark-master` or `spark-worker`.

          I checked this a few different ways...

          • Using components on a single head node:
            • `['deploy::roles']` is `[spark-master, spark-worker]`
            • site.yaml:
              ```
              ...
              bigtop::hadoop_head_node: my.fqdn
              hadoop_cluster_node::cluster_components:
                - spark-standalone
              bigtop::roles_enabled: false
              ...
              ```
          • Using components where the worker is not on the head node:
            • `['deploy::roles']` is `[spark-worker]`
            • site.yaml:
              ```
              ...
              bigtop::hadoop_head_node: NOT.my.fqdn
              hadoop_cluster_node::cluster_components:
                - spark-standalone
              bigtop::roles_enabled: false
              ...
              ```
          • Using roles:
            • `['deploy::roles']` is `[spark-master, spark-worker]`
            • site.yaml:
              ```
              ...
              bigtop::roles:
                - spark-master
                - spark-worker
              bigtop::roles_enabled: true
              ...
              ```

          So I think the proper condition here (and in `spark-env.sh` below) is to check if `scope['deploy::roles']` includes either `spark-master` or `spark-worker` instead of checking for `spark-standalone`.
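
          A minimal ERB sketch of that check (same template variables as above; this is a sketch of the suggestion, not the committed patch):

          ```
          <% if @master_url -%>
          spark.master <%= @master_url %>
          <%# Either standalone role present means standalone mode -%>
          <% elsif (scope['deploy::roles'] & ['spark-master', 'spark-worker']).any? -%>
          spark.master spark://<%= @master_host %>:<%= @master_port %>
          <% else -%>
          spark.master yarn
          <% end -%>
          ```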

          evans_ye Evans Ye added a comment -

          Updated based on the discussion in git PR.

          githubbot ASF GitHub Bot added a comment -

          Github user kwmonroe commented on a diff in the pull request:

          https://github.com/apache/bigtop/pull/208#discussion_r116045104

          — Diff: bigtop-deploy/puppet/modules/spark/templates/spark-defaults.conf —
          @@ -13,7 +13,15 @@

          # See the License for the specific language governing permissions and
          # limitations under the License.

          +<% if @master_url -%>
          spark.master <%= @master_url %>
          +<% else -%>
          +<% if scope['deploy::roles'].include? 'spark-on-yarn' -%>
          +spark.master yarn
          +<% else -%>
          +spark.master spark://<%= @master_host %>:<%= @master_port %>
          — End diff –

          Yup @evans-ye, I agree the `spark-master|worker` roles implied standalone, yet it was always configured for yarn by default. Confusing for sure.

          I like your idea of keeping it backwards compatible and introducing the new `spark-standalone`. With that change, I imagine the templates patch would change to something like this:

          ```diff
          +<% if @master_url -%>
          spark.master <%= @master_url %>
          +<% else -%>
          +<% if scope['deploy::roles'].include? 'spark-standalone' -%>
          +spark.master spark://<%= @master_host %>:<%= @master_port %>
          +<% else -%>
          +spark.master yarn
          ```

          githubbot ASF GitHub Bot added a comment -

          Github user evans-ye commented on a diff in the pull request:

          https://github.com/apache/bigtop/pull/208#discussion_r116042957

          — Diff: bigtop-deploy/puppet/modules/spark/templates/spark-defaults.conf —
          @@ -13,7 +13,15 @@

          # See the License for the specific language governing permissions and
          # limitations under the License.

          +<% if @master_url -%>
          spark.master <%= @master_url %>
          +<% else -%>
          +<% if scope['deploy::roles'].include? 'spark-on-yarn' -%>
          +spark.master yarn
          +<% else -%>
          +spark.master spark://<%= @master_host %>:<%= @master_port %>
          — End diff –

          I think this deserves a thorough discussion because backward compatibility is indeed important.

          I did it this way just because deploying spark (spark-master, spark-worker) implies it's a standalone deployment. However, it actually ends up running Spark on YARN instead.

          How about this: `spark` keeps deploying Spark on YARN, and I'll add another component called `spark-standalone`.
          The code change would look like this:

          ```
          spark => {
            worker  => ["spark-on-yarn"],
            client  => ["spark-client"],
            library => ["spark-yarn-slave"],
          },

          spark-standalone => {
            master => ["spark-master"],
            worker => ["spark-worker"],
          },
          ```
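
          With that split, a user would then pick the mode per cluster in site.yaml; a hedged sketch (component names as proposed above):

          ```
          # Spark on YARN (backwards-compatible default path):
          hadoop_cluster_node::cluster_components:
            - spark

          # ...or standalone master/worker:
          # hadoop_cluster_node::cluster_components:
          #   - spark-standalone
          ```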

          githubbot ASF GitHub Bot added a comment -

          Github user evans-ye commented on a diff in the pull request:

          https://github.com/apache/bigtop/pull/208#discussion_r116040559

          — Diff: bigtop-deploy/puppet/manifests/cluster.pp —
          @@ -72,6 +72,10 @@
          master => ["spark-master"],
          worker => ["spark-worker"],
          },
          + spark-on-yarn => {
          +   worker => ["spark-on-yarn", "spark-slave"],
          +   client => ["spark-client"],
          + },
          — End diff –

          Very informative input. Thanks Kevin!

          githubbot ASF GitHub Bot added a comment -

          Github user evans-ye commented on the issue:

          https://github.com/apache/bigtop/pull/208

          Thanks for mentioning it here. I intentionally left the include part out of this PR to avoid duplicating work in BIGTOP-2764. Since I'll rebase anyway, I suggest you submit a patch for BIGTOP-2764, and I'll rebase on it.

          githubbot ASF GitHub Bot added a comment -

          Github user kwmonroe commented on the issue:

          https://github.com/apache/bigtop/pull/208

          @evans-ye while you're updating spark's init.pp, would you consider replacing the `spark-datanucleus` definition with an `import spark::datanucleus`?

          https://github.com/evans-ye/bigtop/blob/aeb72aa14ecc3363723dce542f631aea2faaba41/bigtop-deploy/puppet/modules/spark/manifests/init.pp#L158

          As you noted in BIGTOP-2764, including the class seems better than redefining the package.
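
          For illustration, a hedged Puppet sketch of the include-based approach (class body simplified; not the actual init.pp):

          ```
          # The datanucleus package lives in its own class...
          class spark::datanucleus {
            package { 'spark-datanucleus':
              ensure => latest,
            }
          }

          # ...and consumers pull it in with include instead of redefining the package:
          include spark::datanucleus
          ```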

          githubbot ASF GitHub Bot added a comment -

          Github user kwmonroe commented on a diff in the pull request:

          https://github.com/apache/bigtop/pull/208#discussion_r116011550

          — Diff: bigtop-deploy/puppet/modules/spark/templates/spark-defaults.conf —
          @@ -13,7 +13,15 @@

          # See the License for the specific language governing permissions and
          # limitations under the License.

          +<% if @master_url -%>
          spark.master <%= @master_url %>
          +<% else -%>
          +<% if scope['deploy::roles'].include? 'spark-on-yarn' -%>
          +spark.master yarn
          +<% else -%>
          +spark.master spark://<%= @master_host %>:<%= @master_port %>
          — End diff –

          This block generally LGTM, but note that this is effectively making `spark://host:port` the default master. Prior to this change, people deploying the `spark` component would get `yarn` as the master by default.

          So I'm +1 here, but it may be worth mentioning on-list or in the release notes that the default spark configuration is changing.
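
          Users who want to keep yarn regardless of roles can still set the master explicitly, since the template honors `@master_url` first; a hedged site.yaml sketch (the exact hiera key is an assumption and depends on the class that owns `master_url`):

          ```
          # Hedged sketch: pin spark.master explicitly so the role-based default never applies.
          spark::common::master_url: "yarn"
          ```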

          githubbot ASF GitHub Bot added a comment -

          Github user kwmonroe commented on a diff in the pull request:

          https://github.com/apache/bigtop/pull/208#discussion_r116002658

          — Diff: bigtop-deploy/puppet/manifests/cluster.pp —
          @@ -72,6 +72,10 @@
          master => ["spark-master"],
          worker => ["spark-worker"],
          },
          + spark-on-yarn => {
          +   worker => ["spark-on-yarn", "spark-slave"],
          +   client => ["spark-client"],
          + },
          — End diff –

          Small typo above. `spark-slave` should be `spark-yarn-slave` (see [init.pp](https://github.com/apache/bigtop/blob/master/bigtop-deploy/puppet/modules/spark/manifests/init.pp#L27)).

          Also, while it's probably inconsequential, `spark-yarn-slave` isn't really a worker – it just provides the yarn shuffle jar for spark. So I think the above block should read:

          ```diff
          + spark-on-yarn => {
          +   worker => ["spark-on-yarn"],
          +   client => ["spark-client"],
          +   library => ["spark-yarn-slave"],
          + },
          ```
          evans_ye Evans Ye added a comment -

          Kevin W Monroe Jonathan Kelly can I get some advice from you?
          You both seem experienced with Spark and deployment.

          evans_ye Evans Ye added a comment -

          The patch does the following three things:

          • Add a spark-on-yarn component to let users explicitly specify the Spark deploy mode
          • Re-organize site.yaml
          • Add logic in spark-env.sh as well as spark-defaults.conf to switch the deployment mode (see the ERB sketch below):
            if {
                master_url specified, then use it
            } else {
                if spark-on-yarn in roles {
                    master_url = yarn
                } else {
                    master_url = spark://master_host:master_port
                }
            }
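
          In ERB terms, a minimal sketch of that switch (mirroring the pseudocode above and the template diff quoted earlier in this thread):

          ```
          <% if @master_url -%>
          spark.master <%= @master_url %>
          <% else -%>
          <% if scope['deploy::roles'].include? 'spark-on-yarn' -%>
          spark.master yarn
          <% else -%>
          spark.master spark://<%= @master_host %>:<%= @master_port %>
          <% end -%>
          <% end -%>
          ```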
            
          githubbot ASF GitHub Bot added a comment -

          GitHub user evans-ye opened a pull request:

          https://github.com/apache/bigtop/pull/208

          BIGTOP-2766. [Puppet] Spark worker startup failed due to default master_url is yarn

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/evans-ye/bigtop BIGTOP-2766

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/bigtop/pull/208.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #208


          commit 8e1d7b8f3e34f6f4e59508ad453ef8f3b30e924b
          Author: Evans Ye <evansye@apache.org>
          Date: 2017-05-10T19:23:19Z

          BIGTOP-2766. [Puppet] Spark worker startup failed due to default master_url is yarn



            People

            • Assignee: evans_ye Evans Ye
            • Reporter: evans_ye Evans Ye
            • Votes: 0
            • Watchers: 2
