Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.2.1, 1.3.0
    • Component/s: spark
    • Labels:
      None

      Description

      The problem is caused by BIGTOP-2490, which assumes puppet-managed orchestration only and thus breaks other methods of deployment.
      Cause of the problem:
      The file bigtop-packages/src/common/spark/spark-worker.svc was changed to read the variable $SPARK_MASTER_URL (line 47). This variable is defined only in spark-env.sh (bigtop-deploy/puppet/modules/spark/templates/spark-env.sh), which is only put in place during a puppet deployment.
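      For reference, the worker launch line in spark-worker.svc now reads as follows (reconstructed from the diff quoted in the comments below); on a non-puppet install $SPARK_MASTER_URL is empty, so the Worker is started without a master URL:

          su -s /bin/bash $SVC_USER -c "nohup nice -n 0 \
              ${EXEC_PATH} org.apache.spark.deploy.worker.Worker $SPARK_MASTER_URL $DAEMON_FLAGS \
              > $LOG_FILE 2>&1 & "'echo $!' > "$PIDFILE"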

      https://github.com/apache/bigtop/commit/f627c7868cb1123f6595380a7325f538ebdb8913#diff-90a09a528d21e12a96d209ffc4036f1e

      I propose to revert BIGTOP-2490 and update the Bigtop 1.2 repository ASAP. Hopefully we will have a proper fix soon.

      Attachments

      1. BIGTOP-2738.patch
        1.0 kB
        Amir Sanjar

        Activity

        evans_ye Evans Ye added a comment -

        Amir Sanjar I've created and documented this on our wiki.

        evans_ye Evans Ye added a comment -

        Plus BIGTOP-2740, the case for doing 1.2.1 is strengthened.
        Let me try to add an integration test for ppc in our CI.

        asanjar Amir Sanjar added a comment -

        +100 on releasing 1.2.1. Changing spark-env.sh on every node of a cluster is not for the faint-hearted.

        rvs Roman Shaposhnik added a comment -

        Evans Ye if you could start a "Known Issues" page on our wiki, it would be super appreciated.

        Also, cutting Bigtop 1.2.1 is possible and I already have a few candidates myself (around GPDB for example).

        evans_ye Evans Ye added a comment -

        I guess we still need to clearly state this as a known issue and provide the workaround. The wiki will be a good place for this kind of thing.
        Let me add a Bigtop 1.2.0 Release tab referencing this JIRA.

        cos Konstantin Boudnik added a comment -

        I am not arguing whether the problem is real or not. I got confused because, for me, 'deployment' goes beyond 'package install'; hence my questions. I think the proposed fix makes sense.

        evans_ye Evans Ye added a comment -

        +1 to Amir Sanjar's proposal.

        evans_ye Evans Ye added a comment -

        IMHO Puppet should not be the only way we deploy our packages. For example, one should be able to just yum install or apt-get install spark and get it up and running in pseudo-cluster mode.
        So let me reframe the problem. The problem Amir Sanjar brought up is:
        we're using a self-defined variable that there is nowhere to look up unless one digs into our code.
        A possible fix is to add SPARK_MASTER_URL to our packaged spark-env.sh with a default value, as sketched below.
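        A minimal sketch of such a default in the packaged spark-env.sh (variable names follow Amir Sanjar's recommendation below; the 127.0.0.1/7077 fallbacks are Spark's standalone defaults, and the exact values Bigtop would ship are an assumption):

            # Fall back to a local master so a bare package install can start a worker.
            export SPARK_MASTER_IP=${SPARK_MASTER_IP:-127.0.0.1}
            export SPARK_MASTER_PORT=${SPARK_MASTER_PORT:-7077}
            # Only set SPARK_MASTER_URL if the deployment tooling has not already defined it.
            export SPARK_MASTER_URL=${SPARK_MASTER_URL:-spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT}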

        cos Konstantin Boudnik added a comment -

        Ah, that makes sense.
        I don't think this warrants redoing the 1.2 repos or respinning the release, because there's a clear workaround. But surely this needs to be fixed.

        asanjar Amir Sanjar added a comment - edited

        The issue is caused by the change on line 47:

            su -s /bin/bash $SVC_USER -c "nohup nice -n 0 \
            -    ${EXEC_PATH} org.apache.spark.deploy.worker.Worker spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT $DAEMON_FLAGS \    <<< works
            +    ${EXEC_PATH} org.apache.spark.deploy.worker.Worker $SPARK_MASTER_URL $DAEMON_FLAGS \    <<< breaks
                > $LOG_FILE 2>&1 & "'echo $!' > "$PIDFILE"

        During a default installation (i.e. apt-get install spark-worker), a different spark-env.sh (https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/spark/spark-env.sh) gets installed.
        I recommend appending "export SPARK_MASTER_URL=spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT" to https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/spark/spark-env.sh; that should fix the problem.
        I could submit a patch, but I am at a customer site at the moment. A workaround for already-affected nodes is sketched below.
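        A hedged sketch of that workaround on an affected node. The conf path /etc/spark/conf/spark-env.sh is an assumption about where the Bigtop package puts the file, and it assumes SPARK_MASTER_IP and SPARK_MASTER_PORT are already defined there, as the working pre-BIGTOP-2490 launch line implies; adjust to your installation:

            # As root: append the master URL definition the init script expects.
            # Single quotes keep the variables literal so they expand when
            # spark-env.sh is sourced, matching the recommended export line.
            echo 'export SPARK_MASTER_URL=spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT' \
                >> /etc/spark/conf/spark-env.sh
            # Restart the worker so it picks up the new environment.
            service spark-worker restart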

        cos Konstantin Boudnik added a comment -

        Even before the change in question, we were configuring Spark with
        export STANDALONE_SPARK_MASTER_HOST=<%= @master_host %>

        So, if you wanted to deploy the component without Puppet, you'd hit the same failure. I don't see how BIGTOP-2490 has broken anything.

        cos Konstantin Boudnik added a comment -

        What other means of deployment do you have in mind, Amir Sanjar? Bigtop effectively only works with one. And IIRC spark-env.sh is the standard piece of how the Spark environment gets configured. It can be created via different means if you choose not to use the Puppet route.


          People

          • Assignee:
            asanjar Amir Sanjar
          • Reporter:
            asanjar Amir Sanjar
          • Votes:
            0
          • Watchers:
            4
