Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.2.0
    • Component/s: build
    • Labels: None

      Description

      Anyone interested in Spark 2.0.x? Here is an initial patch (for CentOS only).

      1. spark20.patch
        7 kB
        Olaf Flebbe
      2. BIGTOP-2569.patch
        17 kB
        Jonathan Kelly
      3. BIGTOP-2569.patch
        17 kB
        Jonathan Kelly

        Activity

        oflebbe Olaf Flebbe added a comment -

        Initial Patch

        oflebbe Olaf Flebbe added a comment -

        Feel free to contribute to this patch and make it ready for inclusion in Bigtop.

        asanjar Amir Sanjar added a comment -

        Great, could we create a new Bigtop component called spark2? It would be helpful for users to have support for both Spark 1.6.2 and Spark 2.0 in Bigtop.

        jonathak Jonathan Kelly added a comment -

        I actually worked on adding Spark 2.0 to Bigtop for EMR a few months ago, and I could probably contribute this patch.

        rvs Roman Shaposhnik added a comment -

        Jonathan Kelly that'd be very much appreciated!

        jonathak Jonathan Kelly added a comment -

        OK, here's my patch to upgrade to Spark 2.0.1. It includes some related changes such as renaming the spark-extras package to spark-external and also adding a new /usr/bin/spark-example script that passes through to /usr/lib/spark/bin/run-example.
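
        A passthrough script of that sort might look roughly like the sketch below (hypothetical; the actual script in the patch may differ):

            #!/bin/bash
            # Hypothetical /usr/bin/spark-example wrapper: forward all
            # arguments to the run-example launcher bundled with Spark.
            exec /usr/lib/spark/bin/run-example "$@"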

        jonathak Jonathan Kelly added a comment -

        Note: I also made changes for Debian but am not really set up to test it out on Debian myself.

        oflebbe Olaf Flebbe added a comment -

        Hi Jonathan Kelly ! Thank you very much for the patch!

        I did a quick run on Debian and looked at the lintian output. There are two issues which might need special attention:

        1) spark-core contains /var/run/spark, which cannot work, since /var/run is a volatile RAM disk.
        2) spark-core contains the Hadoop libraries. IMHO we should try to generate a Hadoop-free distribution and use our own Hadoop, as described here: http://spark.apache.org/docs/latest/hadoop-provided.html (see the sketch at the end of this comment). What do you think?

        I will try the spark packages later and will look out for further issues.
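
        For reference, the hadoop-provided approach boils down to building Spark without bundled Hadoop jars and pointing it at the cluster's Hadoop at runtime. A minimal sketch, assuming a Spark source checkout and the standard build tooling:

            # Build a Hadoop-free Spark distribution using the hadoop-provided profile.
            ./dev/make-distribution.sh --name hadoop-provided --tgz -Phadoop-provided

            # At runtime, point Spark at the installed Hadoop (e.g. in conf/spark-env.sh).
            export SPARK_DIST_CLASSPATH=$(hadoop classpath)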

        jonathak Jonathan Kelly added a comment -

        Thanks for taking a look. Both of these issues are pre-existing though, right? So should they be fixed in separate JIRA issues?

        asanjar Amir Sanjar added a comment -

        Great progress! I'm assuming we are planning to distribute both Spark 1.6.2 and Spark 2.x, correct?

        warwithin YoungWoo Kim added a comment -

        If we maintain Spark 1.6 and Spark 2.0 at the same time, should we add spark1 to the BOM? In the case of Sqoop, the sqoop packages are the current stable 1.x artifacts and sqoop2 is 1.99.x (alpha). It looks like 2.0.1 is the latest stable version of Spark.

        warwithin YoungWoo Kim added a comment -

        Yep. I think so.

        asanjar Amir Sanjar added a comment -

        Would it be easier to create spark2? If not, let me know how and what needs to be done to create spark1. With some guidance, I could take ownership of that.

        oflebbe Olaf Flebbe added a comment -

        Ok.

        oflebbe Olaf Flebbe added a comment - edited

        I prefer to leave spark (1.x) as is and introduce a new package spark2, if that is the consensus here. (I am a Spark noob; I have only played with spark2.)

        If it does not make sense to handle spark (version 1) and spark2 differently, please raise your concerns here.

        Jonathan Kelly: In order to get your patch correctly attributed to your account, it would be great if you could change the patch to create a new package "spark2" rather than modifying spark. If nobody objects, I will commit it ASAP.

        asanjar Amir Sanjar added a comment -

        +1 to spark(v1) and spark2(v2.x)

        jonathak Jonathan Kelly added a comment -

        I'm not sure I really like keeping Spark 1.x and Spark 2.x together in Bigtop. What is the benefit of this? Would anybody really want to deploy both Spark 1.x and Spark 2.x on the same cluster? If not, then can't they just use an older version of Bigtop to deploy Spark 1.x and a newer version of Bigtop to deploy Spark 2.x? If we were to keep both alongside each other, would Spark 2.x have to be installed in /usr/lib/spark2? That seems less than ideal because then it would have to remain like that forever in order to not break customers who are expecting Spark to be in /usr/lib/spark.

        FWIW, EMR chose not to leave Spark 1.x alongside Spark 2.x but rather just to upgrade completely to Spark 2.x (as of emr-5.0.0), which is still installed in /usr/lib/spark. Since customers are already accustomed to expecting Spark to be in /usr/lib/spark whether it's Spark 1.x (on an emr-4.x cluster) or Spark 2.x (on an emr-5.x cluster), it would be a pain to have it change to /usr/lib/spark2.

        Another alternative might be to have both Spark 1.x and Spark 2.x use the same paths (/usr/lib/spark, /var/lib/spark, etc.) and to allow only either Spark 1.x or Spark 2.x to be installed on a cluster. What do you think?

        rvs Roman Shaposhnik added a comment -

        I am with Jonathan Kelly: I think we should ditch the current Spark and move to Spark 2. If somebody is really interested in Spark 1, they can always re-introduce a spark-1 package later. Makes sense?

        warwithin YoungWoo Kim added a comment -

        Spark 2.x is the current stable version of Spark. It seems really strange to me to have a suffix or prefix in the package or directory names for the stable version. Besides, I believe users do not need both 1.x and 2.x on the same cluster. IMO, we should move to 2.x for 'spark'. Thanks!

        asanjar Amir Sanjar added a comment -

        Spark 2.0 is still NOT stable on other platforms (e.g. ppc64le). BTW, Hortonworks has both Spark versions available as part of HDP 2.5.

        rvs Roman Shaposhnik added a comment -

        Amir Sanjar I get your point, but I'd rather somebody create spark-1 for those platforms. It looks to me like in a regular kind of setup (Linux on x86) everybody is using Spark 2, and I think we should reflect that in Bigtop.

        jonathak Jonathan Kelly added a comment -

        Is anybody opposed to my patch to upgrade Spark from 1.x to 2.x being committed, and then if the demand is really there for Spark 1.x to remain in current versions of Bigtop, somebody adds Spark 1.x (as "spark1") back to Bigtop alongside Spark 2.x (as "spark")?

        oflebbe Olaf Flebbe added a comment -

        Hi Jonathan Kelly, I think we can proceed this way. I will commit your patch tomorrow, unless someone is faster than me.

        jonathak Jonathan Kelly added a comment -

        Thank you!

        asanjar Amir Sanjar added a comment -

        Could we at least delay this commit until I get the spark1 component done?

        oflebbe Olaf Flebbe added a comment - edited

        Hi Amir, something like this in bigtop.bom:

        'spark1' {
              name    = 'spark1'
              pkg     = 'spark-core'
              relNotes = 'Apache Spark'
              version { base = '1.6.2'; pkg = base; release = 1 }
              tarball { destination = "spark-${version.base}.tar.gz"
                        source      = "spark-${version.base}.tgz" }
              url     { download_path = "/spark/spark-${version.base}"
                        site = "${apache.APACHE_MIRROR}/${download_path}"
                        archive = "${apache.APACHE_ARCHIVE}/${download_path}" }
            }
        

        and duplicating bigtop-packages/src/{common,deb,rpm}/spark to spark1 should do it (see the sketch below).
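
        A minimal sketch of that duplication step, assuming the layout above:

            # Copy the existing spark packaging into a new spark1 component.
            for d in common deb rpm; do
              cp -r bigtop-packages/src/$d/spark bigtop-packages/src/$d/spark1
            done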

        asanjar Amir Sanjar added a comment -

        Will do, thanks Olaf.

        oflebbe Olaf Flebbe added a comment - edited

        Amir, why did you hijack this JIRA?

        Could you please file a separate JIRA, depending on this one, for introducing spark1?

        asanjar Amir Sanjar added a comment -

        I had no plans to hijack this JIRA; I simply expressed my concern that this JIRA makes Spark 2.0 the default Spark.

        oflebbe Olaf Flebbe added a comment - edited

        Amir, sorry for my bad wording: I meant, why did you change the assignment of this JIRA from Jonathan to yourself? Changing it back.

        Final tests.

        jonathak Jonathan Kelly added a comment -

        That's part of what would need to be done, but I don't think it's quite everything. For instance, all of these new bigtop-packages/src/*/spark1 scripts, specs, etc., would need to be changed so that they use Spark 1.x-specific paths, package names, version variables, etc. (e.g., instead of referencing $SPARK_VERSION, they should use $SPARK1_VERSION, and instead of installing things to /usr/lib/spark, they should install to something like /usr/lib/spark1; a rough sketch follows). Also, what about duplicating the Puppet modules for Spark 1? Wouldn't you want that too? If so, you need even more changes to distinguish Spark 1.x from Spark 2.x.
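
        A hypothetical illustration of that renaming pass (a starting point only; every hit would need manual review):

            # Rewrite version variables and install paths in the duplicated
            # spark1 packaging files (hypothetical; review each change).
            grep -rl 'SPARK_VERSION\|/usr/lib/spark' bigtop-packages/src/*/spark1 |
              xargs sed -i -e 's|SPARK_VERSION|SPARK1_VERSION|g' \
                           -e 's|/usr/lib/spark|/usr/lib/spark1|g'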

        All of this complexity is why I was hoping not to have Spark 1.x and Spark 2.x colocated in the same version of Bigtop, but at least I'm happy with having the "main" (i.e., un-suffixed) version of Spark be Spark 2.x (instead of having "spark2").

        Show
        jonathak Jonathan Kelly added a comment - That's part of what would need to be done, but I don't think it's quite everything. For instance, all of these new bigtop-packages/src/*/spark1 scripts, specs, etc., would need to be changed such that they are using Spark 1-.x-specific paths, package names, version variables, etc. (e.g., instead of referencing $SPARK_VERSION, it should use $SPARK1_VERSION, and instead of installing things to /usr/lib/spark, they should install to something like /usr/lib/spark1.) Also, what about duplicating the Puppet modules for Spark 1? Wouldn't you want that too? If so, you need even more changes to distinguish Spark 1.x from Spark 2.x. All of this complexity is why I was hoping not to have Spark 1.x and Spark 2.x colocated in the same version of Bigtop, but at least I'm happy with having the "main" (i.e., un-suffixed) version of Spark be Spark 2.x (instead of having "spark2").
        Hide
        oflebbe Olaf Flebbe added a comment -

        Indeed. All I wanted was something handy for hacking Spark version 1 onto ppc64le somehow.

        We will get repository conflicts and so on, so I don't want spark1 built by default at all.

        Puppet modules for that are out of scope.

        oflebbe Olaf Flebbe added a comment -

        I am running out of time while testing; I am preparing for my flight to Seville for ApacheCon.

        oflebbe Olaf Flebbe added a comment -

        The centos-7 build stalls on my machine.

        asanjar Amir Sanjar added a comment - edited

        Olaf, oooooh, I apologize, I didn't realize the assignee had been changed. The side effect of old age and working late.

        oflebbe Olaf Flebbe added a comment -

        Jonathan Kelly: Sorry to bother you. Amir's patch broke yours; I couldn't commit your patch in time.
        Would you mind rebasing your patch (a one-line change) to match the current situation and changing the commit message to begin with "BIGTOP-2569: Spark 2.0" in order to match the JIRA? I will commit it immediately.
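
        For reference, amending the subject of the topmost commit and regenerating the patch file could look like this (standard git usage; the exact wording of the message is up to the author):

            # Reword the latest commit so the subject starts with the JIRA key.
            git commit --amend
            # Regenerate the patch file for attachment to the JIRA.
            git format-patch -1 --stdout > BIGTOP-2569.patch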

        jonathak Jonathan Kelly added a comment -

        Rebased patch and corrected subject line.

        jonathak Jonathan Kelly added a comment -

        Semi-related: I've submitted a patch to https://issues.apache.org/jira/browse/BIGTOP-2514 for upgrading Zeppelin to 0.6.2, which supports Spark 2.x.

        oflebbe Olaf Flebbe added a comment -

        Thank you very much Jonathan Kelly! Committed and closed.

        cos Konstantin Boudnik added a comment -

        And with this damn spark1-and-spark2 decision, now we bear the consequences.
        Clearly, consensus isn't the same as science ;(

        jonathak Jonathan Kelly added a comment -

        Konstantin Boudnik, I'm not a huge fan of having both Spark 1 and Spark 2 concurrently either, but the problem is that not all applications that depend upon Spark in some way have a quick enough release cycle to start supporting Spark 2. We shouldn't be held back by the slowest applications, so for now we can keep apps that haven't yet upgraded to Spark 2 depending upon spark1 in the Bigtop stack. Do you have any other alternative?

        cos Konstantin Boudnik added a comment -

        Yes, the clean alternative was to introduce spark2 (as experimental and unstable) and keep spark as it was two weeks ago. That way we'd have avoided the disruption of a bunch of people running around fixing a bunch of silly bugs.

        And looking at the earlier comments here I am surprised that people who should know better have voted for the current approach.


          People

          • Assignee:
            jonathak Jonathan Kelly
          • Reporter:
            oflebbe Olaf Flebbe
          • Votes:
            0
          • Watchers:
            6
