Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1.0
    • Component/s: spark
    • Labels:
      None

      Description

      Bump up Apache Spark version to 1.5.1

        Issue Links

        There are no Sub-Tasks for this issue.

          Activity

          Hide
          jonathak Jonathan Kelly added a comment -

          I have already done this and can submit a patch, but not until next week at the earliest. I made a lot of changes to the packaging and puppet scripts, so it's not just a simple upgrade this time.

          Show
          jonathak Jonathan Kelly added a comment - I have already done this and can submit a patch, but not until next week at the earliest. I made a lot of changes to the packaging and puppet scripts, so it's not just a simple upgrade this time.
          Hide
          warwithin YoungWoo Kim added a comment -

          Jonathan, Good to know that! No need to hurry at this point. feel free to ping when you're ready. Thanks!

          Show
          warwithin YoungWoo Kim added a comment - Jonathan, Good to know that! No need to hurry at this point. feel free to ping when you're ready. Thanks!
          Hide
          jayunit100 jay vyas added a comment -

          looking forward to it Jonathan Kelly

          Show
          jayunit100 jay vyas added a comment - looking forward to it Jonathan Kelly
          Hide
          jonathak Jonathan Kelly added a comment -

          Really sorry I haven't provided this yet, but I realized that it's difficult for me to provide my patch for this until we can get our large patches for BIGTOP-1746 and BIGTOP-1689 merged in as well.

          Show
          jonathak Jonathan Kelly added a comment - Really sorry I haven't provided this yet, but I realized that it's difficult for me to provide my patch for this until we can get our large patches for BIGTOP-1746 and BIGTOP-1689 merged in as well.
          Hide
          jayunit100 jay vyas added a comment - - edited

          Ok. So sounds like there is a big backlog you guys have here.

          Whose the point man on these 3 patches?

          If they are blocking then Let's get them merged !

          Show
          jayunit100 jay vyas added a comment - - edited Ok. So sounds like there is a big backlog you guys have here. Whose the point man on these 3 patches? If they are blocking then Let's get them merged !
          Hide
          jonathak Jonathan Kelly added a comment - - edited

          Yeah, definitely want to get them merged. BIGTOP-1746 has finally gotten some traction again recently, and I believe vishnu gajendran has been working on cleaning it up, but BIGTOP-1689 has been cold for a long time. Peter Slawski and/or Rob Leidle would be the point men on that one.

          Show
          jonathak Jonathan Kelly added a comment - - edited Yeah, definitely want to get them merged. BIGTOP-1746 has finally gotten some traction again recently, and I believe vishnu gajendran has been working on cleaning it up, but BIGTOP-1689 has been cold for a long time. Peter Slawski and/or Rob Leidle would be the point men on that one.
          Hide
          tomzeng Tom Zeng added a comment -

          FYI - should be 1.4.1, Jonathan had that working already

          Show
          tomzeng Tom Zeng added a comment - FYI - should be 1.4.1, Jonathan had that working already
          Hide
          apurtell Andrew Purtell added a comment -

          Spark just announced a 1.5 release: http://spark.apache.org/news/spark-1-5-0-released.html . Should we look at moving up to that instead of 1.4?

          Show
          apurtell Andrew Purtell added a comment - Spark just announced a 1.5 release: http://spark.apache.org/news/spark-1-5-0-released.html . Should we look at moving up to that instead of 1.4?
          Hide
          jonathak Jonathan Kelly added a comment -

          Yes, when we can finally contribute this patch back (see above comment about needing to contribute back the "introduce roles" and "config merging" patches first), it will actually be Spark 1.5.0, so we could rename this issue.

          Show
          jonathak Jonathan Kelly added a comment - Yes, when we can finally contribute this patch back (see above comment about needing to contribute back the "introduce roles" and "config merging" patches first), it will actually be Spark 1.5.0, so we could rename this issue.
          Hide
          tomzeng Tom Zeng added a comment -

          Jonathan Kelly is there a JIRA for the config merging patch?

          Show
          tomzeng Tom Zeng added a comment - Jonathan Kelly is there a JIRA for the config merging patch?
          Hide
          apurtell Andrew Purtell added a comment -

          Ok, I renamed the issue.

          Show
          apurtell Andrew Purtell added a comment - Ok, I renamed the issue.
          Hide
          evans_ye Evans Ye added a comment -

          I'm working with vishnu gajendran to get BIGTOP-1746 in. I guess we're very close to the finish line...

          Show
          evans_ye Evans Ye added a comment - I'm working with vishnu gajendran to get BIGTOP-1746 in. I guess we're very close to the finish line...
          Hide
          jonathak Jonathan Kelly added a comment -

          Tom Zeng, yes, both are linked above. The config merging one is BIGTOP-1689, which Rob Leidle is planning on contributing back once he has time. (He has made several changes beyond Peter Slawski's original patch.)

          Show
          jonathak Jonathan Kelly added a comment - Tom Zeng , yes, both are linked above. The config merging one is BIGTOP-1689 , which Rob Leidle is planning on contributing back once he has time. (He has made several changes beyond Peter Slawski 's original patch.)
          Hide
          rleidle Rob Leidle added a comment -

          Yes, I have been preparing a patch for this.

          Show
          rleidle Rob Leidle added a comment - Yes, I have been preparing a patch for this.
          Hide
          tomzeng Tom Zeng added a comment -

          Ok great, once this is and the config merge are in, it'd be a lot easier for us to contribute additional components

          Show
          tomzeng Tom Zeng added a comment - Ok great, once this is and the config merge are in, it'd be a lot easier for us to contribute additional components
          Hide
          cos Konstantin Boudnik added a comment -

          Looks like the roles are in. Shall this be moving forward as well?

          Show
          cos Konstantin Boudnik added a comment - Looks like the roles are in. Shall this be moving forward as well?
          Hide
          rnowling RJ Nowling added a comment -

          Quick question: why does upgrading Spark depend on changes to the Puppet deployment? Can we upgrade Spark and handle BIGTOP-1689 later?

          Show
          rnowling RJ Nowling added a comment - Quick question: why does upgrading Spark depend on changes to the Puppet deployment? Can we upgrade Spark and handle BIGTOP-1689 later?
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user ejono opened a pull request:

          https://github.com/apache/bigtop/pull/43

          BIGTOP-1944. Upgrade Spark version to 1.5.1

          This pull request also contains some improvements to the RPM packaging and Puppet deployment.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/ejono/bigtop BIGTOP-1944

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/bigtop/pull/43.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #43


          commit f60abc75d1ca9daf17d217b7ec2d4464c7fe47fc
          Author: Jonathan Kelly <jonathak@amazon.com>
          Date: 2015-09-25T15:26:26Z

          BIGTOP-1944. Upgrade Spark version to 1.5.1


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user ejono opened a pull request: https://github.com/apache/bigtop/pull/43 BIGTOP-1944 . Upgrade Spark version to 1.5.1 This pull request also contains some improvements to the RPM packaging and Puppet deployment. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ejono/bigtop BIGTOP-1944 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/bigtop/pull/43.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #43 commit f60abc75d1ca9daf17d217b7ec2d4464c7fe47fc Author: Jonathan Kelly <jonathak@amazon.com> Date: 2015-09-25T15:26:26Z BIGTOP-1944 . Upgrade Spark version to 1.5.1
          Hide
          jonathak Jonathan Kelly added a comment -

          Ideally BIGTOP-1792 should be merged before this one, since there are some conflicting changes in cluster.yaml, but it's not absolutely necessary to merge BIGTOP-1792 first.

          Show
          jonathak Jonathan Kelly added a comment - Ideally BIGTOP-1792 should be merged before this one, since there are some conflicting changes in cluster.yaml, but it's not absolutely necessary to merge BIGTOP-1792 first.
          Hide
          jonathak Jonathan Kelly added a comment -

          The pull request I just created includes all Spark-related changes I've made in our EMR Bigtop fork, minus the not-yet-merged "config file merging" changes from BIGTOP-1689 and a few other things that are currently EMR-specific. Sorry again for the delay, but we've been a bit busy lately. We hope to be able to contribute back more patches like this from now on.

          Show
          jonathak Jonathan Kelly added a comment - The pull request I just created includes all Spark-related changes I've made in our EMR Bigtop fork, minus the not-yet-merged "config file merging" changes from BIGTOP-1689 and a few other things that are currently EMR-specific. Sorry again for the delay, but we've been a bit busy lately. We hope to be able to contribute back more patches like this from now on.
          Hide
          rnowling RJ Nowling added a comment - - edited

          Hi Jonathan Kelly,

          Thanks for the patch! A few comments:

          1. I can't review the Puppet bits because I don't have enough familiarity with them. If you want to move the Puppet changes to a separate patch / JIRA, I can focus on reviewing the Spark update and RPM changes

          2. I'm worried about compiling Spark with Ganglia, which is LPGL licensed. Spark seems to need this to be explicitly enabled, and I'm wondering if we need to do the same as well. (e.g., create an environmental variable or something and document it). I'll send an email to the dev list to get feedback on this.

          3. The patch has a number of changes that seem purely cosmetic (e.g., changing the order of the source imports in the RPM SPEC file, re-organized lines in spark-core.install, etc.). These make the patch hard to read, and I don't want to accidentally introduce errors. Can you clean those up?

          Thanks!

          Show
          rnowling RJ Nowling added a comment - - edited Hi Jonathan Kelly , Thanks for the patch! A few comments: 1. I can't review the Puppet bits because I don't have enough familiarity with them. If you want to move the Puppet changes to a separate patch / JIRA, I can focus on reviewing the Spark update and RPM changes 2. I'm worried about compiling Spark with Ganglia, which is LPGL licensed. Spark seems to need this to be explicitly enabled, and I'm wondering if we need to do the same as well. (e.g., create an environmental variable or something and document it). I'll send an email to the dev list to get feedback on this. 3. The patch has a number of changes that seem purely cosmetic (e.g., changing the order of the source imports in the RPM SPEC file, re-organized lines in spark-core.install, etc.). These make the patch hard to read, and I don't want to accidentally introduce errors. Can you clean those up? Thanks!
          Hide
          jonathak Jonathan Kelly added a comment - - edited

          1. OK, I could move the Puppet changes to a separate pull request if you want, unless somebody else with more familiarity with Puppet can review them.

          2. Yeah, sorry, I should probably not have included the option to compile Spark with Ganglia. For EMR we are compiling by default with Ganglia and the Kinesis connector (with -Pkinesis-asl), but I removed the latter from this patch because it doesn't make sense to default Apache Bigtop to building Spark with Kinesis support. I suppose the same is true of Ganglia support, especially because it's LGPL. If I remove it, there actually already is an environment variable that you can use to enable it--just set SPARK_BUILD_OPTS=-Pspark-ganglia-lgpl before running the build.

          3. Sure, sorry about that. What if I split out cosmetic changes like that into a separate commit in the same pull request? Or would you want them in a separate pull request or not to be made at all?

          Show
          jonathak Jonathan Kelly added a comment - - edited 1. OK, I could move the Puppet changes to a separate pull request if you want, unless somebody else with more familiarity with Puppet can review them. 2. Yeah, sorry, I should probably not have included the option to compile Spark with Ganglia. For EMR we are compiling by default with Ganglia and the Kinesis connector (with -Pkinesis-asl), but I removed the latter from this patch because it doesn't make sense to default Apache Bigtop to building Spark with Kinesis support. I suppose the same is true of Ganglia support, especially because it's LGPL. If I remove it, there actually already is an environment variable that you can use to enable it--just set SPARK_BUILD_OPTS=-Pspark-ganglia-lgpl before running the build. 3. Sure, sorry about that. What if I split out cosmetic changes like that into a separate commit in the same pull request? Or would you want them in a separate pull request or not to be made at all?
          Hide
          rnowling RJ Nowling added a comment - - edited

          1. I'm happy to gives other folks time to chime in on the Puppet changes while I review the Spark updates.

          2. Yes, let's remove Ganglia – that's the easiest solution. It might be worth documenting how to add Ganglia support to the packages for users who want it and are willing to build their own packages.

          3. Looking at the cosmetic changes again, I'm fine with things like moving the spark-env.sh into its own file. I'll double-check the changes in the SPEC and spark-core.install so no need to worry about those for now.

          Show
          rnowling RJ Nowling added a comment - - edited 1. I'm happy to gives other folks time to chime in on the Puppet changes while I review the Spark updates. 2. Yes, let's remove Ganglia – that's the easiest solution. It might be worth documenting how to add Ganglia support to the packages for users who want it and are willing to build their own packages. 3. Looking at the cosmetic changes again, I'm fine with things like moving the spark-env.sh into its own file. I'll double-check the changes in the SPEC and spark-core.install so no need to worry about those for now.
          Hide
          jonathak Jonathan Kelly added a comment - - edited

          OK, I've split it into three commits: the packaging changes, the Puppet changes, and the cosmetic changes. And I've removed the -Pspark-ganglia-lgpl argument from the build. Where is the appropriate place for documenting how to enable it?

          Show
          jonathak Jonathan Kelly added a comment - - edited OK, I've split it into three commits: the packaging changes, the Puppet changes, and the cosmetic changes. And I've removed the -Pspark-ganglia-lgpl argument from the build. Where is the appropriate place for documenting how to enable it?
          Hide
          rnowling RJ Nowling added a comment -

          You rock! Thank you!

          Not sure where best to document that right now.

          Show
          rnowling RJ Nowling added a comment - You rock! Thank you! Not sure where best to document that right now.
          Hide
          cos Konstantin Boudnik added a comment -

          Thanks for getting read of the automatic inclusion of LGPL licensed code.
          As for documenting it, I'd put it into the top-level BUILDING.txt

          Show
          cos Konstantin Boudnik added a comment - Thanks for getting read of the automatic inclusion of LGPL licensed code. As for documenting it, I'd put it into the top-level BUILDING.txt
          Hide
          rnowling RJ Nowling added a comment -

          The patch also adds SparkR, which means including R for the build. That adds something like 150+ packages to the build (at least for CentOS). I'm gonna send an email around about that, too, since that's a big change.

          Show
          rnowling RJ Nowling added a comment - The patch also adds SparkR, which means including R for the build. That adds something like 150+ packages to the build (at least for CentOS). I'm gonna send an email around about that, too, since that's a big change.
          Hide
          jonathak Jonathan Kelly added a comment -

          I have updated the pull request to split out adding support for SparkR into a separate commit. We can split it off to a brand new JIRA if you want too.

          Show
          jonathak Jonathan Kelly added a comment - I have updated the pull request to split out adding support for SparkR into a separate commit. We can split it off to a brand new JIRA if you want too.
          Hide
          rnowling RJ Nowling added a comment -

          Thanks!

          I'd like to separate the four patches into 3 JIRAs: this one, updating bigtop-deploy, and adding SparkR. Please create a separate JIRA for each patch and attach the patch or submit a separate PR for each JIRA. One JIRA per patch is the standard community approach.

          The SparkR patch doesn't update the bigtop_toolchain – we need that in the same patch so that the CI builds will still work once we add SparkR and so developers have what they need to do builds on local machines.

          Please divide the cosmetic changes up into the three patches as appropriate.

          Show
          rnowling RJ Nowling added a comment - Thanks! I'd like to separate the four patches into 3 JIRAs: this one, updating bigtop-deploy, and adding SparkR. Please create a separate JIRA for each patch and attach the patch or submit a separate PR for each JIRA. One JIRA per patch is the standard community approach. The SparkR patch doesn't update the bigtop_toolchain – we need that in the same patch so that the CI builds will still work once we add SparkR and so developers have what they need to do builds on local machines. Please divide the cosmetic changes up into the three patches as appropriate.
          Hide
          cos Konstantin Boudnik added a comment -

          Would be good to have them all combined under an umbrella JIRA, so it is easier to track them down. Thanks!

          Show
          cos Konstantin Boudnik added a comment - Would be good to have them all combined under an umbrella JIRA, so it is easier to track them down. Thanks!
          Hide
          rnowling RJ Nowling added a comment -

          Yes, we could add the JIRAs as sub-JIRAs of this JIRA for example.

          Show
          rnowling RJ Nowling added a comment - Yes, we could add the JIRAs as sub-JIRAs of this JIRA for example.
          Hide
          jonathak Jonathan Kelly added a comment -

          BTW, I was just trying out the Spark build on Ubuntu and got the following:

          dpkg-gencontrol: error: illegal package name 'spark-R': character 'R' not allowed

          Does this mean capital letters are not allowed in DEB package names? If so, I suppose spark-r is OK, but I should probably rename it for both RPM and DEB so that I don't need special cases in the Puppet module.

          Show
          jonathak Jonathan Kelly added a comment - BTW, I was just trying out the Spark build on Ubuntu and got the following: dpkg-gencontrol: error: illegal package name 'spark-R': character 'R' not allowed Does this mean capital letters are not allowed in DEB package names? If so, I suppose spark-r is OK, but I should probably rename it for both RPM and DEB so that I don't need special cases in the Puppet module.
          Hide
          cos Konstantin Boudnik added a comment -

          Is Hive server really is a requirement for Spark 1.5.1? It doesn't look right, really. Can anyone please chime in on this?

          Show
          cos Konstantin Boudnik added a comment - Is Hive server really is a requirement for Spark 1.5.1? It doesn't look right, really. Can anyone please chime in on this?
          Hide
          rnowling RJ Nowling added a comment -

          Spark has the option to run against a Hive server but I don't think it's required. (Or at least, I don't do anything with Hive in Spark RPMs I build for my team inside Red Hat.)

          I don't think Hive should be blocking Spark 1.5.

          Show
          rnowling RJ Nowling added a comment - Spark has the option to run against a Hive server but I don't think it's required. (Or at least, I don't do anything with Hive in Spark RPMs I build for my team inside Red Hat.) I don't think Hive should be blocking Spark 1.5.
          Hide
          cos Konstantin Boudnik added a comment -

          Yup, that's what I am saying. This is a deployment property and upgrade shouldn't be hanging on it. Thanks for the confirmation.

          Show
          cos Konstantin Boudnik added a comment - Yup, that's what I am saying. This is a deployment property and upgrade shouldn't be hanging on it. Thanks for the confirmation.
          Hide
          rnowling RJ Nowling added a comment -

          There may be a compiler flag but we could add that after the upgrade as another issue.

          Small, atomic changes really help with review and getting unstuck from huge dependency graphs...

          Show
          rnowling RJ Nowling added a comment - There may be a compiler flag but we could add that after the upgrade as another issue. Small, atomic changes really help with review and getting unstuck from huge dependency graphs...
          Hide
          cos Konstantin Boudnik added a comment -

          Hi Jonathan Kelly. Any updates on this? Please let us know if you're still working on this. Looks like this is becoming more and more critical as we can not move forward with Zeppelin. Would be good to have at least basic 1.5.1 integration, so we can move forward with the rest of it.

          Show
          cos Konstantin Boudnik added a comment - Hi Jonathan Kelly . Any updates on this? Please let us know if you're still working on this. Looks like this is becoming more and more critical as we can not move forward with Zeppelin. Would be good to have at least basic 1.5.1 integration, so we can move forward with the rest of it.
          Hide
          cos Konstantin Boudnik added a comment -

          Cancelling the PA status: there's no patch file available

          Show
          cos Konstantin Boudnik added a comment - Cancelling the PA status: there's no patch file available
          Hide
          cos Konstantin Boudnik added a comment -

          My bad, there's git hub PR...

          Show
          cos Konstantin Boudnik added a comment - My bad, there's git hub PR...
          Hide
          cos Konstantin Boudnik added a comment -

          I started looking into this PR and it doesn't rebase cleanly on the current master. The spark host name issue has been fixed in BIGTOP-2074, so the PR needs to be updated.

          <<<<<<< HEAD
          export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST
          =======
          export SPARK_MASTER_PORT=<%= @master_port %>
          >>>>>>> BIGTOP-1944. Spark Puppet module changes
          
          Show
          cos Konstantin Boudnik added a comment - I started looking into this PR and it doesn't rebase cleanly on the current master. The spark host name issue has been fixed in BIGTOP-2074 , so the PR needs to be updated. <<<<<<< HEAD export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST ======= export SPARK_MASTER_PORT=<%= @master_port %> >>>>>>> BIGTOP-1944. Spark Puppet module changes
          Hide
          cos Konstantin Boudnik added a comment -

          I have split this ticket into three (feel free to add the forth if you want the cosmetic changes to go in). Doing the testing of the packages and puppet changes now. If everything is all right - will commit it later today.

          Show
          cos Konstantin Boudnik added a comment - I have split this ticket into three (feel free to add the forth if you want the cosmetic changes to go in). Doing the testing of the packages and puppet changes now. If everything is all right - will commit it later today.
          Hide
          cos Konstantin Boudnik added a comment -

          Patches are moved to smaller subtasks

          Show
          cos Konstantin Boudnik added a comment - Patches are moved to smaller subtasks
          Hide
          rnowling RJ Nowling added a comment -

          Thanks, Konstantin Boudnik! I'm happy to see this move forward.

          Show
          rnowling RJ Nowling added a comment - Thanks, Konstantin Boudnik ! I'm happy to see this move forward.
          Hide
          cos Konstantin Boudnik added a comment -

          All subtasks are addressed. Closing as fixed for the release.

          Show
          cos Konstantin Boudnik added a comment - All subtasks are addressed. Closing as fixed for the release.

            People

            • Assignee:
              jonathak Jonathan Kelly
              Reporter:
              warwithin YoungWoo Kim
            • Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development