Details

    • Type: Task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 1.0.0
    • Component/s: general
    • Labels:
    • Flags:
      Important

      Description

      Need to upgrade Mahout to the latest 0.10 release (first Hadoop 2.x compatible release)

      1. BIGTOP-1831.1.patch
        5 kB
        YoungWoo Kim
      2. BIGTOP-1831.2.patch
        4 kB
        YoungWoo Kim

          Activity

          andrew.musselman Andrew Musselman added a comment -

          Thank you too!

          cos Konstantin Boudnik added a comment -

          Makes sense. Thank you!

          andrew.musselman Andrew Musselman added a comment -

          We are fixing this fat build in either a point release or the next major one; leaving it as is after a major refactor was a game-time decision.

          warwithin YoungWoo Kim added a comment -

          Committed.

          cos Konstantin Boudnik added a comment -

          Looks good, +1.
          Please commit at your leisure. I think I will cut off the branch tomorrow.

          warwithin YoungWoo Kim added a comment -

          Updated patch: BIGTOP-1831.2.patch

          Konstantin Boudnik, thanks for your comment. To unblock the 1.0 release, I just removed the bits related to SCALA_VERSION from Mahout. For now, the build works without upgrading Scala. I'll file another JIRA for bumping up the Scala version later.

          smarthi Suneel Marthi added a comment -

          We are aware that the Mahout 0.10.0 package is over 200 MB, and this will definitely be addressed in the next Mahout release to bring it under 200 MB. We'll open a Mahout JIRA to address this as a high priority. Thanks again for the feedback.

          cos Konstantin Boudnik added a comment -

          I haven't tested the component itself, but the build clearly works. With this in mind I am +1 on the patch, so I can cut the 1.0 RC branch and unblock progress on master. If we find any issues with Mahout 0.10, in tests or elsewhere, we can hopefully fix them in subsequent RC stabilization JIRAs. How does that look to everyone?

          cos Konstantin Boudnik added a comment - - edited

          OK, it seems to be the intermediate build artifacts that are getting that big. At any rate, the new Mahout package is about 200 MB. As far as I can see, the main reason is that the package bundles every single dependency declared in the Mahout build: I see protobuf, commons-logging, guava, servlet-api, xpp, xstream, and so on. Most of these dependencies already exist in Hadoop, HBase, or elsewhere. The Spark build used to have the same problem until I fixed its Maven assembly to avoid redistributing everything. My particular grudge is with easymock: it doesn't seem to belong in the product package.
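To check which bundled dependencies account for most of the package size, a listing of the largest jars is usually enough. The helper below is a hypothetical sketch (`largest_jars` is not part of Bigtop or Mahout); it only assumes a POSIX-ish shell with `find`, `du`, `sort`, and `head`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Bigtop): print the N largest .jar files
# under a directory, biggest first, with their disk usage in KiB.
# Usage: largest_jars <dir> [n]
largest_jars() {
  local dir=$1
  local n=${2:-10}
  # du -k prints "<size-in-KiB><TAB><path>"; sort numerically on the size.
  # With "-exec ... +", du is not run at all if no jars are found.
  find "$dir" -type f -name '*.jar' -exec du -k {} + | sort -rn | head -n "$n"
}
```

For example, running `largest_jars build/mahout 20` from `$BIGTOP_HOME` after a package build would show whether third-party jars dominate the size.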

          I would suggest we use the current build as is, just to unblock the Bigtop release, and fix it later, unless someone wants to give it a spin and add logic to the package creation script that removes redundant dependencies and uses (or links to) the ones from other packages.
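One possible shape for that package-creation logic is sketched below. These are hypothetical helpers, not something Bigtop ships, and the directory layout is an assumption: `dedup_jars` replaces each bundled jar with a symlink to an identically named jar in another package's lib directory, and `drop_test_jars` deletes test-only jars such as easymock. Only exact file-name matches are linked, so a dependency that Mahout pins to a different version than Hadoop or HBase stays bundled.

```shell
#!/usr/bin/env bash
# Hypothetical packaging helpers (not part of Bigtop).

# Replace jars in <pkg_lib_dir> with symlinks to identically named jars found
# in any of the <shared_lib_dir> arguments (checked in order).
# Usage: dedup_jars <pkg_lib_dir> <shared_lib_dir>...
dedup_jars() {
  local pkg_lib=$1
  shift
  local jar name shared
  for jar in "$pkg_lib"/*.jar; do
    [ -L "$jar" ] && continue   # already a link, leave it alone
    [ -f "$jar" ] || continue   # the glob matched nothing
    name=$(basename "$jar")
    for shared in "$@"; do
      if [ -f "$shared/$name" ]; then
        # -f removes the bundled copy before creating the symlink
        ln -sf "$shared/$name" "$jar"
        break
      fi
    done
  done
}

# Delete jars that are only needed for tests (easymock by default).
# Usage: drop_test_jars <pkg_lib_dir> [glob...]
drop_test_jars() {
  local pkg_lib=$1
  shift
  [ "$#" -gt 0 ] || set -- 'easymock-*.jar'
  local pattern
  for pattern in "$@"; do
    # $pattern is left unquoted on purpose so the glob expands inside pkg_lib
    rm -f "$pkg_lib"/$pattern
  done
}
```

In a real RPM or DEB build the link targets would have to be the installed paths of the other packages (for instance Hadoop's lib directory), and the Mahout package would then need to declare a dependency on the package that owns those jars.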

          Andrew Musselman, David Standish: is it possible to address this issue in the next Mahout release, so we can trim the package to a reasonable size?

          cos Konstantin Boudnik added a comment -

          Is it correct that Mahout alone is taking up almost 2 GB? That doesn't sound right. I cannot even complete the build because my container is running out of space.

          cos Konstantin Boudnik added a comment -

          Hey YoungWoo Kim, I am looking at the patch and see that you have bumped up Scala to 2.10.4 as a part of it.
          If this is a requirement for Mahout 0.10, let's make that explicit and do the bump as a separate patch, so in the future people won't have to guess the reason for the change.

          Also, knowing how "great" Scala's backward compatibility is, even between minor releases, I am sort of on the "be very careful" side when it comes to such version bumps. Are we certain that Spark won't be affected by the upgrade?

          BTW, I was able to build Mahout 0.10 on Ubuntu 14.04 using your patch but with Scala 2.10.3. So, unless I am missing something, perhaps it'd make sense to hold off on the Scala upgrade. Thoughts?

          warwithin YoungWoo Kim added a comment -

          BIGTOP-1831.1.patch

          • A WIP patch for Mahout 0.10.0

          Sorry for the delay. I attached a WIP patch; please review it.

          cos Konstantin Boudnik added a comment -

          How is it going, guys? It looks like this is one of the blockers for 1.0, as we cannot use the old 0.9 version. Appreciate the help! Thank you!

          warwithin YoungWoo Kim added a comment -

          Andrew Musselman, OK. So far, building packages works fine on my end with Spark 1.1.x. I'll upload a WIP patch for this if the smoke tests pass. Thanks!

          andrew.musselman Andrew Musselman added a comment -

          We tabled Spark 1.3 for the 0.10 release; 1.3 is targeted for our 0.11 or 1.0 release mid-year.

          jay vyas figured we could ship with instructions to build against Spark 1.1 or 1.2 in the meantime; does that work?

          warwithin YoungWoo Kim added a comment -

          David Standish, Andrew Musselman, Thanks for the update.

          While trying to build Mahout with Spark 1.3.0, I ran into this:

          [INFO] ------------------------------------------------------------------------
          [INFO] Building Mahout Spark bindings 0.10.0
          [INFO] ------------------------------------------------------------------------
          [INFO] 
          [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ mahout-spark_2.10 ---
          [INFO] 
          [INFO] --- scala-maven-plugin:3.2.0:add-source (add-scala-sources) @ mahout-spark_2.10 ---
          [INFO] Add Source directory: /home/ywkim/workspace/bigtop/build/mahout/rpm/BUILD/mahout-distribution-0.10.0/spark/src/main/scala
          [INFO] Add Test Source directory: /home/ywkim/workspace/bigtop/build/mahout/rpm/BUILD/mahout-distribution-0.10.0/spark/src/test/scala
          [INFO] 
          [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ mahout-spark_2.10 ---
          [INFO] 
          [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ mahout-spark_2.10 ---
          [INFO] Using 'UTF-8' encoding to copy filtered resources.
          [INFO] skip non existing resourceDirectory /home/ywkim/workspace/bigtop/build/mahout/rpm/BUILD/mahout-distribution-0.10.0/spark/src/main/resources
          [INFO] Copying 3 resources
          [INFO] 
          [INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile) @ mahout-spark_2.10 ---
          [INFO] /home/ywkim/workspace/bigtop/build/mahout/rpm/BUILD/mahout-distribution-0.10.0/spark/src/main/scala:-1: info: compiling
          [INFO] Compiling 37 source files to /home/ywkim/workspace/bigtop/build/mahout/rpm/BUILD/mahout-distribution-0.10.0/spark/target/classes at 1429618848607
          [WARNING] /home/ywkim/workspace/bigtop/build/mahout/rpm/BUILD/mahout-distribution-0.10.0/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala:163: warning: a pure expression does nothing in statement position; you may be omitting necessary parentheses
          [WARNING]       val columnIDs = interactions.flatMap { case (_, columns) => columns
          [WARNING]                                                                   ^
          [ERROR] /home/ywkim/workspace/bigtop/build/mahout/rpm/BUILD/mahout-distribution-0.10.0/spark/src/main/scala/org/apache/mahout/sparkbindings/drm/CheckpointedDrmSpark.scala:168: error: value saveAsSequenceFile is not a member of org.apache.mahout.sparkbindings.DrmRdd[K]
          [ERROR]     rdd.saveAsSequenceFile(path)
          [ERROR]         ^
          [ERROR] /home/ywkim/workspace/bigtop/build/mahout/rpm/BUILD/mahout-distribution-0.10.0/spark/src/main/scala/org/apache/mahout/sparkbindings/drm/package.scala:26: error: object FilteredRDD is not a member of package org.apache.spark.rdd
          [ERROR] import org.apache.spark.rdd.{FilteredRDD, RDD}
          [ERROR]        ^
          [WARNING] one warning found
          [ERROR] two errors found
          
          

          It seems that Mahout needs a small fix to work with Spark 1.3.x. I verified that packaging works with Spark 1.1.x, though. Spark 1.3.0 is already on board for Bigtop:

          $ cd $BIGTOP_HOME
          $ ./gradlew bom-json
          ...
                  {
                      "name": {
                          "project": "spark",
                          "pkg": "spark-core",
                          "relNotes": "Spark"
                      },
                      "tarball": {
                          "destination": "spark-1.3.0.tar.gz",
                          "source": "spark-1.3.0.tgz"
                      },
                      "url": {
                          "site": "http://apache.osuosl.org/spark/spark-1.3.0",
                          "archive": "http://archive.apache.org/dist/spark/spark-1.3.0"
                      },
                      "version": {
                          "base": "1.3.0",
                          "pkg": "1.3.0",
                          "release": "1"
                      },
                      "git": {
                          "repo": null,
                          "ref": null,
                          "dir": null
                      }
                  },
          ...
          
          andrew.musselman Andrew Musselman added a comment -

          Great, let us know how we can help. I've asked for help on a build issue on user@b.a.o.

          evans_ye Evans Ye added a comment -

          Hey David Starina, thanks for the JIRA. Are you planning to submit the patch?
          If so, we can get your account onto the dev list and assign this JIRA to you.

          dstarina David Starina added a comment -

          Hadoop 2 compatible release


            People

            • Assignee:
              warwithin YoungWoo Kim
              Reporter:
              dstarina David Starina
            • Votes:
              0
              Watchers:
              7
