Details

    • Type: Task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 1.0.0, 0.13.0, 0.14.0
    • Component/s: spark
    • Labels:
      None

      Description

      Add support for Spark 2.x as a backend execution engine.

        Issue Links

          Activity

          Trevor Grant (rawkintrevo) added a comment (edited):

          When building Mahout against Spark 2.0.2 I get one warning:
          ```
          [WARNING] /home/rawkintrevo/gits/mahout-local/spark/src/test/scala/org/apache/mahout/sparkbindings/drm/DrmLikeSuite.scala:127: warning: constructor SQLContext in class SQLContext is deprecated: Use SparkSession.builder instead
          [WARNING] val sqlContext= new org.apache.spark.sql.SQLContext(sc)
          ```
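          For reference, the Spark 2.x replacement for the deprecated constructor looks like this (a sketch using the public SparkSession API; not code from this patch):

          ```scala
          // Sketch: replacing the deprecated `new SQLContext(sc)` with
          // SparkSession.builder (Spark 2.x API). Assumes an existing
          // SparkContext `sc`, as in the test above.
          import org.apache.spark.sql.SparkSession

          val spark = SparkSession.builder
            .config(sc.getConf)
            .getOrCreate()
          val sqlContext = spark.sqlContext  // same type the old constructor produced
          ```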

          And then in the tests, a lot of whining about:
          ```
          Exception encountered when invoking run on a nested suite - requirement failed: MAHOUT_HOME is required to spawn mahout-based spark jobs *** ABORTED ***
          java.lang.IllegalArgumentException: requirement failed: MAHOUT_HOME is required to spawn mahout-based spark jobs
          ```

          Hard to say for sure because I had to skip the tests re: the above issue, but if you build with `-DskipTests`, Mahout compiles up until the shell. Taking the resulting jars to Zeppelin and running some basic function checks like linear regression, it seems to work.

          Trevor Grant (rawkintrevo) added a comment:

          Spark 2.1.0 works the same.

          Same SQLContext deprecation warning, errors in the shell, but the compiled jars pass basic function tests (they fail on the missing MAHOUT_HOME otherwise).

          ASF GitHub Bot added a comment:

          GitHub user rawkintrevo opened a pull request:

          https://github.com/apache/mahout/pull/271

          MAHOUT-1894 Add Support for Spark 2.x

          As long as we're sticking to Scala 2.10, running Mahout on Spark 2.x is simply a matter of

          `mvn clean package -Dspark.version=2.0.2`
          or
          `mvn clean package -Dspark.version=2.1.0`

          The trouble comes with the shell...

          I checked Apache Zeppelin to see how they handle multiple spark/scala versions...
          [a brief preview of the descent into hell that is having a shell that handles multiple spark/scala versions](https://github.com/apache/zeppelin/blob/master/spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java)

          So I took an alternate route. I dropped the Mahout shell altogether, changed the mahout bin file to load the Spark shell directly, and pass in a Scala script that takes care of our imports.
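          Such an init script amounts to the imports the shell prints at startup plus a distributed-context wrapper; a sketch of what a bin/load-shell.scala could contain (the imports match the shell log later in this thread; the SparkDistributedContext constructor call is an assumption, not verified against the patch):

          ```scala
          // Sketch of a load-shell.scala init script passed to spark-shell via -i.
          // The imports are taken from the shell startup log; wrapping `sc` in a
          // SparkDistributedContext this way is an assumption about the patch.
          import org.apache.mahout.math._
          import org.apache.mahout.math.scalabindings._
          import org.apache.mahout.math.drm._
          import org.apache.mahout.math.scalabindings.RLikeOps._
          import org.apache.mahout.math.drm.RLikeDrmOps._
          import org.apache.mahout.sparkbindings._

          implicit val sdc: SparkDistributedContext = new SparkDistributedContext(sc)
          ```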

          When building, there is a single deprecation warning regarding the sqlContext and how it is created in the spark-bindings.

          I think we should add binaries for Spark 2.0 and Spark 2.1, as a matter of convenience and for the Zeppelin integration.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/rawkintrevo/mahout mahout-1894

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/mahout/pull/271.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #271


          commit 867cdd0c04d629eaf44a0e2031f447d03bf67bcc
          Author: rawkintrevo <trevor.d.grant@gmail.com>
          Date: 2017-02-02T06:18:21Z

          MAHOUT-1894 Add support for spark 2.x

          MAHOUT-1894 Add support for spark 2.x


          ASF GitHub Bot added a comment:

          Github user pferrel commented on the issue:

          https://github.com/apache/mahout/pull/271

          I'm soooo into dropping a special Mahout shell. Do your comments mean we just run Mahout classes in the Spark shell for Spark 2.x? Does this work both with and without Zeppelin (@andrewpalumbo's case)?

          IF we can compile Mahout with Scala 2.11 fairly easily (excluding the shell), and IF we can run Mahout with some helper scripts in the Spark shell, we can drop the Mahout shell code and get all the advantages of using the plain Spark shell with our extensions. Can/should this be done?

          I realize I've asked these before but this seems the best forum.

          ASF GitHub Bot added a comment:

          Github user rawkintrevo commented on the issue:

          https://github.com/apache/mahout/pull/271

          @pferrel In short, yes. The idea here is that we entirely drop the Mahout shell. It was also the blocker for upgrading to Spark 2.x.

          The Zeppelin integration is, for all intents and purposes, a Spark shell plus some imports and setup of the distributed context.

          So that is what we're doing here.

          Hopefully removing the shell will also clear the way for the Scala 2.11 upgrade / profile.

          ASF GitHub Bot added a comment:

          Github user andrewpalumbo commented on the issue:

          https://github.com/apache/mahout/pull/271

          hmm.. just tried to launch into `local[4]` and blew it up:

          ```
          AP-RE-X16743C45L:mahout apalumbo$ MASTER=local[4] mahout spark-shell
          log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
          log4j:WARN Please initialize the log4j system properly.
          log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
          Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
          To adjust logging level use sc.setLogLevel("INFO")
          Welcome to Spark version 1.6.2 (ASCII art banner)

          Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_102)
          Type in expressions to have them evaluated.
          Type :help for more information.
          Spark context available as sc.
          SQL context available as sqlContext.
          Loading /Users/apalumbo/sandbox/mahout/bin/load-shell.scala...
          import org.apache.mahout.math._
          import org.apache.mahout.math.scalabindings._
          import org.apache.mahout.math.drm._
          import org.apache.mahout.math.scalabindings.RLikeOps._
          import org.apache.mahout.math.drm.RLikeDrmOps._
          import org.apache.mahout.sparkbindings._
          sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = org.apache.mahout.sparkbindings.SparkDistributedContext@73e0c775

          (Mahout ASCII art banner) version 0.13.0
          Exception in thread "main" java.io.FileNotFoundException: spark-shell (Is a directory)
          at java.io.FileInputStream.open0(Native Method)
          at java.io.FileInputStream.open(FileInputStream.java:195)
          at java.io.FileInputStream.<init>(FileInputStream.java:138)
          at scala.reflect.io.File.inputStream(File.scala:97)
          at scala.reflect.io.File.inputStream(File.scala:82)
          at scala.reflect.io.Streamable$Chars$class.reader(Streamable.scala:93)
          at scala.reflect.io.File.reader(File.scala:82)
          at scala.reflect.io.Streamable$Chars$class.bufferedReader(Streamable.scala:98)
          at scala.reflect.io.File.bufferedReader(File.scala:82)
          at scala.reflect.io.Streamable$Chars$class.bufferedReader(Streamable.scala:97)
          at scala.reflect.io.File.bufferedReader(File.scala:82)
          at scala.reflect.io.Streamable$Chars$class.applyReader(Streamable.scala:103)
          at scala.reflect.io.File.applyReader(File.scala:82)
          at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SparkILoop.scala:677)
          at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply(SparkILoop.scala:677)
          at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply(SparkILoop.scala:677)
          at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$savingReplayStack(SparkILoop.scala:162)
          at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1.apply$mcV$sp(SparkILoop.scala:676)
          at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1.apply(SparkILoop.scala:676)
          at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1.apply(SparkILoop.scala:676)
          at org.apache.spark.repl.SparkILoop.savingReader(SparkILoop.scala:167)
          at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$interpretAllFrom(SparkILoop.scala:675)
          at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$loadCommand$1.apply(SparkILoop.scala:740)
          at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$loadCommand$1.apply(SparkILoop.scala:739)
          at org.apache.spark.repl.SparkILoop.withFile(SparkILoop.scala:733)
          at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loadCommand(SparkILoop.scala:739)

          {...}

          ```

          ASF GitHub Bot added a comment:

          Github user rawkintrevo commented on the issue:

          https://github.com/apache/mahout/pull/271

          Possibly a regression from last night, when I moved and renamed load.scala -> bin/load-shell.scala.

          ASF GitHub Bot added a comment:

          Github user rawkintrevo commented on the issue:

          https://github.com/apache/mahout/pull/271

          Confirmed the shell explosion; fixed by deleting `$MAHOUT_HOME/bin/metastore_db`.

          My shell explosion was a slightly different flavor, though. Can you try the above?

          ASF GitHub Bot added a comment:

          Github user andrewpalumbo commented on the issue:

          https://github.com/apache/mahout/pull/271

          On a CentOS-based EC2 Spark standalone instance with 3 workers, I'm not even getting a launch of `$SPARK_HOME/bin/spark-shell`:
          ```
          root@ip-47-108-23-12 spark]$ mahout spark-shell
          /vol0/mahout/bin/mahout: line 294: /bin/spark-shell: No such file or directory
          ```
          Not sure why that would be. It's possible that i
          ```
          294 $SPARK_HOME/bin/spark-shell -classpath "$CLASSPATH" -i $MAHOUT_HOME/bin/load-shell.scala --conf spark.kryo.referenceTracking=false --conf spark.kryo
          ```
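          The `line 294: /bin/spark-shell: No such file or directory` message is what bash prints when `$SPARK_HOME` expands to an empty string; a hypothetical guard for the bin script (function names are illustrative, not the actual bin/mahout code):

          ```shell
          #!/usr/bin/env bash
          # Hypothetical guard: fail fast with a clear message when SPARK_HOME is
          # unset or wrong, instead of expanding "$SPARK_HOME/bin/spark-shell" to
          # "/bin/spark-shell" and dying confusingly.
          require_spark_home() {
            if [ -z "${SPARK_HOME:-}" ] || [ ! -x "${SPARK_HOME}/bin/spark-shell" ]; then
              echo "error: SPARK_HOME is unset or has no bin/spark-shell" >&2
              return 1
            fi
          }

          launch_mahout_shell() {
            require_spark_home || return 1
            "${SPARK_HOME}/bin/spark-shell" -i "${MAHOUT_HOME}/bin/load-shell.scala" \
              --conf spark.kryo.referenceTracking=false
          }
          ```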

          ASF GitHub Bot added a comment:

          Github user andrewpalumbo commented on a diff in the pull request:

          https://github.com/apache/mahout/pull/271#discussion_r99481926

          — Diff: distribution/pom.xml —
          @@ -211,10 +207,6 @@
          </dependency>
          — End diff –

          There is another shell dependency at line 158 that needs to come out.

          ASF GitHub Bot added a comment:

          Github user andrewpalumbo commented on the issue:

          https://github.com/apache/mahout/pull/271

          Rebuilt everything from scratch on Linux:
          ```
          mahout spark-shell
          ```
          fails with:

          ```
          root mahout]$ mahout spark-shell
          /vol0/mahout/bin/mahout: line 294: /bin/spark-shell: No such file or directory
          root mahout]$
          ```

          Spark 1.6.1.. maybe `-i` is buggy in that version?

          ASF GitHub Bot added a comment:

          Github user andrewpalumbo commented on the issue:

          https://github.com/apache/mahout/pull/271

          Wait - I'm getting this from the shell on the current master, so the errors may be in `findMahoutJars()`.

          @rawkintrevo have you been working off of the current master? Was the shell working for you?

          ```
          Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/repl/SparkILoop
          at org.apache.mahout.sparkbindings.shell.Main.main(Main.scala)
          Caused by: java.lang.ClassNotFoundException: org.apache.spark.repl.SparkILoop
          at java.net.URLClassLoader$1.run(URLClassLoader.java:359)
          at java.net.URLClassLoader$1.run(URLClassLoader.java:348)
          at java.security.AccessController.doPrivileged(Native Method)
          at java.net.URLClassLoader.findClass(URLClassLoader.java:347)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
          at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
          ```

          ASF GitHub Bot added a comment:

          Github user andrewpalumbo commented on the issue:

          https://github.com/apache/mahout/pull/271

          Rebuilding this on a remote with:

          ```
          mvn clean package install -Pviennacl-omp -Phadoop2 -Dspark.version=2.1.0 -DskipTests
          ```
          getting:
          ```
          [ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.4.1:single (job) on project mahout-mr: Failed to create assembly: Error creating assembly archive job: Problem creating jar: jar:file:/vol0/mahout/mr/target/mahout-mr-0.13.0-SNAPSHOT.jar!/org/apache/mahout/cf/taste/impl/similarity/precompute/MultithreadedBatchItemSimilarities$Output.class: JAR entry org/apache/mahout/cf/taste/impl/similarity/precompute/MultithreadedBatchItemSimilarities$Output.class not found in /vol0/mahout/mr/target/mahout-mr-0.13.0-SNAPSHOT.jar -> [Help 1]

          ASF GitHub Bot added a comment:

          Github user andrewpalumbo commented on the issue:

          https://github.com/apache/mahout/pull/271

          Can verify that on a clean build with ASF-mirrored Spark and Maven, the shell is not working:

          ```
          $ echo $SPARK_HOME
          /root/spark-2.1.0-bin-hadoop2.6
          $ mvn clean install -Pviennacl-omp -Phadoop2 -Dspark.version=2.1.0 -DskipTests

          {...}

          [INFO] Mahout Build Tools ................................. SUCCESS [ 2.228 s]
          [INFO] Apache Mahout ...................................... SUCCESS [ 0.043 s]
          [INFO] Mahout Math ........................................ SUCCESS [ 9.467 s]
          [INFO] Mahout HDFS ........................................ SUCCESS [ 1.941 s]
          [INFO] Mahout Map-Reduce .................................. SUCCESS [ 18.526 s]
          [INFO] Mahout Integration ................................. SUCCESS [ 2.992 s]
          [INFO] Mahout Examples .................................... SUCCESS [ 18.727 s]
          [INFO] Mahout Math Scala bindings ......................... SUCCESS [ 46.060 s]
          [INFO] Mahout Spark bindings .............................. SUCCESS [ 52.530 s]
          [INFO] Mahout Flink bindings .............................. SUCCESS [ 36.695 s]
          [INFO] Mahout Native VienniaCL OpenMP Bindings ............ SUCCESS [ 23.616 s]
          [INFO] Mahout Release Package ............................. SUCCESS [ 1.706 s]
          [INFO] Mahout H2O backend ................................. SUCCESS [ 20.008 s]
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD SUCCESS
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 03:55 min
          [INFO] Finished at: 2017-02-05T22:58:40+00:00
          [INFO] Final Memory: 145M/2588M
          [INFO] ------------------------------------------------------------------------
          root@ip-123-32-17-97 mahout]$ mahout spark-shell
          /vol0/mahout/bin/mahout: line 294: /bin/spark-shell: No such file or directory
          ```

          ASF GitHub Bot added a comment:

          Github user andrewpalumbo commented on the issue:

          https://github.com/apache/mahout/pull/271

          @rawkintrevo please disregard the above comments (aside from the line note, which breaks the build). I am attributing it to an old AMI that I had tried to test OpenMP and Spark 2.0 with the new shell on.

          ASF GitHub Bot added a comment:

          Github user skanjila commented on the issue:

          https://github.com/apache/mahout/pull/271

          @rawkintrevo I was wondering how we will evolve the shell when a new Spark version comes out. I am also wondering what the use cases are for mahout-shell; it seems like most people use Mahout as an embedded application or a library. Is the shell just to test out a few things? I would be all for removing the shell altogether, actually - less code to maintain in the long run. Let me know if I am missing something here.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user rawkintrevo commented on the issue:

          https://github.com/apache/mahout/pull/271

          @skanjila the shell is useful enough that I'd like to keep it around if possible. Some reasons off the cuff:

          • Good for demoing Mahout: fire up the shell, do some simple stuff.
          • Good for sanity-checking bugs in Zeppelin (something very close to my heart).
          • As we move to add algorithms, I envision Mahout being used for more interactive data science. In that world there is a lot of iterative "try this, see what happens, try that" work. I do most of that in Zeppelin, but some people may use JetBrains or other IDEs, and the shell is useful in those cases.

          To your point: yes, 86ing the entire shell module certainly poses some very attractive advantages. What we're seeing in this PR is an opportunity to get the best of both worlds (no code to maintain, but still have a shell). We just need to work out some kinks in getting it to play with spark-shell correctly.

          githubbot ASF GitHub Bot added a comment -

          Github user skanjila commented on the issue:

          https://github.com/apache/mahout/pull/271

          @rawkintrevo the notion of interactive data science is very interesting to me, as that's what I do at work. However, what is the advantage of using Mahout for that versus doing it directly in spark-shell with Spark SQL or the ML algorithms in Spark? Is that where Samsara comes in? I'm just trying to understand the tradeoffs between the Spark and Mahout worlds.

          githubbot ASF GitHub Bot added a comment -

          Github user rawkintrevo commented on the issue:

          https://github.com/apache/mahout/pull/271

          @skanjila yes. In short: SparkML has a few non-extensible algorithms with limited functionality. Mahout lets you write your own algorithms; at the moment there are some amazing tools to help you do that (e.g. distributed SVD) but not many pre-baked algorithms (you still need to go back to SparkML for Random Forests, etc.).

          With the new algorithms framework, I hope to see Mahout catch up and exceed SparkML's pre-canned algorithm collection in short order, driven primarily by community involvement.
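          Since the thread keeps coming back to what Samsara actually buys you, here is a hedged sketch of the kind of "write your own algorithm" code meant above: ordinary least squares via the normal equations in the R-like DSL. It assumes a Mahout-enabled spark-shell session (so the imports resolve and a distributed context exists); `drmX` and `drmY` are hypothetical, pre-loaded distributed row matrices, and the snippet is not standalone-runnable outside that environment.

          ```scala
          // Sketch only: hand-rolled distributed OLS in the Samsara R-like DSL.
          // Assumes a Mahout spark-shell; drmX (n x k) and drmY (n x 1) are
          // hypothetical DRMs loaded elsewhere.
          import org.apache.mahout.math._
          import org.apache.mahout.math.scalabindings._
          import org.apache.mahout.math.scalabindings.RLikeOps._
          import org.apache.mahout.math.drm._
          import org.apache.mahout.math.drm.RLikeDrmOps._

          def ols(drmX: DrmLike[Int], drmY: DrmLike[Int]): Vector = {
            // Normal equations: beta = (X'X)^-1 (X'y). The two products run
            // distributed; the small k x k result is collected and solved in-core.
            val XtX = (drmX.t %*% drmX).collect
            val Xty = (drmX.t %*% drmY).collect(::, 0)
            solve(XtX, Xty)
          }
          ```

          The point is less the specific solver than the shape: a few lines of R-like operators give you a new distributed algorithm, which is exactly what SparkML's closed implementations don't allow.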

          githubbot ASF GitHub Bot added a comment -

          Github user rawkintrevo commented on the issue:

          https://github.com/apache/mahout/pull/271

          @andrewpalumbo checked against spark 1.6.1/2.0.2/2.1.0 on another box- no issues.

          Can someone else help test this?

          githubbot ASF GitHub Bot added a comment -

          Github user andrewpalumbo commented on the issue:

          https://github.com/apache/mahout/pull/271

          Great! Is there anything left to do here?


          githubbot ASF GitHub Bot added a comment -

          Github user andrewpalumbo commented on the issue:

          https://github.com/apache/mahout/pull/271

          +1


          githubbot ASF GitHub Bot added a comment -

          Github user rawkintrevo commented on the issue:

          https://github.com/apache/mahout/pull/271

          I'd like someone else to test this.

          Also curious whether this solves MAHOUT-1897 (https://issues.apache.org/jira/browse/MAHOUT-1897).

          This DOES NOT solve MAHOUT-1892 (https://issues.apache.org/jira/browse/MAHOUT-1892): serialization when doing a map block in the shell.

          githubbot ASF GitHub Bot added a comment -

          Github user skanjila commented on the issue:

          https://github.com/apache/mahout/pull/271

          @rawkintrevo I was going to test this on an azure vm, do you guys still need help testing?

          githubbot ASF GitHub Bot added a comment -

          Github user skanjila commented on the issue:

          https://github.com/apache/mahout/pull/271

          @rawkintrevo here's what I see when testing on an Azure VM:
          ```
          saikan@dsexperiments:~/code/mahout$ MASTER=local[4] ./bin/mahout spark-shell
          log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
          log4j:WARN Please initialize the log4j system properly.
          log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
          Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
          To adjust logging level use sc.setLogLevel("INFO")
          Welcome to
                [Spark ASCII logo]  version 1.6.2

          Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
          Type in expressions to have them evaluated.
          Type :help for more information.
          Spark context available as sc.
          17/02/07 18:24:40 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
          17/02/07 18:24:40 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
          17/02/07 18:24:46 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
          17/02/07 18:24:46 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
          SQL context available as sqlContext.
          Loading /home/saikan/code/mahout/bin/load-shell.scala...
          import org.apache.mahout.math._
          import org.apache.mahout.math.scalabindings._
          import org.apache.mahout.math.drm._
          import org.apache.mahout.math.scalabindings.RLikeOps._
          import org.apache.mahout.math.drm.RLikeDrmOps._
          import org.apache.mahout.sparkbindings._
          sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = org.apache.mahout.sparkbindings.SparkDistributedContext@7804474a

                [Mahout ASCII logo]  version 0.13.0

          That file does not exist

          scala>
          ```

          Looks good to me, perhaps we should try some heavy matrix ops on it for further testing

          githubbot ASF GitHub Bot added a comment -

          Github user rawkintrevo commented on the issue:

          https://github.com/apache/mahout/pull/271

          @skanjila Thanks for testing! I usually run the OLS example, though another type of test is probably advisable to truly detect bugs. Could you also confirm that it works in the following ways:
          Build Mahout with `mvn clean package -Dspark.version=2.0.2`, set `export SPARK_HOME=/path/to/spark-2.0.2-bin`, and then repeat for Spark 2.1.0?

          Thanks again!

          githubbot ASF GitHub Bot added a comment -

          Github user skanjila commented on the issue:

          https://github.com/apache/mahout/pull/271

          Here's the results with Spark 2.0.2:
          ```
          saikan@dsexperiments:~/code/mahout$ MASTER=local[4] ./bin/mahout spark-shell
          Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
          Setting default log level to "WARN".
          To adjust logging level use sc.setLogLevel(newLevel).
          17/02/07 19:22:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
          17/02/07 19:22:21 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
          Spark context Web UI available at http://10.4.9.4:4040
          Spark context available as 'sc' (master = local[4], app id = local-1486495341246).
          Spark session available as 'spark'.
          Loading /home/saikan/code/mahout/bin/load-shell.scala...
          import org.apache.mahout.math._
          import org.apache.mahout.math.scalabindings._
          import org.apache.mahout.math.drm._
          import org.apache.mahout.math.scalabindings.RLikeOps._
          import org.apache.mahout.math.drm.RLikeDrmOps._
          import org.apache.mahout.sparkbindings._
          sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = org.apache.mahout.sparkbindings.SparkDistributedContext@43f44d37

                [Mahout ASCII logo]  version 0.13.0

          That file does not exist

          Welcome to
                [Spark ASCII logo]  version 2.0.2

          Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
          Type in expressions to have them evaluated.
          Type :help for more information.

          scala>
          ```

          githubbot ASF GitHub Bot added a comment -

          Github user skanjila commented on the issue:

          https://github.com/apache/mahout/pull/271

          And here's the results for Spark 2.1.0:
          ```
          saikan@dsexperiments:~/code/mahout$ MASTER=local[4] ./bin/mahout spark-shell
          Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
          Setting default log level to "WARN".
          To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
          17/02/07 19:27:59 WARN SparkContext: Support for Java 7 is deprecated as of Spark 2.0.0
          17/02/07 19:27:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
          17/02/07 19:28:05 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
          Spark context Web UI available at http://10.4.9.4:4040
          Spark context available as 'sc' (master = local[4], app id = local-1486495680739).
          Spark session available as 'spark'.
          Loading /home/saikan/code/mahout/bin/load-shell.scala...
          import org.apache.mahout.math._
          import org.apache.mahout.math.scalabindings._
          import org.apache.mahout.math.drm._
          import org.apache.mahout.math.scalabindings.RLikeOps._
          import org.apache.mahout.math.drm.RLikeDrmOps._
          import org.apache.mahout.sparkbindings._
          sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = org.apache.mahout.sparkbindings.SparkDistributedContext@3ea6753e

                [Mahout ASCII logo]  version 0.13.0

          That file does not exist

          Welcome to
                [Spark ASCII logo]  version 2.1.0

          Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
          Type in expressions to have them evaluated.
          Type :help for more information.

          scala>
          ```
          I would highly recommend we come up with a beefy set of tests to validate the shell further. Thoughts on which set of operations to consider other than OLS?

          githubbot ASF GitHub Bot added a comment -

          Github user skanjila commented on the issue:

          https://github.com/apache/mahout/pull/271

          @rawkintrevo is there any other help I can provide on this? Maybe run through some example .mscala scripts; let me know.

          rawkintrevo Trevor Grant added a comment - - edited

          @apalumbo is still reporting issues wherever he tries this.

          I want to make a general call for testers to see where the 'gotcha' is.

          Here are the instructions for testing; please help.

          Step 1. Clone Mahout-1894

          ```sh
          $ git clone https://github.com/rawkintrevo/mahout
          $ cd mahout
          $ git checkout mahout-1894
          ```

          Step 2. Download various Sparks
          ```sh
          $ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.3-bin-hadoop2.6.tgz
          $ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.2-bin-hadoop2.7.tgz
          $ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
          $ for f in *.tgz; do tar -xzf "$f"; done
          ```
          (the loop extracts each archive in turn; a bare `tar -xzf *tgz` would only extract the first match)
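          A note on extracting multiple archives, since this trips people up: `tar -xzf` takes exactly one archive, so when a glob expands to several files the extras are interpreted as member names to extract, not as archives, and only the first tarball is actually unpacked. A small self-contained demonstration of the per-file loop, using dummy stand-in archives rather than the real Spark tarballs:

          ```shell
          # Demonstrate robust multi-archive extraction: one `tar -xzf` per file.
          # (Passing several archives at once would treat the 2nd+ as member names.)
          set -e
          workdir=$(mktemp -d)
          cd "$workdir"

          # Create two dummy archives standing in for the Spark tarballs.
          mkdir spark-a spark-b
          touch spark-a/README spark-b/README
          tar -czf spark-a.tgz spark-a
          tar -czf spark-b.tgz spark-b
          rm -r spark-a spark-b

          # Extract every archive in the directory, one tar invocation each.
          for f in *.tgz; do
            tar -xzf "$f"
          done

          ls -d spark-a spark-b
          ```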

          Step 3. Iteratively Build Mahout and Test Shell

          A) Spark 1.6.3
          ```sh
          $ mvn clean package -DskipTests -Dspark.version=1.6.3
          $ export SPARK_HOME=/path/to/spark/1.6.3
          $ bin/mahout spark-shell
          ```
          In the shell...
          ```scala
          scala> :load examples/bin/SparseSparseDrmTimer.mscala
          scala> timeSparseDRMMMul(200, 200, 200, 2, .2, 1234L)
          ```
          ^^ Should run without error...
          Ctrl+C to close.

          B) Spark 2.0.2

          ```sh
          $ mvn clean package -DskipTests -Dspark.version=2.0.2
          $ export SPARK_HOME=/path/to/spark/2.0.2
          $ bin/mahout spark-shell
          ```
          In the shell...
          ```scala
          scala> :load examples/bin/SparseSparseDrmTimer.mscala
          scala> timeSparseDRMMMul(200, 200, 200, 2, .2, 1234L)
          ```
          ^^ Should run without error...
          Ctrl+C to close.

          C) Spark 2.1.0
          ```sh
          $ mvn clean package -DskipTests -Dspark.version=2.1.0
          $ export SPARK_HOME=/path/to/spark/2.1.0
          $ bin/mahout spark-shell
          ```
          In the shell...

          ```scala
          scala> :load examples/bin/SparseSparseDrmTimer.mscala
          scala> timeSparseDRMMMul(200, 200, 200, 2, .2, 1234L)
          ```
          ^^ Should run without error...
          Ctrl+C to close.

          kanjilal Saikat Kanjilal added a comment -

          Trevor Grant I've already done all this without any errors, is there other testing I can help with on this?

          weienran Andrew Weienr added a comment - - edited

          I followed the instructions from Trevor Grant here: https://issues.apache.org/jira/browse/MAHOUT-1894?focusedCommentId=15871928&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15871928

          Was able to run the scala shell against all 3 versions of Spark without errors.
          My system:
          OS X El Capitan 10.11.6 (15G1217)
          Maven 3.3.9
          Java 1.8.0_101

          One other note. Where the instructions say "$ bin mahout spark-shell" they should actually say
          "$ bin/mahout spark-shell" (just in case there are any newbies helping test)

          kanjilal Saikat Kanjilal added a comment - - edited

          Trevor Grant I think we should author a set of mscala scripts that will: 1) certify each build against various Spark backends, 2) produce a report outlining typical SLAs for common operations before and after upgrading to a new version of the tech stack, and 3) add any Zeppelin visualizations to the report as necessary. I was also wondering whether we should come up with a candidate list of expensive operations to test. Any ideas what these would be in the Samsara world? Maybe we could have a perf test suite that iterates over a set of Samsara algorithms; SparseSparseDrmTimer could be the starting point.

          Thoughts?
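As a rough illustration of what one of those timing scripts might look like, here is a minimal sketch. Everything in it (`timeOp`, the op names, the stand-in workloads) is an assumption for illustration, not existing Mahout code; a real suite would time Samsara ops such as the sparse %*% exercised by SparseSparseDrmTimer.mscala.

```scala
// Hypothetical perf-suite harness sketch. `timeOp` runs a by-name block,
// prints its wall-clock time, and returns the elapsed milliseconds so a
// report can be assembled across operations.
def timeOp(name: String)(op: => Unit): Long = {
  val start = System.nanoTime()
  op
  val elapsedMs = (System.nanoTime() - start) / 1000000L
  println(s"$name: $elapsedMs ms")
  elapsedMs
}

// Stand-in workloads keep this sketch self-contained; in a real suite
// each entry would be a Samsara operation (e.g. a sparse DRM multiply).
val noopMs = timeOp("noop") { () }
val spinMs = timeOp("spin") {
  var s = 0L; var i = 0
  while (i < 1000000) { s += i; i += 1 }
}

// The per-op timings, ready to be formatted into an SLA report.
val report: Map[String, Long] = Map("noop" -> noopMs, "spin" -> spinMs)
```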

          andrew.musselman Andrew Musselman added a comment -

          I'm getting "That file does not exist" after the shell opens with spark 1.6.3.

          githubbot ASF GitHub Bot added a comment -

          Github user andrewmusselman commented on the issue:

          https://github.com/apache/mahout/pull/271

          Yeah, I'm getting a result for all three versions of Spark, but the welcome banner situation could use some work. I'd like to remove the "This file does not exist" message, and with 1.6.3 the Spark banner shows up before the Mahout banner, while with 2.x the Mahout banner shows up first. Perhaps suppressing the Mahout banner makes sense.

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/mahout/pull/271

          hudson Hudson added a comment -

          FAILURE: Integrated in Jenkins build Mahout-Quality #3419 (See https://builds.apache.org/job/Mahout-Quality/3419/)
          MAHOUT-1894 Add Support for Spark 2.x closes apache/mahout#271 (rawkintrevo: rev 5afdc68e0a25e9f66a0d707a7f76d46d9603b614)

          • (delete) spark-shell/src/main/scala/org/apache/mahout/sparkbindings/shell/Main.scala
          • (add) bin/load-shell.scala
          • (edit) pom.xml
          • (delete) spark-shell/src/main/scala/org/apache/mahout/sparkbindings/shell/MahoutSparkILoop.scala
          • (edit) bin/mahout
          • (edit) distribution/pom.xml
          • (delete) spark-shell/pom.xml
          • (delete) spark-shell/src/test/mahout/simple.mscala
          • (edit) distribution/src/main/assembly/bin.xml

            People

            • Assignee:
              rawkintrevo Trevor Grant
              Reporter:
              smarthi Suneel Marthi
            • Votes:
              1 Vote for this issue
               Watchers:
               8
