  Mahout / MAHOUT-1489

Interactive Scala & Spark Bindings Shell & Script processor

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels:
      None

      Description

      Build an interactive shell / script processor (just like the Spark shell), something very similar to R's interactive and script-runner modes.

      1. MAHOUT-1489.patch
        17 kB
        Dmitriy Lyubimov
      2. MAHOUT-1489.patch.1
        29 kB
        Dmitriy Lyubimov
      3. mahout-spark-shell-running-standalone.png
        16 kB
        Dmitriy Lyubimov

        Activity

        dlyubimov Dmitriy Lyubimov added a comment -

        I cannot assign to a non-committer, so I will be watching it with the assumption that the patch is coming from Saikat (that was the condition of creating a new JIRA).

        kanjilal Saikat Kanjilal added a comment -

        Yes, correct; however, for my sake please feel free to add as much description as possible.

        kanjilal Saikat Kanjilal added a comment -

        Here is an initial list of functionality that I think can exist in the shell:

        1) Ability to execute against a local or remote spark cluster
        2) Create parallelized collections based on an existing scala collection
        3) Create a distributed dataset from a remote or local hadoop data set
        4) A subset of transformations and actions as listed in the following link (http://spark.incubator.apache.org/docs/0.8.1/scala-programming-guide.html)

        Love to hear your feedback.

        tdunning Ted Dunning added a comment -

        I think that creating a distributed object from an in-memory local matrix would be good as well.

        kanjilal Saikat Kanjilal added a comment -

        Were you thinking of an in-memory 2D array or something more elegant?

        dlieu.7@gmail.com Dmitriy Lyubimov added a comment -

        This issue is not about that, and that is already supported.

        dlieu.7@gmail.com Dmitriy Lyubimov added a comment -

        yes

        no this is not the scope

        no this is not the scope

        no this is not the scope

        This is purely an engineering task, nothing fancy, no new functionality, just
        the shell with the proper packages pre-imported. I've already done this several
        times, though without the Spark specifics.

        The implementor needs to familiarize himself with these topics:

        (1) Scala tools, in particular the Scala shell API
        (2) Spark's modifications and additions to the original shell – we will just
        need to take a rip of the Spark shell and add the proper imports to the scope,
        per the Scala Bindings document.

        dlyubimov Dmitriy Lyubimov added a comment -

        Hm, the email quoting did not work. I guess I should have converted to plain text to retain the quoting.

        dlyubimov Dmitriy Lyubimov added a comment -

        1) Ability to execute against a local or remote spark cluster

        yes

        2) Create parallelized collections based on an existing scala collection

        no this is not the scope

        3) Create a distributed dataset from a remote or local hadoop data set

        no this is not the scope

        4) A subset of transformations and actions as listed in the following link (http://spark.incubator.apache.org/docs/0.8.1/scala-programming-guide.html)

        no this is not the scope

        This is purely an engineering task, nothing fancy, no new functionality,
        just the shell with the proper packages pre-imported. I've already done
        this several times, though without the Spark specifics.

        The implementor needs to familiarize himself with these topics:
        (1) Scala tools, in particular the Scala shell API
        (2) Spark's modifications and additions to the original shell – we will
        just need to take a rip of the Spark shell and add the proper imports to the
        scope, per the Scala Bindings document.

        dlyubimov Dmitriy Lyubimov added a comment -

        See https://github.com/apache/incubator-spark/blob/master/repl/src/main/scala/org/apache/spark/repl/Main.scala (the main class for the Spark shell, which starts the loop over the Scala shell).

        dlyubimov Dmitriy Lyubimov added a comment -

        Hm, they actually copy-and-hack the original Scala shell (rather than reusing it). It sounds like a lot of work that way.

        It is probably because they have to compile the closure code and redistribute it to the backend as the user creates closures interactively. Bummer; it looks much more complicated than I hoped it would be.

        Still, we probably could extend a few things here – all we need to do, in reality, is add a few imports. It would be Spark-specific that way, but oh well. We need to start somewhere.

        I probably need to think about it for a bit too.
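
        For what it's worth, the "extend and add a few imports" option could look roughly like the sketch below. This is only an illustration, not committed code: it assumes Spark 0.9.x's org.apache.spark.repl.SparkILoop exposes an overridable initializeSpark() hook and the usual intp interpreter handle (names and visibility may differ between Spark versions), and the class name here is purely illustrative.

        package org.apache.mahout.sparkbindings.shell

        import org.apache.spark.repl.SparkILoop

        // Illustrative sketch only: inherit from Spark's REPL loop and pre-import the
        // Mahout packages instead of copy-and-hacking the whole shell. Assumes the
        // initializeSpark() hook and the intp handle are accessible in Spark 0.9.x.
        class MahoutSparkILoop extends SparkILoop {

          override def initializeSpark() {
            // let Spark create its context and do its own imports first
            super.initializeSpark()
            // then quietly pre-import the Mahout DSL (full list per the Scala Bindings doc)
            intp.beQuietDuring {
              intp.interpret("import org.apache.mahout.math._")
              intp.interpret("import org.apache.mahout.sparkbindings._")
            }
          }
        }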

        kanjilal Saikat Kanjilal added a comment - - edited

        I would vote to take the original Scala shell code and create a derived class that extends/inherits the Scala shell capability and adds in the Spark shell imports, kind of like a decorator in the design-pattern world. What do you think?

        kanjilal Saikat Kanjilal added a comment -

        Initial github repo:

        https://github.com/skanjila/scala-spark-shell

        I've merged the Mahout Spark code and the shell-related code from incubator-spark into one project. I will now start adding implementations of the Scala APIs for the shell to sit alongside the Spark shell. Dmitriy, please let me know if you have any more comments or feedback at this point on next steps, on top of adding the Scala shell APIs.

        dlieu.7@gmail.com Dmitriy Lyubimov added a comment -

        I think it is a good start.
        (1) We probably need it on a fork of Mahout, not as a standalone project.
        (2) Are you sure we can't just inherit from the Spark shell, rather than
        copy-and-hack?
        (3) We obviously need it to load with a "mahout spark-shell" command,
        which would involve some hacking of the mahout bash script to figure out
        the Spark binaries inside SPARK_HOME etc.

        kanjilal Saikat Kanjilal added a comment -

        Answers embedded:

        (1) We probably need it on a fork of Mahout, not as a standalone project.
        I'll do that and send you the link

        (2) Are you sure we can't just inherit from spark shell, rather than
        just copy-and-hack?
        I'll look into either creating a child class that acts as a decorator (composed of the Spark shell and the Scala shell together) or just inheriting directly from the Spark shell.

        (3) we obviously need it to load with "mahout spark-shell" command
        which would involve some hacking of mahout bash script to figure out
        spark binaries inside SPARK_HOME etc.

        Not too worried about this part; I won't get there for a bit, but yes, point taken. Anything else? I was also wondering whether there are pieces of the Scala shell API that we should put at a much lower priority than other features; I'll come up with a list of features that I think we should implement and add them to this JIRA.

        Thanks

        dlieu.7@gmail.com Dmitriy Lyubimov added a comment -

        Yeah.
        Realistically, functionality-wise there is not that much we need to add
        here. It is the basic Spark shell, plus:

        (1) the Mahout classpath of mahout-spark and its transitive dependencies added in
        addition to the Spark stuff;
        (2) importing our standard things automatically (i.e.
        o.a.m.sparkbindings._, o.a.m.sparkbindings.drm._, RLikeDrmOps._ etc. per the
        manual – make the default package imports easy to add to as we add
        e.g. a data frames DSL; sketched below).

        This is not that much; no fundamental hacks are required. In fact, I
        have done (2)-like things a lot with the standard Scala interpreter. In
        our case we of course cannot use the standard Scala interpreter, because
        we need Spark to sync whatever new closures we put into the script with
        the backend for us. But we can probably just inherit from the Spark
        interpreter and then modify its automatic imports. The classpath
        issues should be handled by the mahout.sh script.
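
        As a concrete illustration of (2), the default pre-imports would presumably look something like the block below; the package paths follow the ones mentioned in this thread, and the authoritative list is the Scala Bindings manual (expected to grow over time).

        // Sketch of the default imports the shell would issue automatically
        // (per the Scala Bindings manual; exact package paths may differ by version).
        import org.apache.mahout.math._
        import org.apache.mahout.math.scalabindings._
        import org.apache.mahout.math.scalabindings.RLikeOps._
        import org.apache.mahout.sparkbindings._
        import org.apache.mahout.sparkbindings.drm._
        import org.apache.mahout.sparkbindings.drm.RLikeDrmOps._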

        kanjilal Saikat Kanjilal added a comment -

        Dmitriy,
        I've gone ahead and forked Mahout and added in the contents of the initial standalone GitHub repo; the result is here:

        https://github.com/skanjila/mahout-scala-spark-shell

        I plan to start moving forward with implementing steps 1 and 2 above.

        kanjilal Saikat Kanjilal added a comment - - edited

        OK, more progress. Here's what I've done so far (I need a bit of brainstorming/guidance); I have checked in all the changes to the GitHub repo specified above, although the code is still not building yet. Regardless, I've:
        1) merged code from Spark into the shell package under org.apache.mahout.sparkbindings
        2) updated all the classes from 1) to have the correct package structure
        3) added two new Maven dependencies, on the Scala compiler and jlang, both using 2.10.0, to resolve some errors associated with scala.tools and scala.***.jline etc.

        Now, I noticed that the Spark code brings in an HTTPServer which is associated with running Jetty locally. Is this something we want inside our Spark shell? If not, I'll need to refactor a bunch of code to remove these dependencies and understand more deeply what or how to replace this; the errors we are getting are associated with this.

        Dmitriy, I would love some guidance/discussion on how to proceed around the HTTPServer.

        Thanks in advance.

        dlyubimov Dmitriy Lyubimov added a comment -

        I would say if it brings in extra dependencies compared to the transitive ones of spark-core, then sure, it needs a separate Maven module. Note that in Spark the shell is also a standalone Maven artifact. I would expect it to bring in at least a scala-tools dependency, which is normally not needed by pure Scala programs. So chances are high it needs a standalone Maven module (say, a "shell" module, with "mahout-shell" for the artifact id).

        kanjilal Saikat Kanjilal added a comment -

        Got it, thanks. I'll create a standalone shell module and bring in all the needed dependencies. I think it should live parallel to the spark directory in the source tree, and we can call it mahout-shell. Sound good?

        kanjilal Saikat Kanjilal added a comment -

        OK, lots of changes here so far:

        1) added a new Maven project called shell
        2) created a new package under org/apache/mahout called shell, which contains the Spark shell code and all its dependencies (sub-packages include server/storage/ui and util); we may not need these, but for now my goal is just to get the project compiling
        3) currently battling through a bunch of compilation errors from classes not being found, which I'll bring in as needed (100+ errors just related to this)

        The GitHub repo is here again for reference: https://github.com/skanjila/mahout-scala-spark-shell

        Dmitriy, my goal is to get this compiling with all dependencies brought in from Spark, which we can remove later.

        kanjilal Saikat Kanjilal added a comment -

        Finally have something compiling:
        1) Removed all the unnecessary packages around storage/scheduler and more
        2) Trimmed down the code to copy the contents of the spark-shell for now, since extending an object is not possible in Scala; one alternative is to create an abstract trait that we could inherit from. I need to talk this through, as I want to reuse the Spark code as much as possible
        3) Moved the ReplSuite into a set of tests called MahoutShellSuite which we'll leverage
        4) Added dependencies for various spark components into pom file

        Dmitriy, can you take a look at https://github.com/skanjila/mahout-scala-spark-shell/shell and let me know your thoughts on: 1) whether any obvious dependencies are missing in the pom file, and 2) whether the initial shell code is missing any pieces?

        If not, I'll move forward and fix the tests to work with the code for the next big code drop.

        kanjilal Saikat Kanjilal added a comment -

        Got a bit further: the unit tests are at least compiling; they run but fail. Will tackle this next.

        dlyubimov Dmitriy Lyubimov added a comment -

        So I take it you do extend the original Spark shell. Nice!

        Any reason why you did not just include a dependency on mahout-spark and use the mahoutContext() method from there, instead of copying over my code?

        dlyubimov Dmitriy Lyubimov added a comment -

        This will of course need a bit of cleanup, but it is a good skeleton. Nice.

        kanjilal Saikat Kanjilal added a comment -

        1) Added MahoutLocalContext into MahoutShellSuite
        2) Added the code and test sparkbindings targets inside pom.xml

        Currently the tests compile but fail; I will fix the tests next.

        kanjilal Saikat Kanjilal added a comment -

        OK, home stretch:

        1) added the scala.reflect dependencies that the MahoutShellSuite requires
        2) brought over the computeClasspath.sh script from the spark incubator project
        3) tests are now at least running, and are failing because of dependencies brought in by 2)

        Dmitriy, I need to figure out what modifications need to be made to 2). A question for you here: do we already have a similar dependency in the mahout-spark project? If not, which parts of this script should we remove and which parts should be kept? Some insight on this would be very helpful.

        Thanks

        dlyubimov Dmitriy Lyubimov added a comment -

        Honestly, this sounds over-designed. Bringing in Spark scripts should not be necessary.

        I think I need to play with it, but I am sure there's a way to figure out the Spark classpath similar to how the Mahout context figures out Mahout's jars.

        I also definitely do not understand all the trouble with transitive dependencies; Maven should bring in everything that is needed automatically, with minimal adjustments.

        kanjilal Saikat Kanjilal added a comment -

        The requirement to bring in scala.reflect stemmed from a runtime error indicating that the package was missing, which I fixed by adding that dependency. Let me know how you want to proceed with the shell script; if MahoutContext is already loading all the jars correctly, then I think the right approach would be to fix MahoutShellSuite to load all the jars using the MahoutContext.

        kanjilal Saikat Kanjilal added a comment -

        Here's what I see when I run the unit tests:

        [INFO] --- scalatest-maven-plugin:1.0-M2:test (test) @ mahout-shell ---
        WARNING: -p has been deprecated and will be reused for a different (but still very cool) purpose in ScalaTest 2.0. Please change all uses of -p to -R.
        Discovery starting.
        Discovery completed in 632 milliseconds.
        Run starting. Expected test count is: 9
        DiscoverySuite:
        MahoutShellSuite:
        2014-04-09 23:35:51.462 java[583:b07] Unable to load realm info from SCDynamicStore

        ls: /Users/skanjila/code/java/mahout-scala-spark-shell/shell/assembly/target/scala-2.10.3/spark-assembly*hadoop*.jar: No such file or directory
        ls: /Users/skanjila/code/java/mahout-scala-spark-shell/shell/assembly/target/scala-2.10.3/spark-assembly*hadoop*.jar: No such file or directory
        ls: /Users/skanjila/code/java/mahout-scala-spark-shell/shell/assembly/target/scala-2.10.3/spark-assembly*hadoop*.jar: No such file or directory
        ls: /Users/skanjila/code/java/mahout-scala-spark-shell/shell/assembly/target/scala-2.10.3/spark-assembly*hadoop*.jar: No such file or directory
        ls: /Users/skanjila/code/java/mahout-scala-spark-shell/shell/assembly/target/scala-2.10.3/spark-assembly*hadoop*.jar: No such file or directory
        ls: /Users/skanjila/code/java/mahout-scala-spark-shell/shell/assembly/target/scala-2.10.3/spark-assembly*hadoop*.jar: No such file or directory
        ls: /Users/skanjila/code/java/mahout-scala-spark-shell/shell/assembly/target/scala-2.10.3/spark-assembly*hadoop*.jar: No such file or directory
        ls: /Users/skanjila/code/java/mahout-scala-spark-shell/shell/assembly/target/scala-2.10.3/spark-assembly*hadoop*.jar: No such file or directory
        ls: /Users/skanjila/code/java/mahout-scala-spark-shell/shell/assembly/target/scala-2.10.3/spark-assembly*hadoop*.jar: No such file or directory
        ls: /Users/skanjila/code/java/mahout-scala-spark-shell/shell/assembly/target/scala-2.10.3/spark-assembly*hadoop*.jar: No such file or directory
        0 [sparkMaster-akka.actor.default-dispatcher-4] ERROR org.apache.spark.deploy.master.Master - Application Spark shell with ID app-20140409233702-0000 failed 10 times, removing it
        21 [spark-akka.actor.default-dispatcher-2] ERROR org.apache.spark.deploy.client.AppClient$ClientActor - Master removed our application: FAILED; stopping client

        My questions:
        1) Given that I have repurposed the spark-repl unit tests, I am wondering whether we (as in the Mahout Spark shell) should have the same requirements for running, i.e. needing the spark-assembly*hadoop*.jar files, which essentially means that a local install of Spark is needed
        2) When I run the unit tests now I see something like this:
        - propagation of local properties
        - simple foreach with accumulator
        - external vars
        - external classes
        - external functions
        - external functions that access vars
        - broadcast vars
        - interacting with files

        I'm assuming this means it's skipping all the unit tests; I'll investigate this further.

        3) Earlier you mentioned the mahout.sh script. Should we merge the contents of that script with the one I have above and place it in the bin sub-directory? More importantly, I need to understand how mahout.sh is related to computeClasspath.sh.

        Eager to hear your thoughts so we can proceed quickly with next steps.

        dlyubimov Dmitriy Lyubimov added a comment -

        OK, this is way too complicated for me. I did a quick hack [1] which seems to work, at least in local mode. Works like a charm. Here is the session dump (you just need to start the o.a.m.sparkbindings.shell.Main class from IDEA). I also filtered out most of the Spark debug messages that are enabled by default:

        "Mahout Spark Shell session"
        14/04/10 17:52:27 INFO spark.HttpServer: Starting HTTP Server
        14/04/10 17:52:27 INFO server.Server: jetty-7.6.8.v20121106
        14/04/10 17:52:27 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:60204
        Welcome to
              ____              __
             / __/__  ___ _____/ /__
            _\ \/ _ \/ _ `/ __/  '_/
           /___/ .__/\_,_/_/ /_/\_\   version 0.9.0
              /_/
        
        Using Scala version 2.10.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
        Type in expressions to have them evaluated.
        Type :help for more information.
        Created spark context..
        Spark context available as sc.
        
        scala> val a = dense((1,2,3),(3,4,5))
        a: org.apache.mahout.math.DenseMatrix = 
        {
          0  =>	{0:1.0,1:2.0,2:3.0}
          1  =>	{0:3.0,1:4.0,2:5.0}
        }
        
        scala> val drmA = drmParallelize(a)
        drmA: org.apache.mahout.sparkbindings.drm.CheckpointedDrm[Int] = org.apache.mahout.sparkbindings.drm.CheckpointedDrmBase@791b95a1
        
        scala> val drmAtA = drmA.t %*% drmA
        drmAtA: org.apache.mahout.sparkbindings.drm.DrmLike[Int] = OpAB(OpAt(org.apache.mahout.sparkbindings.drm.CheckpointedDrmBase@791b95a1),org.apache.mahout.sparkbindings.drm.CheckpointedDrmBase@791b95a1)
        
        scala> drmAtA.collect
        res0: org.apache.mahout.math.Matrix = 
        {
          0  =>	{0:10.0,1:14.0,2:18.0}
          1  =>	{0:14.0,1:20.0,2:26.0}
          2  =>	{0:18.0,1:26.0,2:34.0}
        }
        
        scala> 
        

        I suggest you fork my branch and take it as a basis. It basically works; now what needs to happen is to verify it in distributed mode and modify the mahout shell script to set up the proper paths etc. to launch it via a "mahout shell <master>" command. Should be simple and uneventful enough now.

        [1]: https://github.com/dlyubimov/mahout-commits/tree/shell

        andrew.musselman Andrew Musselman added a comment -

        I was thinking the other day that what I may want out of this is the kind of clear data flow I get when I write Pig.

        For example:

        a = load 'u';
        b = load 'v';
        c = a%.%b
        store c into 'matrix-mult';

        Is this the right thread for that conversation?

        dlyubimov Dmitriy Lyubimov added a comment -

        Andrew Musselman yes, it does that and more. You really should perhaps listen to one of Matei's Spark talks.

        It is basically a Scala shell that, aside from line-by-line Scala interpretation, turns all the closures and classes you create on the fly into bytecode and ships them to the Spark backend automatically. No other distributed backend that I know of is capable of anything comparable; usually they all require you to compile and ship tons of jars in this situation.
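
        To connect this to the Pig-style flow above: a rough equivalent in the Mahout shell DSL might look like the sketch below. This is a hedged sketch, not a verbatim session: 'u', 'v' and 'matrix-mult' are just the placeholder paths from the Pig example, and drmFromHDFS/writeDRM are the load/save helpers mentioned elsewhere in this thread.

        // Hedged sketch of the Pig-style flow in the Mahout Spark shell DSL.
        val a = drmFromHDFS("u")          // load matrix A from HDFS
        val b = drmFromHDFS("v")          // load matrix B from HDFS
        val c = a %*% b                   // distributed matrix multiplication (lazy, optimized plan)
        c.writeDRM("matrix-mult")         // store the result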

        dlyubimov Dmitriy Lyubimov added a comment -

        PS: all we do here is hack it to include the Mahout libraries, set up proper linalg type serialization, declare an implicit Spark context, and pre-import the relevant packages, so by the time you run it, you are already all set.

        This is also very coherent, perception-wise, with how RScript works.

        andrew.musselman Andrew Musselman added a comment -

        Alright, I will read up.

        kanjilal Saikat Kanjilal added a comment -

        Dmitriy,
        I tried to fork your repo but don't see the spark-shell showing up in my forked version of the repo for some reason; I also tried to clone from your repo and got a "repo not found" error. I'll go ahead and copy the spark-shell directory contents over for now. Regardless, I had a few questions:

        1) When you say the mahout shell script, which script are you referring to? Is that the one that I brought over in my fork? I don't immediately see any shell scripts under the spark-shell or the master directory; let me know if you are referring to the MahoutSparkILoop.scala class that you created/extended.

        2) Is there a specific set of commands that would be good to exercise in distributed mode, or is it the same commands you outlined above, just now running in distributed mode? Also, by distributed mode, should we also be able to point to some remote Spark/Hadoop cluster node(s)?

        Thanks for your help

        dlyubimov Dmitriy Lyubimov added a comment -

        Are you forking the correct branch? You need to fork the shell branch. Keep in mind that you do need to fork on GitHub in order to issue a pull request (which is what I would really suggest you do next, so you can get comments on the code).

        I assume everything will work exactly as it works today with the Spark interpreter (i.e. you supply the master URL to the shell invocation, optionally followed by the script file and execution options) – the shell will take it from there. You need to check out the Spark shell help to figure out how it works.

        The master URL for local mode is simply "local". Again, see the Spark manual. In the shell itself there's no distinction in the command set regardless of the mode; everything is 100% identical.

        kanjilal Saikat Kanjilal added a comment -

        This is what I'm trying to fork:
        https://github.com/dlyubimov/mahout-commits/tree/shell

        I went to GitHub and clicked on the fork button; here's my fork:

        https://github.com/skanjila/mahout-commits

        As you can see, in my fork the spark-shell directory, which I'm most interested in, is missing.

        Thanks for the info on the commands; I'll begin this effort as soon as I bring the code over correctly.

        dlyubimov Dmitriy Lyubimov added a comment - - edited

        Yes. Go to the branch tab and select the "shell" branch. It is there.

        dlyubimov Dmitriy Lyubimov added a comment -

        First working patch. Also tracked by the "shell" branch in my github/mahout-commits.

        Seems to be working with a single-node cluster in STANDALONE mode. Also tested on-the-fly closures with `mapBlock()` (a rough example is sketched at the end of this comment).

        To start:

        (1) compile Mahout
        (2) install and compile Spark 0.9.1
        (3) make sure MAHOUT_HOME and SPARK_HOME point to Mahout and Spark 0.9.1, respectively

        Start a Spark standalone cluster per the instructions in Spark.

        To start the shell, use (for example):

         MASTER=spark://BigHP:7077 bin/mahout spark-shell
        

        Outstanding issues:

        (1) The log level is of course not adjustable via Spark settings any more, since we use Mahout to start the JVM process. How do we set log4j.properties for Mahout?

        (2) After exiting the shell, the terminal driver on Ubuntu Linux is messed up for some reason (use "stty sane" to restore sanity to the terminal control driver). Not sure why; this does not happen with either the Spark or the Scala shell.
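
        Going back to the on-the-fly mapBlock() closure mentioned above, the kind of thing that was tested looks roughly like the following shell session fragment (a hedged sketch; the matrix values are just an example reusing the dense()/drmParallelize helpers from the session dump):

        // The closure passed to mapBlock() is compiled by the REPL and shipped to the Spark backend.
        val drmA = drmParallelize(dense((1, 2, 3), (3, 4, 5)))

        // Add 1.0 to every element of each block; row keys are passed through unchanged.
        val drmB = drmA.mapBlock() {
          case (keys, block) => keys -> (block += 1.0)
        }

        drmB.collect   // pull the result back as an in-core Matrix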

        kanjilal Saikat Kanjilal added a comment -

        Dmitriy,
        I've been out of town, so I will continue testing this when I come back next week. Does the distributed-mode testing still need to be done?

        dlieu.7@gmail.com Dmitriy Lyubimov added a comment -

        It seems to be working in distributed mode; I don't see any further problems. There are a couple of cosmetic issues like the ones I mentioned, but I am not yet sure how to fix them. I need input from somebody who knows how logger verbosity is managed in Mahout, and I am not sure what is specific to Mahout startup that might be messing up the terminal driver.

        dlyubimov Dmitriy Lyubimov added a comment -

        OK, I really want to commit this sooner rather than later, because of the actively diverging trunk and because there are a few fixes (e.g. writeDRM must not require a view bound since the implementation already carries it), so I'd appreciate eyeballs on the most recent patch.

        tdunning Ted Dunning added a comment -

        For new modules that really can't much break existing code, I think that
        committing often is a good thing. This is especially true since we are
        using SVN.

        dlyubimov Dmitriy Lyubimov added a comment -

        Well, the dangerous code is the mahout shell-script modifications.

        Since there's a lot of code there devoted (in a fairly inconsistent way) to figuring out the Hadoop setup and its classpath, and the Spark commands now need to make sure that doesn't happen while adding the Spark jars... it becomes somewhat kludgy and may break MR things. So far I don't see evidence of that happening, but I am not verifying the Mahout MR commands all the way either.

        dlyubimov Dmitriy Lyubimov added a comment -

        Also, my questions are still relevant – in particular, how do we set up logging levels in Mahout?

        andrew.musselman Andrew Musselman added a comment -

        I think we need to add a log4j.properties file so we can control logging properly, but that's another ticket.

        andrew.musselman Andrew Musselman added a comment -

        Filed https://issues.apache.org/jira/browse/MAHOUT-1522 for logging.

        dlyubimov Dmitriy Lyubimov added a comment -

        OK, thanks for doing this. I thought there already was a way.

        tdunning Ted Dunning added a comment -

        Dmitriy Lyubimov

        Why do we even need to integrate with the Mahout shell command?

        dlyubimov Dmitriy Lyubimov added a comment -

        I had a thought to make it another command, but I guess I got too lazy; there's too much code in common for figuring out the classpath etc. I'd like to commit it as is, and if we don't like it, we'll tweak it later. I am not big on extensive shell scripting.

        We'll also need to tweak all of that to include something like a "config" folder with log4j.properties in it. So it probably makes sense to start with a common script there.

        dlyubimov Dmitriy Lyubimov added a comment -

        OK, so there are two environment variables to pay attention to: MAHOUT_HOME and MASTER (the Spark master).

        E.g., to run against a "standalone" Spark cluster:

        MAHOUT_HOME=~/tools/mahout MASTER='spark://master-host:7077' bin/mahout spark-shell
        
        dlyubimov Dmitriy Lyubimov added a comment -

        PPS. SPARK_HOME is also expected to point to a Spark 0.9.1 setup when running in non-local mode. It will try to run spark-classpath.sh to figure out Spark's binaries, so make sure that script works (it doesn't if you haven't run the full spark assembly build).

        hudson Hudson added a comment -

        FAILURE: Integrated in Mahout-Quality #2589 (See https://builds.apache.org/job/Mahout-Quality/2589/)
        MAHOUT-1489 : initial Mahout spark shell commit

        Squashed commit of the following:

        commit 0124072b72fcdad9ccded43745c9b1d00e7ea089
        Merge: c1a2c8a c9164c1
        Author: Dmitriy Lyubimov <dlyubimov@apache.org>
        Date: Tue Apr 22 11:33:17 2014 -0700

        Merge branch 'trunk' into shell

        commit c1a2c8a414c015dcdce592b145498fc8b836addf
        Merge: a3491f5 a8df05b
        Author: Dmitriy Lyubimov <dlyubimov@apache.org>
        Date: Tue Apr 22 11:30:25 2014 -0700

        Merge branch 'trunk' into shell; misc fixes

        Conflicts:
        bin/mahout

        commit a3491f57e77b4b789051cf131bc7fdea73ad3e41
        Author: Dmitriy Lyubimov <dlyubimov@apache.org>
        Date: Mon Apr 21 17:39:06 2014 -0700

        -NonLocal

        commit bd0c83ebfa66e48f434f0ecc81bf81dd07d27f8c
        Merge: ad01add 78c45c4
        Author: Dmitriy Lyubimov <dlyubimov@apache.org>
        Date: Mon Apr 21 17:33:18 2014 -0700

        Merge commit '78c45c4c5d96f51e9' into shell

        commit ad01add55c1c5a212dabcd727e31dd36162c1fd0
        Author: Dmitriy Lyubimov <dlyubimov@apache.org>
        Date: Mon Apr 21 17:30:53 2014 -0700

        Fixing writeDRM problems

        commit f9f20e364e4462b6ba0693d11ee939ef5afd43a4
        Author: Dmitriy Lyubimov <dlyubimov@apache.org>
        Date: Mon Apr 21 14:58:28 2014 -0700

        writeDRM broken, some unknown closure attributes yank "this", not seeing where and which

        commit 7ca93f70ef7313cca251470c41032c08e9e612e2
        Author: Dmitriy Lyubimov <dlyubimov@apache.org>
        Date: Mon Apr 21 12:57:40 2014 -0700

        script errors

        commit ce5bdfaba160ecb6ee433b4682f62d9e6236e0b6
        Author: Dmitriy Lyubimov <dlyubimov@apache.org>
        Date: Thu Apr 17 13:51:35 2014 -0700

        drmFromHDFS() fixes – load and initiailize classtag evidence correctly.
        TODO: use class evidence from key for saveDRM.

        commit aeba609a957b9ef5aac7aa4eba01edb46004116f
        Author: Dmitriy Lyubimov <dlyubimov@apache.org>
        Date: Thu Apr 17 11:41:30 2014 -0700

        renaming script extensions

        commit 9b02ac2b6aa4ff046a898876f9a8fb23936cd78b
        Author: Dmitriy Lyubimov <dlyubimov@apache.org>
        Date: Thu Apr 17 11:17:04 2014 -0700

        build fix

        commit 2fbd835c4c4f8fac3ff32d06b9578d436f4bbf09
        Author: Dmitriy Lyubimov <dlyubimov@apache.org>
        Date: Wed Apr 16 18:04:47 2014 -0700

        WIP – unstable

        commit cc87347d393709ac0a6ab2adc66200caed07e911
        Author: Dmitriy Lyubimov <dlyubimov@apache.org>
        Date: Wed Apr 16 15:27:00 2014 -0700

        removing examples dependencies if running spark shell.

        commit 435992aa99090b9a75f3903ac5d10e43e6b49357
        Author: Dmitriy Lyubimov <dlyubimov@apache.org>
        Date: Wed Apr 16 14:10:22 2014 -0700

        WIP spark-shell, seems to be working

        commit 1f4fd51c2e5e18852d1b30d5a88897e14761d9e8
        Author: Dmitriy Lyubimov <dlyubimov@apache.org>
        Date: Wed Apr 16 12:55:17 2014 -0700

        WIP – implementing "mathout spark-shell"

        commit 5922a111405cc66c1d598b123c7d28f456229559
        Author: Dmitriy Lyubimov <dlyubimov@apache.org>
        Date: Thu Apr 10 17:53:23 2014 -0700

        First shell prototype works in local mode (dlyubimov: rev 1589246)

        • /mahout/trunk/bin/mahout
        • /mahout/trunk/math-scala/pom.xml
        • /mahout/trunk/pom.xml
        • /mahout/trunk/spark-shell
        • /mahout/trunk/spark-shell/pom.xml
        • /mahout/trunk/spark-shell/src
        • /mahout/trunk/spark-shell/src/main
        • /mahout/trunk/spark-shell/src/main/scala
        • /mahout/trunk/spark-shell/src/main/scala/org
        • /mahout/trunk/spark-shell/src/main/scala/org/apache
        • /mahout/trunk/spark-shell/src/main/scala/org/apache/mahout
        • /mahout/trunk/spark-shell/src/main/scala/org/apache/mahout/sparkbindings
        • /mahout/trunk/spark-shell/src/main/scala/org/apache/mahout/sparkbindings/shell
        • /mahout/trunk/spark-shell/src/main/scala/org/apache/mahout/sparkbindings/shell/MahoutSparkILoop.scala
        • /mahout/trunk/spark-shell/src/main/scala/org/apache/mahout/sparkbindings/shell/Main.scala
        • /mahout/trunk/spark-shell/src/test
        • /mahout/trunk/spark-shell/src/test/mahout
        • /mahout/trunk/spark-shell/src/test/mahout/simple.mscala
        • /mahout/trunk/spark/pom.xml
        • /mahout/trunk/spark/src/main/scala/org/apache/mahout/sparkbindings/drm/CheckpointedDrm.scala
        • /mahout/trunk/spark/src/main/scala/org/apache/mahout/sparkbindings/drm/CheckpointedDrmBase.scala
        • /mahout/trunk/spark/src/main/scala/org/apache/mahout/sparkbindings/drm/package.scala
        • /mahout/trunk/spark/src/main/scala/org/apache/mahout/sparkbindings/package.scala
        • /mahout/trunk/spark/src/test/scala/org/apache/mahout/sparkbindings/test/MahoutLocalContext.scala
        ssc Sebastian Schelter added a comment -

        What's the status here?

        dlyubimov Dmitriy Lyubimov added a comment -

        status is patch available (and actually committed).

        One thing that would make this better is introducing explicit log level management in Mahout, but this has been filed as another issue.

        A minor thing is, as I mentioned, I can't figure out why the terminal driver is screwed up after a graceful shell exit. So I think we can close this issue and re-file that as a bug if so desired, along with other things as we find them.
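
        As an aside, a generic workaround when a REPL leaves the terminal in a bad state is to reset it from the shell afterwards; whether that helps with this particular exit issue is unverified:

          # generic terminal recovery, not specific to this shell
          stty sane    # or: reset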

        ssc Sebastian Schelter added a comment -

        I'm having some trouble getting this to work. Do I have to configure Spark in a specific way, or is it enough to build it and run ./sbin/start-master.sh?

        dlyubimov Dmitriy Lyubimov added a comment -

        Let me recap what I said before. No, you don't have to configure Spark in any special way.

        (1) install Spark 0.9.1, make sure the assembly is built (sbt/sbt assembly in SPARK_HOME), set up SPARK_HOME, and make sure $SPARK_HOME/bin/spark-classpath.sh (or whatever this script is) produces no errors
        (2) compile Mahout, set up MAHOUT_HOME
        (3) try local mode:

          MASTER="local" bin/mahout spark-shell
        

        That should be enough. LMK what trouble you run into after that.
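
        Putting the steps together, a local-mode smoke test could look roughly like this; the paths are examples, and the Mahout build command is the usual Maven invocation rather than anything this patch prescribes:

          # example paths – adjust to your own checkouts
          export SPARK_HOME=~/tools/spark          # Spark 0.9.1 with the assembly built (sbt/sbt assembly)
          export MAHOUT_HOME=~/projects/mahout     # Mahout trunk, built e.g. with: mvn -DskipTests clean install

          cd "$MAHOUT_HOME"
          MASTER="local" bin/mahout spark-shell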

        dlyubimov Dmitriy Lyubimov added a comment -

        PS: my environment is Ubuntu 12 LTS. YMMV on Mac as I haven't tested it there.

        ssc Sebastian Schelter added a comment -

        I'm also running Ubuntu 12 LTS. I'm getting a NoClassDefFoundError:

        java.lang.NoClassDefFoundError: org/apache/mahout/common/IOUtils
        	at org.apache.mahout.sparkbindings.package$.mahoutSparkContext(package.scala:131)
        	at org.apache.mahout.sparkbindings.shell.MahoutSparkILoop.createSparkContext(MahoutSparkILoop.scala:44)
        	at $iwC$$iwC.<init>(<console>:8)
        	at $iwC.<init>(<console>:14)
        	at <init>(<console>:16)
        	at .<init>(<console>:20)
        	at .<clinit>(<console>)
        	at .<init>(<console>:7)
        	at .<clinit>(<console>)
        	at $print(<console>)
        	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        	at java.lang.reflect.Method.invoke(Method.java:606)
        	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:772)
        	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1040)
        	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:609)
        	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:640)
        	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:604)
        	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:793)
        	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:838)
        	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:750)
        	at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:119)
        	at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:118)
        	at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:258)
        	at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:118)
        	at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:53)
        	at org.apache.spark.repl.SparkILoop$$anonfun$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:908)
        	at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:140)
        	at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:53)
        	at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:102)
        	at org.apache.mahout.sparkbindings.shell.MahoutSparkILoop.postInitialization(MahoutSparkILoop.scala:20)
        	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:925)
        	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:881)
        	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:881)
        	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:881)
        	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:973)
        	at org.apache.mahout.sparkbindings.shell.Main$.main(Main.scala:14)
        	at org.apache.mahout.sparkbindings.shell.Main.main(Main.scala)
        Caused by: java.lang.ClassNotFoundException: org.apache.mahout.common.IOUtils
        	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        	at java.security.AccessController.doPrivileged(Native Method)
        	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        	... 40 more
        
        dlyubimov Dmitriy Lyubimov added a comment -

        Can you please run "bin/mahout -spark classpath" here? Thanks.

        My output is:

        bin/mahout -spark classpath | sed "s/:/\n/g"
        MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
        Running on hadoop, using /home/dmitriy/tools/hadoop/bin/hadoop and HADOOP_CONF_DIR=/home/dmitriy/tools/hadoop/etc/hadoop
        
        /home/dmitriy/projects/asf/mahout-commits/src/conf
        /home/dmitriy/tools/hadoop/etc/hadoop
        /home/dmitriy/tools/java/lib/tools.jar
        /home/dmitriy/projects/asf/mahout-commits/mahout-*.jar
        /home/dmitriy/projects/asf/mahout-commits/math-scala/target/mahout-math-scala-1.0-SNAPSHOT.jar
        /home/dmitriy/projects/asf/mahout-commits/math-scala/target/mahout-math-scala-1.0-SNAPSHOT-sources.jar
        /home/dmitriy/projects/asf/mahout-commits/math-scala/target/mahout-math-scala-1.0-SNAPSHOT-tests.jar
        /home/dmitriy/projects/asf/mahout-commits/core/target/mahout-core-1.0-SNAPSHOT.jar
        /home/dmitriy/projects/asf/mahout-commits/core/target/mahout-core-1.0-SNAPSHOT-job.jar
        /home/dmitriy/projects/asf/mahout-commits/core/target/mahout-core-1.0-SNAPSHOT-sources.jar
        /home/dmitriy/projects/asf/mahout-commits/core/target/mahout-core-1.0-SNAPSHOT-tests.jar
        /home/dmitriy/projects/asf/mahout-commits/spark/target/mahout-spark-1.0-SNAPSHOT.jar
        /home/dmitriy/projects/asf/mahout-commits/spark/target/mahout-spark-1.0-SNAPSHOT-sources.jar
        /home/dmitriy/projects/asf/mahout-commits/spark/target/mahout-spark-1.0-SNAPSHOT-tests.jar
        /home/dmitriy/projects/asf/mahout-commits/spark-shell/target/mahout-spark-shell-1.0-SNAPSHOT.jar
        /home/dmitriy/projects/asf/mahout-commits/spark-shell/target/mahout-spark-shell-1.0-SNAPSHOT-sources.jar
        /home/dmitriy/projects/asf/mahout-commits/spark-shell/target/mahout-spark-shell-1.0-SNAPSHOT-tests.jar
        
        /home/dmitriy/tools/spark/conf
        /home/dmitriy/tools/spark/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.0.0-cdh4.3.0.jar
        /home/dmitriy/tools/hadoop/etc/hadoop
        /home/dmitriy/tools/hadoop/etc/hadoop
        /home/dmitriy/projects/asf/mahout-commits/lib/*.jar
        
        ssc Sebastian Schelter added a comment -

        core is mrlegacy now, that's causing the error.
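
        A sketch of the kind of fix implied – pointing the classpath glob in bin/mahout at the renamed module – might look like the fragment below; the actual loop and variable names in bin/mahout may differ:

          # hypothetical bin/mahout fragment; real variable names may differ
          for f in "$MAHOUT_HOME"/mrlegacy/target/mahout-*.jar ; do
            CLASSPATH="${CLASSPATH}:$f"
          done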

        ssc Sebastian Schelter added a comment -

        I suggest changing the prompt from "scala>" to "mahout>" – like the idea?
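
        A sketch of how that could look in MahoutSparkILoop, assuming the prompt member of the stock Scala/Spark REPL ILoop is overridable (not verified against the committed class):

          // hypothetical sketch – assumes SparkILoop exposes an overridable prompt
          class MahoutSparkILoop extends org.apache.spark.repl.SparkILoop {
            override def prompt: String = "mahout> "
          }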

        andrew.musselman Andrew Musselman added a comment -

        +1 to the mahout> prompt

        dlyubimov Dmitriy Lyubimov added a comment -

        sure.

        hudson Hudson added a comment -

        FAILURE: Integrated in Mahout-Quality #2601 (See https://builds.apache.org/job/Mahout-Quality/2601/)
        MAHOUT-1489 Interactive Scala & Spark Bindings Shell & Script processor (ssc: rev 1590807)

        • /mahout/trunk/bin/mahout

          People

          • Assignee:
            dlyubimov Dmitriy Lyubimov
            Reporter:
            kanjilal Saikat Kanjilal
          • Votes:
            0
            Watchers:
            7
