TinkerPop / TINKERPOP-1023

Add a spark variable in SparkGremlinPlugin like we do hdfs for HadoopGremlinPlugin

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0-incubating
    • Fix Version/s: 3.1.1-incubating
    • Component/s: hadoop
    • Labels: None

      Description

      It would be good if from the Gremlin Console we could do things like this:

      gremlin> spark.getRDDs()
      gremlin> spark.removeRDD("graphRDD")
      gremlin> spark.getMaster()
      gremlin> spark.isPersisted()
      

      With the ability to have persisted contexts, it's confusing as to what is persisted and what is not. A spark object, like the hdfs object we already have, will make this clearer.
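      As a rough sketch of the helper surface proposed above (hypothetical Java, not TinkerPop's actual implementation; a plain String stands in for a real Spark RDD):

      ```java
      import java.util.LinkedHashMap;
      import java.util.Map;
      import java.util.Set;

      // Hypothetical sketch of a `spark` console helper: a name -> RDD
      // catalog exposing the kind of methods listed in this issue.
      // All names here are illustrative, not TinkerPop's real API.
      public class SparkHelperSketch {
          private final Map<String, String> persistedRdds = new LinkedHashMap<>();
          private final String master;

          public SparkHelperSketch(String master) { this.master = master; }

          // Register a named RDD as persisted (stand-in for Spark persistence).
          public void persistRDD(String name, String rdd) { persistedRdds.put(name, rdd); }

          // List the names of all currently persisted RDDs.
          public Set<String> getRDDs() { return persistedRdds.keySet(); }

          // Drop a persisted RDD by name; true if it existed.
          public boolean removeRDD(String name) { return persistedRdds.remove(name) != null; }

          public String getMaster() { return master; }

          public boolean isPersisted(String name) { return persistedRdds.containsKey(name); }

          public static void main(String[] args) {
              SparkHelperSketch spark = new SparkHelperSketch("local[4]");
              spark.persistRDD("graphRDD", "vertices+edges");
              System.out.println(spark.getRDDs());               // [graphRDD]
              System.out.println(spark.isPersisted("graphRDD")); // true
              spark.removeRDD("graphRDD");
              System.out.println(spark.getRDDs());               // []
          }
      }
      ```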

        Activity

        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/incubator-tinkerpop/pull/173

        githubbot ASF GitHub Bot added a comment -

        Github user spmallette commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/173#issuecomment-163193920

        Builds and tests nicely. Did some simple manual tests with `spark` object - worked.

        Just a reminder that upgrade docs are lagging a bit behind all the spark work that's been done.

        VOTE: +1

        githubbot ASF GitHub Bot added a comment -

        Github user dkuppitz commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/173#issuecomment-163064768

        Yes, that was the issue.

        *Update*:

        • `mvn clean install`: worked
        • manual tests using the new `spark` object: worked

        VOTE: +1

        githubbot ASF GitHub Bot added a comment -

        Github user okram commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/173#issuecomment-163061226

        @dkuppitz – did you clear your grapes?

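        For context on "clear your grapes": the Gremlin Console resolves plugin jars through Groovy's Grape dependency manager, and a stale Grape cache can keep an old plugin jar active, hiding a freshly built one. A sketch of clearing it (the path is Grape's default and differs if grape.root is configured otherwise):

        ```shell
        # Groovy's Grape artifact cache; stale entries here can shadow freshly
        # built plugin jars. ~/.groovy/grapes is the default location
        # (configurable via the grape.root system property).
        GRAPES_DIR="${GRAPES_DIR:-$HOME/.groovy/grapes}"

        # Remove the cache so plugin jars are re-resolved on the next console start.
        rm -rf "$GRAPES_DIR"
        ```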
        githubbot ASF GitHub Bot added a comment -

        Github user dkuppitz commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/173#issuecomment-163058755

        • `mvn clean install`: worked
        • manual tests using the new `spark` object: failed

        ```
        daniel@cube /projects/apache/test/incubator-tinkerpop/gremlin-console/target/apache-gremlin-console-3.1.1-SNAPSHOT-standalone (TINKERPOP-1023) $ HADOOP_GREMLIN_LIBS=`pwd`/ext/hadoop-gremlin/lib:`pwd`/ext/spark-gremlin/lib bin/gremlin.sh

        \,,,/
        (o o)
        ----oOOo(3)oOOo----
        plugin activated: tinkerpop.server
        plugin activated: tinkerpop.utilities
        WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
        INFO org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph - HADOOP_GREMLIN_LIBS is set to: /projects/apache/test/incubator-tinkerpop/gremlin-console/target/apache-gremlin-console-3.1.1-SNAPSHOT-standalone/ext/hadoop-gremlin/lib:/projects/apache/test/incubator-tinkerpop/gremlin-console/target/apache-gremlin-console-3.1.1-SNAPSHOT-standalone/ext/spark-gremlin/lib
        plugin activated: tinkerpop.hadoop
        plugin activated: tinkerpop.spark
        plugin activated: tinkerpop.tinkergraph
        gremlin> spark
        No such property: spark for class: groovysh_evaluate
        Display stack trace? [yN] N
        gremlin> spark.create("local[4]")
        No such property: spark for class: groovysh_evaluate
        Display stack trace? [yN] N
        gremlin>
        ```

        What am I missing here?

        githubbot ASF GitHub Bot added a comment -

        Github user okram commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/173#issuecomment-162967063

        Note that I also tested this with Spark Server and it works great. This is a really really cool thing.

        githubbot ASF GitHub Bot added a comment -

        GitHub user okram opened a pull request:

        https://github.com/apache/incubator-tinkerpop/pull/173

        TINKERPOP-1023: Add a spark variable in SparkGremlinPlugin like we do hdfs for HadoopGremlinPlugin

        https://issues.apache.org/jira/browse/TINKERPOP-1023

        Like `hdfs`, there is now `spark`, which allows the user to manage their persisted contexts. In essence, the Spark Server looks like a file system with (named) RDDs accessible. For instance, you can `spark.ls()`, `spark.rm()`, `spark.describe()`. I added a `SparkGremlinPluginTest` which ensures that all the proper imports/etc. work in the Console. I also added the information to the reference docs. I published the reference docs so people can see it in action:

        http://tinkerpop.apache.org/docs/3.1.1-SNAPSHOT/reference/#sparkgraphcomputer (scroll down to "Using A Persisted Context" section)

        VOTE +1. (`mvn clean install` and Spark integration tests)

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1023

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/incubator-tinkerpop/pull/173.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #173


        commit 9d6467c8f48cd34e83270d3f3eabbcce9ce74f05
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2015-12-08T15:00:47Z

        fist push on a Spark object for managing persisted RDDs. Not finished yet.

        commit 4d1d8c90cead1aac4ca61b4510ea533c39b1ad7a
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2015-12-08T15:29:26Z

        Merge branch 'master' into TINKERPOP-1023

        commit f8fabe20108f4cec8f4c50c7f7bf6523c112acac
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2015-12-08T16:28:19Z

        added Spark persited RDD utility that can be spark.ls(), spark.head(), spark.rm(), spark.describe(), etc. in the Console. Really cool. Added a SparkGremlinPluginTest that verifies everything works as expected. Updated docs explaining the new tool.

        commit debea174494d9c22fbcfca1cd505328b9a998e08
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2015-12-08T17:12:01Z

        added spark RDD utility to docs.

        commit 9be5a6d35e023e921017ecb44b80d767095f916a
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2015-12-08T17:16:51Z

        minor section rename.

        commit 97828f10550dff00fdb4474d2b36bff30472182c
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2015-12-08T17:24:39Z

        removed debugging work in SparkTest.


        okram Marko A. Rodriguez added a comment -

        This is now possible once https://issues.apache.org/jira/browse/TINKERPOP-1027 is merged. I copied the model that Spark JobServer uses to avoid garbage collection of persisted RDDs. Once https://issues.apache.org/jira/browse/TINKERPOP-1027 is merged, I will create a spark.ls(), spark.rmr(), etc. style console helper.

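        The JobServer-style model mentioned above boils down to holding a strong reference to each named, persisted RDD in a registry, so the JVM cannot garbage collect it behind the user's back. A hypothetical illustration (names are mine, not TinkerPop's; a plain Object stands in for an RDD):

        ```java
        import java.lang.ref.WeakReference;
        import java.util.HashMap;
        import java.util.Map;

        // Sketch of a strong-reference registry for named "RDDs". As long as
        // an object is registered here, the garbage collector cannot reclaim
        // it, which is the point of the JobServer-style model.
        public class PersistedRddRegistry {
            private final Map<String, Object> registry = new HashMap<>();

            public void register(String name, Object rdd) { registry.put(name, rdd); }
            public Object lookup(String name) { return registry.get(name); }
            public void unregister(String name) { registry.remove(name); }

            public static void main(String[] args) {
                PersistedRddRegistry reg = new PersistedRddRegistry();
                Object rdd = new Object();
                WeakReference<Object> weak = new WeakReference<>(rdd);
                reg.register("graphRDD", rdd);
                rdd = null; // drop the caller's reference

                // Nudge the collector; the registry's strong reference keeps
                // the object alive, so the weak reference is never cleared.
                for (int i = 0; i < 5; i++) System.gc();
                System.out.println(weak.get() != null); // true: registry pins it
            }
        }
        ```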
        okram Marko A. Rodriguez added a comment -

        I tried... it's crazy. Sometimes RDDs are persisted by Spark, sometimes not. Sometimes they ARE persisted, but then rdd.name() is null.

        Rando-Magillacutty.

        okram Marko A. Rodriguez added a comment -

        Note that hdfs is possible at Gremlin Console startup because it reads core-site.xml. We will need something analogous for Spark. Otherwise, someone can simply do:

        gremlin> spark = SparkContext.getOrCreate(configuration)
        

        It would be nice if we provided that preloaded all pretty like though.


          People

          • Assignee:
            okram Marko A. Rodriguez
            Reporter:
            okram Marko A. Rodriguez
          • Votes: 0
          • Watchers: 3
