TinkerPop / TINKERPOP-1027

Merge view prior to writing graphRDD to output format/rdd

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0-incubating
    • Fix Version/s: 3.1.1-incubating
    • Component/s: hadoop
    • Labels:
      None

      Description

      Dan LaRocque noted that DSEGraph was not happy with the current graphRDD model when it comes to writing. To make it happy, the view merge needs to happen prior to graphRDD output. Thus, move the mapReduceRDD view merge to before graphRDD writing.
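
      The ordering issue described above can be illustrated with a toy model. This is only a sketch, not TinkerPop or Spark code: it assumes a "view" holds the per-vertex properties computed by a vertex program, and the names `ViewMergeSketch`, `mergeView`, `write`, and `props` are hypothetical. It shows why an output step only sees computed properties if the merge happens before writing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model (assumption, not TinkerPop code): a "graph" maps vertex id -> properties,
// and a "view" maps vertex id -> properties computed by a vertex program.
final class ViewMergeSketch {

    private ViewMergeSketch() {}

    // Hypothetical helper that builds a one-entry property map.
    static Map<String, Object> props(String key, Object value) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(key, value);
        return m;
    }

    // Returns a new graph in which each vertex carries its original properties
    // plus any computed properties from the view. The inputs are not modified.
    static Map<String, Map<String, Object>> mergeView(
            Map<String, Map<String, Object>> graph,
            Map<String, Map<String, Object>> view) {
        Map<String, Map<String, Object>> merged = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Object>> e : graph.entrySet()) {
            Map<String, Object> vertexProps = new LinkedHashMap<>(e.getValue());
            Map<String, Object> computed = view.get(e.getKey());
            if (computed != null) {
                vertexProps.putAll(computed); // view values win on key conflicts
            }
            merged.put(e.getKey(), vertexProps);
        }
        return merged;
    }

    // Stand-in for an output format: it can only serialize what it is handed.
    static String write(Map<String, Map<String, Object>> graph) {
        StringBuilder out = new StringBuilder();
        graph.forEach((id, p) -> out.append(id).append(' ').append(p).append('\n'));
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> graph = new LinkedHashMap<>();
        graph.put("v1", props("name", "marko"));
        graph.put("v2", props("name", "vadas"));

        Map<String, Map<String, Object>> view = new LinkedHashMap<>();
        view.put("v1", props("rank", 0.5));

        // Writing before the merge loses the computed property.
        System.out.print(write(graph));
        // Merging first hands the writer the complete vertices.
        System.out.print(write(mergeView(graph, view)));
    }
}
```

      In this model, a writer that is handed `graph` directly never sees `rank`, while a writer that is handed the merged result does, which is the ordering this ticket asks for.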

        Activity

        okram Marko A. Rodriguez added a comment -

        Related tangentially to TINKERPOP-1025

        okram Marko A. Rodriguez added a comment -

        This has the added benefit that reduceByKey() is no longer needed in either InputFormatRDD or InputRDD. That is huge, as it greatly simplifies the implementation of formats and RDDs for users.

        githubbot ASF GitHub Bot added a comment -

        GitHub user okram opened a pull request:

        https://github.com/apache/incubator-tinkerpop/pull/172

        TINKERPOP-1027: Merge view prior to writing graphRDD to output format/rdd

        https://issues.apache.org/jira/browse/TINKERPOP-1027

        We had a bug in Spark `graphRDD` writing that showed itself only for particular providers. @dalaro realized the problem and provided a solution. This PR implements @dalaro's recommended fix. This fix also removes the need for `reduceByKey()` (though it remains backwards compatible if you still have it) and allows us to always use `GryoSerialization` with Spark. This is rad. I added a few more required serialization registrations to `GryoSerialization` and all the test cases pass. I also added some more test cases to ensure proper functioning.

        • Spark integration tests passed.
        • `mvn clean install` passed.

        VOTE +1.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1027

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/incubator-tinkerpop/pull/172.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #172


        commit 5c7bc38bdb42ae50243f58a22fc74bc094be6333
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2015-12-04T15:36:08Z

        mapReduceRDD makes use of a post view merge. @dalaro realized this was important prior to graph writing. Thus, moved the view merge to pre-mapreduce and pre-graph output. Added more rigorous property checking to PageRankVertexProgramTest. InputFormatRDD and ToyGraphInputRDD no longer require reduceByKey() initiation because of merged views.

        commit e45c293425ed4d9c317b5efbb3a81a9874f7e0e6
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2015-12-04T18:14:50Z

        numerous tweaks trying to get things clean and clear. Added more tests to PersistedInputOutputRDDTest that show good long chain vertex programs with various degrees of Persist and Hadoop OLTP access, etc. Looking good. Still BulkLoaderVertexProgram problem with InputRDD... don't know what the problem is still (unfortunately).

        commit 42bcd89d7cd3d297d958ad22919377e94a149b0e
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2015-12-07T18:14:29Z

        Merge branch 'TINKERPOP-1025' into TINKERPOP-1027


        githubbot ASF GitHub Bot added a comment -

        Github user okram commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/172#issuecomment-162616761

        NOTE: I originally had this work in TINKERPOP-1025, but that ticket is a completely different beast so I merged the work into a new ticket. TINKERPOP-1025 is still being dealt with.

        githubbot ASF GitHub Bot added a comment -

        Github user dalaro commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/172#issuecomment-162618167

        [My comment on TINKERPOP-1025](https://issues.apache.org/jira/browse/TINKERPOP-1025?focusedCommentId=15045307&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15045307) now applies to this PR. The change involving prepareFinalGraphRDD is what matters to me, and when I tested a TINKERPOP-1025 HEAD that includes both of these commits, it solved my problem. So, +1 (non-voting) from me.

        githubbot ASF GitHub Bot added a comment -

        Github user okram commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/172#issuecomment-162709345

        I made it so that SparkGremlin works like Spark JobServer (https://github.com/spark-jobserver/spark-jobserver/). It ensures that RDDs are not garbage collected by maintaining a static `Spark` class that holds a `ConcurrentHashMap` of RDDs. Thus, Spark is like a "file system" in that RDDs can be listed (`ls()`), removed (`rm()`), etc. This was necessary to get a "slow" `mvn clean install` to build correctly, since it keeps RDDs from being GC'd by the Spark context cleaner.
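
        The idea in the comment above can be sketched without Spark. This is a minimal illustration, not the actual `Spark` class from the pull request: the class name `NamedRegistrySketch` and its method signatures are assumptions, and plain `Object` values stand in for RDDs. The point is that a static map holds strong references, so registered entries stay reachable until they are explicitly removed.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a static registry of named datasets.
// Values are plain Objects here; in the PR they would be RDDs.
final class NamedRegistrySketch {

    // Strong references held in a static map keep entries reachable,
    // so they are not eligible for garbage collection while registered.
    private static final ConcurrentMap<String, Object> ENTRIES = new ConcurrentHashMap<>();

    private NamedRegistrySketch() {}

    // Registers (or replaces) a dataset under the given name.
    static void put(String name, Object dataset) {
        ENTRIES.put(name, dataset);
    }

    // Returns the dataset registered under the name, or null if absent.
    static Object get(String name) {
        return ENTRIES.get(name);
    }

    // Lists registered names in sorted order, like a directory listing.
    static Set<String> ls() {
        return new TreeSet<>(ENTRIES.keySet());
    }

    // Removes a dataset by name; returns true if something was removed.
    static boolean rm(String name) {
        return ENTRIES.remove(name) != null;
    }

    public static void main(String[] args) {
        put("demo-graph", new Object());
        System.out.println(ls());
        rm("demo-graph");
        System.out.println(ls());
    }
}
```

        A `ConcurrentHashMap` lets several threads register and remove entries without external locking, which fits a process-wide static registry.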

        githubbot ASF GitHub Bot added a comment -

        Github user twilmes commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/172#issuecomment-162737584

        Awesome, looks good and tests pass for me.

        VOTE +1

        githubbot ASF GitHub Bot added a comment -

        Github user ds-jenkins-builds commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/172#issuecomment-162858822

        Build finished. No test results found.

        githubbot ASF GitHub Bot added a comment -

        Github user spmallette commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/172#issuecomment-162867159

        tests run locally for me:

        VOTE +1

        btw, your intellij settings might need to get fixed up - they are doing wildcards for static imports:

        https://github.com/apache/incubator-tinkerpop/pull/172/files#diff-889af44a6f21b3700e537cc41765435aR40

        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/incubator-tinkerpop/pull/172


          People

          • Assignee:
            okram Marko A. Rodriguez
            Reporter:
            okram Marko A. Rodriguez