Uploaded image for project: 'TinkerPop'
  1. TinkerPop
  2. TINKERPOP-1209

ComparatorHolder should returns a Pair<Traversal,Comparator>.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.1-incubating
    • Fix Version/s: 3.2.0-incubating
    • Component/s: process
    • Labels:

      Description

      Right now ComparatorHolder has a method:

      List<Comparator> getComparators()
      

      This should really be:

      List<Pair<Traversal<?,E>,Comparator<E>>> getComparators()
      

      By doing this, we will be able to order during the Memory-reduction in Gremlin OLAP. We will be able to create values that look like this:

      [[32, "marko"], v[1]]
      [[12, "stephen"], v[7]]
      [[67, "daniel"], v[8]]
      ...
      

      Then there will be an OrderBiOperator that will have a List<Compartor> that, for the example above, is size 2. It will then be able to use the already computed traversal ends to sort the vertices.

        Activity

        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/incubator-tinkerpop/pull/264

        Show
        githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/incubator-tinkerpop/pull/264
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user spmallette commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/264#issuecomment-197825464

        VOTE +1

        Show
        githubbot ASF GitHub Bot added a comment - Github user spmallette commented on the pull request: https://github.com/apache/incubator-tinkerpop/pull/264#issuecomment-197825464 VOTE +1
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user dkuppitz commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/264#issuecomment-197348251

        VOTE: +1

        Show
        githubbot ASF GitHub Bot added a comment - Github user dkuppitz commented on the pull request: https://github.com/apache/incubator-tinkerpop/pull/264#issuecomment-197348251 VOTE: +1
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user okram commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/264#issuecomment-197092220

        ```
        [INFO] ------------------------------------------------------------------------
        [INFO] Reactor Summary:
        [INFO]
        [INFO] Apache TinkerPop .................................. SUCCESS [3.428s]
        [INFO] Apache TinkerPop :: Gremlin Shaded ................ SUCCESS [2.163s]
        [INFO] Apache TinkerPop :: Gremlin Core .................. SUCCESS [40.227s]
        [INFO] Apache TinkerPop :: Gremlin Test .................. SUCCESS [12.518s]
        [INFO] Apache TinkerPop :: Gremlin Groovy ................ SUCCESS [41.515s]
        [INFO] Apache TinkerPop :: Gremlin Groovy Test ........... SUCCESS [6.202s]
        [INFO] Apache TinkerPop :: TinkerGraph Gremlin ........... SUCCESS [3:26.425s]
        [INFO] Apache TinkerPop :: Hadoop Gremlin ................ SUCCESS [5:22.333s]
        [INFO] Apache TinkerPop :: Spark Gremlin ................. SUCCESS [7:21.764s]
        [INFO] Apache TinkerPop :: Giraph Gremlin ................ SUCCESS [2:23:59.958s]
        [INFO] Apache TinkerPop :: Neo4j Gremlin ................. SUCCESS [18:13.727s]
        [INFO] Apache TinkerPop :: Gremlin Driver ................ SUCCESS [8.994s]
        [INFO] Apache TinkerPop :: Gremlin Server ................ SUCCESS [11:57.023s]
        [INFO] Apache TinkerPop :: Gremlin Console ............... SUCCESS [1:23.478s]
        [INFO] Apache TinkerPop :: Gremlin Archetype ............. SUCCESS [0.100s]
        [INFO] Apache TinkerPop :: Archetype - TinkerGraph ....... SUCCESS [6.349s]
        [INFO] Apache TinkerPop :: Archetype - Server ............ SUCCESS [10.460s]
        [INFO] ------------------------------------------------------------------------
        [INFO] BUILD SUCCESS
        [INFO] ------------------------------------------------------------------------
        [INFO] Total time: 3:13:57.145s
        [INFO] Finished at: Tue Mar 15 18:33:01 MDT 2016
        [INFO] Final Memory: 99M/836M
        [INFO] ------------------------------------------------------------------------
        ```

        Show
        githubbot ASF GitHub Bot added a comment - Github user okram commented on the pull request: https://github.com/apache/incubator-tinkerpop/pull/264#issuecomment-197092220 ``` [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary: [INFO] [INFO] Apache TinkerPop .................................. SUCCESS [3.428s] [INFO] Apache TinkerPop :: Gremlin Shaded ................ SUCCESS [2.163s] [INFO] Apache TinkerPop :: Gremlin Core .................. SUCCESS [40.227s] [INFO] Apache TinkerPop :: Gremlin Test .................. SUCCESS [12.518s] [INFO] Apache TinkerPop :: Gremlin Groovy ................ SUCCESS [41.515s] [INFO] Apache TinkerPop :: Gremlin Groovy Test ........... SUCCESS [6.202s] [INFO] Apache TinkerPop :: TinkerGraph Gremlin ........... SUCCESS [3:26.425s] [INFO] Apache TinkerPop :: Hadoop Gremlin ................ SUCCESS [5:22.333s] [INFO] Apache TinkerPop :: Spark Gremlin ................. SUCCESS [7:21.764s] [INFO] Apache TinkerPop :: Giraph Gremlin ................ SUCCESS [2:23:59.958s] [INFO] Apache TinkerPop :: Neo4j Gremlin ................. SUCCESS [18:13.727s] [INFO] Apache TinkerPop :: Gremlin Driver ................ SUCCESS [8.994s] [INFO] Apache TinkerPop :: Gremlin Server ................ SUCCESS [11:57.023s] [INFO] Apache TinkerPop :: Gremlin Console ............... SUCCESS [1:23.478s] [INFO] Apache TinkerPop :: Gremlin Archetype ............. SUCCESS [0.100s] [INFO] Apache TinkerPop :: Archetype - TinkerGraph ....... SUCCESS [6.349s] [INFO] Apache TinkerPop :: Archetype - Server ............ SUCCESS [10.460s] [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 3:13:57.145s [INFO] Finished at: Tue Mar 15 18:33:01 MDT 2016 [INFO] Final Memory: 99M/836M [INFO] ------------------------------------------------------------------------ ```
        Hide
        githubbot ASF GitHub Bot added a comment -

        GitHub user okram opened a pull request:

        https://github.com/apache/incubator-tinkerpop/pull/264

        TINKERPOP-1209 & TINKERPOP-1210: OrderXXXStep Updates

        https://issues.apache.org/jira/browse/TINKERPOP-1209
        https://issues.apache.org/jira/browse/TINKERPOP-1210

        In OLAP, if you have a pattern like `order()..limit`, `OrderLimitStrategy` will make it so that each partitioned order (split across workers) orders and limits prior to merging. This greatly reduces the amount of data reaching the master traversal as `order()...limit()` is a common traversal pattern in OLAP.

        CHANGELOG

        ```

        • Fixed an hash code bug in `OrderGlobalStep` and `OrderLocalStep`.
        • Added `OrderLimitStrategy` which will ensure that partitions are limited before being merged in OLAP.
        • `ComparatorHolder` now separates the traversal from the comparator. (breaking)
          ```

        UPGRADE

        ```
        ComparatorHolder API Change
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

        Providers that either have their own `ComparatorHolder` implementation or reason on `OrderXXXStep` will need to update their code. `ComparatorHolder` now returns `List<Pair<Traversal,Comparator>>`. This has greatly reduced the complexity of comparison-based steps like `OrderXXXStep`. However, its a breaking API change that is trivial to update to, just some awareness is required.
        ```

        VOTE +1.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1209

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/incubator-tinkerpop/pull/264.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #264


        commit 813ba24da510b3770988d62916c2016265323129
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-03-15T16:10:26Z

        ComparatorHolder now has getComprators() return a List of Pair<Traversal,Comparator>. This was what we needed and this allowed me to gut lots of code and its so much more intutitive and will make it so pre-order/limits will be possible in OLAP. Unfortunately, this is breaking for vendors that reason (or have) a OrderXXXStep. The update is trivial and in fact, a lot less if/else for them.

        commit 3ee1548a2b9c0b1c1e00d13a024a388965f5e846
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-03-15T16:12:59Z

        TraversalComparator is gutted – check out that if/else nest that we no longer have to propagate through. Thank the heavens.

        commit a67567e0ce627a5fa313d4be0a0f5b9f6ad65584
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-03-15T17:57:54Z

        added OrderLimitStrategy which finds order()...limit patterns. It then tells OrderStep to order-then-limit. This is a potentially massive optimization in OLAP where if you do order().limit(5), the max number of traversers coming to the master traversal, is 5 * numberOfWorkers instead of the full set of traversers. Added OrderBiOperator which is a Memory reducer which handles this in OLAP. Added test cases to make this pretty. Added this as a default strategy in the GlobalCache. Currently OrderLimitStrategy is only for OLAP – we could make it for OLTP, but we would have to write our own custom Collections.sort() that has a size limit.

        commit dc0348717115a3572b5b70ef1d8f969c505c2bbf
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-03-15T20:18:43Z

        added more test cases. Fixed a old equality issue in OrderGlobalStepTest and OrderLocalStepTest cc/ @dkuppitz. Added more test cases to ensure OrderLimitStrategy is behaving correctly. OrderBiOperator now uses JavaSerializer so Giraph and Spark are happy. I think this is good to go. Perhaps one more test case using GratefulDead graph would be good.


        Show
        githubbot ASF GitHub Bot added a comment - GitHub user okram opened a pull request: https://github.com/apache/incubator-tinkerpop/pull/264 TINKERPOP-1209 & TINKERPOP-1210 : OrderXXXStep Updates https://issues.apache.org/jira/browse/TINKERPOP-1209 https://issues.apache.org/jira/browse/TINKERPOP-1210 In OLAP, if you have a pattern like `order()..limit `, `OrderLimitStrategy` will make it so that each partitioned order (split across workers) orders and limits prior to merging. This greatly reduces the amount of data reaching the master traversal as `order()...limit()` is a common traversal pattern in OLAP. CHANGELOG ``` Fixed an hash code bug in `OrderGlobalStep` and `OrderLocalStep`. Added `OrderLimitStrategy` which will ensure that partitions are limited before being merged in OLAP. `ComparatorHolder` now separates the traversal from the comparator. ( breaking ) ``` UPGRADE ``` ComparatorHolder API Change ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Providers that either have their own `ComparatorHolder` implementation or reason on `OrderXXXStep` will need to update their code. `ComparatorHolder` now returns `List<Pair<Traversal,Comparator>>`. This has greatly reduced the complexity of comparison-based steps like `OrderXXXStep`. However, its a breaking API change that is trivial to update to, just some awareness is required. ``` VOTE +1. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1209 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-tinkerpop/pull/264.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #264 commit 813ba24da510b3770988d62916c2016265323129 Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2016-03-15T16:10:26Z ComparatorHolder now has getComprators() return a List of Pair<Traversal,Comparator>. This was what we needed and this allowed me to gut lots of code and its so much more intutitive and will make it so pre-order/limits will be possible in OLAP. Unfortunately, this is breaking for vendors that reason (or have) a OrderXXXStep. The update is trivial and in fact, a lot less if/else for them. commit 3ee1548a2b9c0b1c1e00d13a024a388965f5e846 Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2016-03-15T16:12:59Z TraversalComparator is gutted – check out that if/else nest that we no longer have to propagate through. Thank the heavens. commit a67567e0ce627a5fa313d4be0a0f5b9f6ad65584 Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2016-03-15T17:57:54Z added OrderLimitStrategy which finds order()...limit patterns. It then tells OrderStep to order-then-limit. This is a potentially massive optimization in OLAP where if you do order().limit(5), the max number of traversers coming to the master traversal, is 5 * numberOfWorkers instead of the full set of traversers. Added OrderBiOperator which is a Memory reducer which handles this in OLAP. Added test cases to make this pretty. Added this as a default strategy in the GlobalCache. Currently OrderLimitStrategy is only for OLAP – we could make it for OLTP, but we would have to write our own custom Collections.sort() that has a size limit. commit dc0348717115a3572b5b70ef1d8f969c505c2bbf Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2016-03-15T20:18:43Z added more test cases. Fixed a old equality issue in OrderGlobalStepTest and OrderLocalStepTest cc/ @dkuppitz. Added more test cases to ensure OrderLimitStrategy is behaving correctly. OrderBiOperator now uses JavaSerializer so Giraph and Spark are happy. I think this is good to go. Perhaps one more test case using GratefulDead graph would be good.

          People

          • Assignee:
            okram Marko A. Rodriguez
            Reporter:
            okram Marko A. Rodriguez
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development