Uploaded image for project: 'TinkerPop'
  1. TinkerPop
  2. TINKERPOP-1192

TraversalSideEffects should support registered reducers (binary operators).

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 3.1.1-incubating
    • Fix Version/s: 3.2.0-incubating
    • Component/s: process
    • Labels:
      None

      Description

      TinkerPop 3.2.0-SNAPSHOT has made leaps and bounds towards completely aligning Gremlin OLTP and Gremlin OLAP. What has got me really excited is that there is such a strong conceptual alignment between the following components:

      	VertexProgram <=> Traversal
      	Iteration <=> Step
      	Messages <=> Traversers
      	MessageCombiner <=> TraverserSet ("bulking")
      	BSP <=> Barrier
      	Workers <=> Parallel Steps
      	Master <=> Sequential Steps
      	Memory <=> SideEffects
      

      TraversalVertexProgram is very clean – its lays atop the GraphComputer API in a natural, effortless way.

      However, there is one last pairing that needs some better alignment: GraphComputer Memory and Traversal SideEffects. A Memory slot has the notion of a key, a value, and a reducer (binary operator). A Traversal SideEffect as the notion of a key and a value. I think we should enable Traversal SideEffects to support registered reducers. If we do this, then there is perfect alignment between the two models and we won't have to have "if(onGraphComputer)"-type logic in our side-effect steps.

      Right now in GroupCountSideEffectStep we do this:

      public void sideEffect(final Traverser<S> traverser) {
        Map<E,Long> groupCountMap = this.getTraversal().getSideEffects().get(this.sideEffectKey);
        MapHelper.incr(traverser.get(), traverser.bulk(), groupCountMap)
      }
      

      We are explicitly getting the Map from the sideEffects and updating it. This model will not generally work in OLAP because groupCountMap is a distributed data structure and thus, local updates to a Map don't distribute. I have it working currently in master/, but at the cost of not being able to read the sideEffect, only write to it. To make TraversalSideEffects consistent across both OLTP and OLAP, I think we should express GroupCountSideEffectStep like this (*** analogously for GroupSideEffectStep, TreeSideEffectStep, etc.):

      public void sideEffect(final Traverser<S> traverser) {
        this.getTraversal().getSideEffects().add(this.sideEffectKey, Collections.singletonMap(traverser.get(), traverser.bulk());
      }
      

      Moreover, TraversalSideEffects should have the following method:

      	TraversalSideEffects.register(final String key, final Supplier<A> initialValue, final BinaryOperator<A> reducer)
      

      Note that we already have:

      https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/TraversalSideEffects.java#L88

      We can deprecate the current registerSupplier() in support of register(). Moreover, for backwards compatibility, BinaryOperator<A> reducer would simply be Operator.assign.
      https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Operator.java#L59-L62
      Thus, this would not be a breaking change and it will ensure a natural congruence between these two related computing structures – Memory and TraversalSideEffects.

        Activity

        Hide
        githubbot ASF GitHub Bot added a comment -

        GitHub user okram opened a pull request:

        https://github.com/apache/incubator-tinkerpop/pull/251

        TINKERPOP-1192 & TINKERPOP-1193 & TINKERPOP-951 & TINKERPOP-958: The future is now.

        https://issues.apache.org/jira/browse/TINKERPOP-1192
        https://issues.apache.org/jira/browse/TINKERPOP-1193
        https://issues.apache.org/jira/browse/TINKERPOP-951
        https://issues.apache.org/jira/browse/TINKERPOP-958

        Summary: `TraversalSideEffects` and `GraphComputer` `Memory` are aligned. The are unified via `MemoryTraversalSideEffects` which gives us the flexibility to have a single `TraversalSideEffects` across an arbitrary number of OLAP jobs and finally makes it so that OLTP and OLAP use of sideEffects are consistent. This comes at the cost of a few API changes to `TraversalSideEffects` and some semantics changes. For 99% of users, they won't notice. Only those that make heavy use of `sideEffect

        {x = ...}

        ` may have to change their usage.

        ```
        g.V().out("created").
        group("m").by(label).
        pageRank(1.0).
        by("pageRank").
        by(inE()).
        times(1).
        in("created").
        group("m").by("pageRank").
        cap("m")
        ```

        The result is:

        [

        {2.0=[v[4], v[4], v[4], v[4]], 1.0=[v[6], v[6], v[6], v[1], v[1], v[1]], software=[v[5], v[3], v[3], v[3]]}

        ]

        Do you see why this traversal is so cool? Check out what it is doing —

        1. Normal out("created").group(). Fine and dandy all the software vertices are grouped.
        2. But then we kick off another OLAP job that computes a single-step spreading activation starting at the software vertices.
        3. Then, from the software vertices, it goes to the people who created that software and groups those people by their pageRank value.

        First, (already known), this traversal compiles to 3 OLAP jobs and it "just works." Slammin'.
        Second, as of this afternoon, TraversalSideEffects (GraphComputer.Memory) are preserved across OLAP jobs and across TraversalVertexPrograms.
        — it truly is just one traversal.

        ------

        CHANGELOG

        ```

        • Fixed a severe `Traversal` cloning issue that caused inconsistent `TraversalSideEffects`.
        • `TraversalSideEffects` remain consistent and useable across multiple chained OLAP jobs.
        • Added `MemoryTraversalSideEffects` which backs a `TraversalSideEffects` by `Memory` in OLAP.
        • `TraversalSideEffects` are now fully functional in OLAP save that an accurate global view is possible after an iteration.
        • Updated the `TraversalSideEffects` API to support registered reducers and updated `get()`-semantics. (breaking)
        • Split existing `profile()` into `ProfileStep` and `ProfileSideEffectStep`.
        • The `profile()`-step acts like a reducing barrier and emits `TraversalMetrics` without the need for `cap()`. (breaking)
        • Added `LocalBarrier` to allow traversers to remain distributed during an iteration so as to limit cluster traffic.
        • An OLAP-based `Barrier` synchronization bug has been fixed.
          ```

        UPDATE

        ```

        TraversalSideEffect Update
        ^^^^^^^^^^^^^^^^^^^^^^^^^^

        There were significant changes to `TraversalSideEffect` both at the semantic level and at the API level. Users that have traversals of the form `sideEffect

        {…}

        ` that use global side-effects should read the following carefully. If the user only uses side-effects via non-lambda steps (e.g. `groupCount("m")`) and with `g.withSideEffects()`, then the changes below will not effect them. Providers will not be effected by the changes save any tests cases that use `Traversal.getSideEffects().get()`. This method no longer returns `Optional<V>`, but instead just the value `<V>`.

        TraversalSideEffect Get API Change
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

        `TraversalSideEffects` can now logically operate within a distributed OLAP environment. In order to make this possible, it is necessary that each side-effect be registered with a reducing `BinaryOperator`. This binary operator will combine distributed updates into a single global side-effect at the master traversal. Many of the methods in `TraversalSideEffect` have been `Deprecated`, but they are backwards compatible save that `TraversalSideEffects.get()` no longer returns an `Optional`, but instead throws an `IllegalArgumentException`. While the `Optional` semantics could have remained, it was deemed best to directly return the side-effect value to reduce object creation costs and because all side-effects must be registered apriori and therefore there is never a reason why an unknown side-effect key would be used.

        TraversalSideEffect Registration Requirement
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

        All `TraversalSideEffects` must be registered upfront. This is because, in OLAP, side-effects map to `Memory` compute keys and as such, must be declared prior to the execution of the `TraversalVertexProgram`. If a user's traversal creates a side-effect mid-traversal, it will fail. The traversal must use `GraphTraversalSource.withSideEffect()` to declare the side-effects it will use during its execution lifetime.

        TraversalSideEffect Add Requirement
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

        In a distributed environment, a side-effect can not be mutated and be expected to exist in the mutated form at the final, aggregated, master traversal. For instance, if a side-effect "myCount" references a `Long`, the `Long` can not be updated directly via `sideEffects.set("myCount", sideEffects.get("myCount") + 1)`. Instead, it must rely on the reducer to do the merging and thus, the `Step` must do `sideEffect.add("mySet",1)` where the registered reducer is `Operator.sum`. Note that `Traverser.sideEffect(key,value)` will use `TraversalSideEffect.add()`. Thus, the below will increment "a". If no operator was provided, then the operator is assumed `Operator.assign` and the final result of "a" would be 1.

        `g.withSideEffect("a",0,sum).V().out().sideEffect(t -> t.sideEffect("a",1))`

        See link:https://issues.apache.org/jira/browse/TINKERPOP-1192[TINKERPOP-1192]

        ProfileStep Update and GraphTraversal API Change
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

        The `profile()`-step has been refactored into 2 steps – `ProfileStep` and `ProfileSideEffectStep`. Users who previously used the `profile()` in conjunction with `cap(TraversalMetrics.METRICS_KEY)` can now simply omit the cap step. Users who retrieved `TraversalMetrics` from the side-effects after iteration can still do so, but will need to specify a side-effect key when using the `profile()`. For example, `profile("myMetrics")`.

        See link:https://issues.apache.org/jira/browse/TINKERPOP-958[TINKERPOP-958]

        ```

        VOTE +1.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1192

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/incubator-tinkerpop/pull/251.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #251


        commit 8e807eec73a72b4bbbde3156e6aa8b237d099b42
        Author: rjbriody <bob.briody@datastax.com>
        Date: 2016-02-24T20:08:25Z

        TINKERPOP-958 split .profile() into ProfileSideEffectStep and ProfileStep (which is a MapStep).

        commit 856bb564baad3a28754aa37a77803ab3d2f9b8a5
        Author: rjbriody <bob.briody@datastax.com>
        Date: 2016-02-25T16:07:17Z

        Condense ProfileSideEffectTest into ProfileTest.

        commit 7374ee45cd918f0770df3966476e98fac10794ae
        Author: rjbriody <bob.briody@datastax.com>
        Date: 2016-02-26T14:52:20Z

        Fix computer issues for profile step.

        commit 86941ee5c6ff37f2bc4f88ea520a530e9372ba79
        Author: rjbriody <bob.briody@datastax.com>
        Date: 2016-03-01T13:35:45Z

        Update traversal docs for Profile step changes.

        commit 06cdff86c21a9d2c20f528d9c620604dde1b6a07
        Author: rjbriody <bob.briody@datastax.com>
        Date: 2016-03-01T13:39:34Z

        Update traversal docs for Profile step changes.

        commit 6f6c9a5698373291c51e9b12af86a5dc3e09ff0e
        Author: rjbriody <bob.briody@datastax.com>
        Date: 2016-03-01T13:45:38Z

        Revert "Update traversal docs for Profile step changes."

        This reverts commit 06cdff86c21a9d2c20f528d9c620604dde1b6a07.

        commit ebb6143e9dffbc0980dcb1445a18088f3e7fd4fa
        Author: rjbriody <bob.briody@datastax.com>
        Date: 2016-03-01T13:45:46Z

        Revert "Update traversal docs for Profile step changes."

        This reverts commit 86941ee5c6ff37f2bc4f88ea520a530e9372ba79.

        commit 50c56563157337e71f74a940b9f6c17718860dd3
        Author: rjbriody <bob.briody@datastax.com>
        Date: 2016-03-01T13:49:22Z

        Update traversal docs for Profile step changes.

        commit 7013a9746cc04da32bb63fe0e6932a468c47ff0d
        Author: rjbriody <bob.briody@datastax.com>
        Date: 2016-03-01T13:50:37Z

        merge master

        commit 5f6ab602c2539f12c35c3be607f287dd689375af
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-03-02T21:23:20Z

        Created MemoryTraversalSideEffects which backs a TraversalSideEffects by GraphComputer.Memory. Everything works except AggregateStep and ProfileStep. Need more thinking. I found a fundamental bug in Step.clone() where SideEffects in the cloned traversal referenced the original traversal. It had to do with the fast the cloning of a step is like this – step.clone(), step.setTraversal(traversal). As such, for all TraversalParent steps, I had to Override setTraversal(traversal) and it is there where TraversalParent.integrateChild(...) is called. However, once that was done, everything with MemoryTraversalSideEffects just worked (except Aggregate and Profile — of course). This is a really sweet model. Removed VertexTraversalSideEffects which was the old model which stored sideEffects as traverser properties which were then processed and aggregated via MapReduce jobs. With MemoryTraversalSideEffects, the sideEffects are contained within the TraversalVertexProgram job — its so slick (and way more efficient).

        commit b63f62e7008614782bcd3c4ef5a8b4ea7e6eb830
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-03-02T23:30:54Z

        through hell and back. TraversalSideEffects are clean and proper. Added LocalBarrier interface which tells GraphComputer not to aggregate the traversers to the master traversal, but instead to keep them distributed across the cluster, but block – that is, barrier. This is a way to force a sychronization without having to move data. Currently only AggregateStep implements LocalBarrier. However, other steps such as NoOpCollectingBarrierStep and SupplyingBarrierStep can use it. All test cases pass except for the ProfileTest stuff. Going to merge in the @rjbriody branch into this branch and get everything smooth and slick and then PR both from this branch.

        commit 08780c4b8ceaa4231b4a8421fea064c0f8122f84
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-03-03T14:25:19Z

        Figured out why Groovy traversals were failing. I was overriding the SideEffects in ScriptTraversal. I needed to use the SideEffects from the parent traversal. Commented out the ProfileTests for now. Going to now try and merge @rjbriody work into this branch and then if all is gravy, PR both from here.

        commit e4ea17395478132e7f78bd53a9921ef803ab6a96
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-03-03T14:28:44Z

        Merge branch 'TINKERPOP-958' of https://github.com/rjbriody/incubator-tinkerpop into TINKERPOP-1192

        Conflicts:
        gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/sideEffect/ProfileSideEffectStep.java
        gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/gryo/GryoMapper.java
        gremlin-groovy-test/src/main/java/org/apache/tinkerpop/gremlin/process/GroovyProcessComputerSuite.java
        gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java

        commit ff7f7d2b74a42d7c9574a0ebbb8cc4a6897195f5
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-03-03T17:03:09Z

        added @rjbriody ProfileStep updates.

        commit 5ec7dd063087dfd8c7cc1894bf86fa82a80a75ba
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-03-03T19:11:02Z

        Finalized the TravarsalSideEffect API work. There is one breaking change that was needed – its small. However, the semantics of TraversalSideEffect now require all side-effects to be registered upfront via withSideEffect(). Users can no longer create a side-effect mid-traversal. This ensures that OLTP and OLAP behave identically. This occurs when people did something like — g.V().sideEffect

        {it.sideEffect('a',new Set())}

        .out().out().... that sideEffect will need to be created at g.withSideEffect('a',Set::new).... Likewise. All other methods work but are deprecated and assume Operator.assign if now registered Reducer exists — this will be okay for OLTP, but will lead to bad results in OLAP. In short, peoeople should really register reducers with their sideEffects – g.V.withSideEffect(a,Set::new,Operator.addAll)......Added a pretty insan-o nested, sideEffect usage test to GroupCountTest – OLTP and OLAP are happy.

        commit ff390381b9896a16c73a056f167e84f2beb8f1d6
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-03-03T20:07:52Z

        added test to make sure declared SideEffect is ultimate returned class.

        commit f778b5b0ab13c3860c30e39eadd64dcdd7af1c18
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-03-03T22:11:40Z

        Added some crazy tests to SideEffectTest which verify that registered operators and suppliers as well as mid-traversal side-effect calls behave as expected in both OLTP and OLAP. This required lots of tinkering with DefaultTraversalSideEffects to get the logic right around registeration. Removed a pointless TraversalSideEffectsTest we had. Doing all the complex logic in SideEffectTest.

        commit 731789637d2b47306bed3a2f3008c9bf063a0f3a
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-03-03T23:58:00Z

        fixed a bug in DefaultTraversalSideEffects.mergedInto(). Added THE MOST INSANE test case to PageRankTest. SideEffects are preserved across GraphComputer jobs. Insane.

        commit 02a551a5f544f8f3b2344631252c21c03a51f117
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-03-04T02:14:18Z

        added some nice test cases that ensure cloning and compiling respect TraversalSideEffect propagation. Added test cases that verify that the new TraversalSideEffect API changes are correctly implemented by DefaultTraversalSideEffect. All in all, just more tests cases to give us confidence.

        commit 1c174ba5e5dfa025b41df541ba21b3637e17901e
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-03-04T02:31:34Z

        realized a very simple (and wildly efficient) optimization to TraversalVertexProgram around when to and not to return halted traversers to the master traversal. Okay. Done for the night.


        Show
        githubbot ASF GitHub Bot added a comment - GitHub user okram opened a pull request: https://github.com/apache/incubator-tinkerpop/pull/251 TINKERPOP-1192 & TINKERPOP-1193 & TINKERPOP-951 & TINKERPOP-958 : The future is now. https://issues.apache.org/jira/browse/TINKERPOP-1192 https://issues.apache.org/jira/browse/TINKERPOP-1193 https://issues.apache.org/jira/browse/TINKERPOP-951 https://issues.apache.org/jira/browse/TINKERPOP-958 Summary: `TraversalSideEffects` and `GraphComputer` `Memory` are aligned. The are unified via `MemoryTraversalSideEffects` which gives us the flexibility to have a single `TraversalSideEffects` across an arbitrary number of OLAP jobs and finally makes it so that OLTP and OLAP use of sideEffects are consistent. This comes at the cost of a few API changes to `TraversalSideEffects` and some semantics changes. For 99% of users, they won't notice. Only those that make heavy use of `sideEffect {x = ...} ` may have to change their usage. ``` g.V().out("created"). group("m").by(label). pageRank(1.0). by("pageRank"). by(inE()). times(1). in("created"). group("m").by("pageRank"). cap("m") ``` The result is: [ {2.0=[v[4], v[4], v[4], v[4]], 1.0=[v[6], v[6], v[6], v[1], v[1], v[1]], software=[v[5], v[3], v[3], v[3]]} ] Do you see why this traversal is so cool? Check out what it is doing — 1. Normal out("created").group(). Fine and dandy all the software vertices are grouped. 2. But then we kick off another OLAP job that computes a single-step spreading activation starting at the software vertices. 3. Then, from the software vertices, it goes to the people who created that software and groups those people by their pageRank value. First, (already known), this traversal compiles to 3 OLAP jobs and it "just works." Slammin'. Second, as of this afternoon, TraversalSideEffects (GraphComputer.Memory) are preserved across OLAP jobs and across TraversalVertexPrograms. — it truly is just one traversal. ------ CHANGELOG ``` Fixed a severe `Traversal` cloning issue that caused inconsistent `TraversalSideEffects`. `TraversalSideEffects` remain consistent and useable across multiple chained OLAP jobs. Added `MemoryTraversalSideEffects` which backs a `TraversalSideEffects` by `Memory` in OLAP. `TraversalSideEffects` are now fully functional in OLAP save that an accurate global view is possible after an iteration. Updated the `TraversalSideEffects` API to support registered reducers and updated `get()`-semantics. ( breaking ) Split existing `profile()` into `ProfileStep` and `ProfileSideEffectStep`. The `profile()`-step acts like a reducing barrier and emits `TraversalMetrics` without the need for `cap()`. ( breaking ) Added `LocalBarrier` to allow traversers to remain distributed during an iteration so as to limit cluster traffic. An OLAP-based `Barrier` synchronization bug has been fixed. ``` UPDATE ``` TraversalSideEffect Update ^^^^^^^^^^^^^^^^^^^^^^^^^^ There were significant changes to `TraversalSideEffect` both at the semantic level and at the API level. Users that have traversals of the form `sideEffect {…} ` that use global side-effects should read the following carefully. If the user only uses side-effects via non-lambda steps (e.g. `groupCount("m")`) and with `g.withSideEffects()`, then the changes below will not effect them. Providers will not be effected by the changes save any tests cases that use `Traversal.getSideEffects().get()`. This method no longer returns `Optional<V>`, but instead just the value `<V>`. TraversalSideEffect Get API Change ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ `TraversalSideEffects` can now logically operate within a distributed OLAP environment. In order to make this possible, it is necessary that each side-effect be registered with a reducing `BinaryOperator`. This binary operator will combine distributed updates into a single global side-effect at the master traversal. Many of the methods in `TraversalSideEffect` have been `Deprecated`, but they are backwards compatible save that `TraversalSideEffects.get()` no longer returns an `Optional`, but instead throws an `IllegalArgumentException`. While the `Optional` semantics could have remained, it was deemed best to directly return the side-effect value to reduce object creation costs and because all side-effects must be registered apriori and therefore there is never a reason why an unknown side-effect key would be used. TraversalSideEffect Registration Requirement ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ All `TraversalSideEffects` must be registered upfront. This is because, in OLAP, side-effects map to `Memory` compute keys and as such, must be declared prior to the execution of the `TraversalVertexProgram`. If a user's traversal creates a side-effect mid-traversal, it will fail. The traversal must use `GraphTraversalSource.withSideEffect()` to declare the side-effects it will use during its execution lifetime. TraversalSideEffect Add Requirement ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In a distributed environment, a side-effect can not be mutated and be expected to exist in the mutated form at the final, aggregated, master traversal. For instance, if a side-effect "myCount" references a `Long`, the `Long` can not be updated directly via `sideEffects.set("myCount", sideEffects.get("myCount") + 1)`. Instead, it must rely on the reducer to do the merging and thus, the `Step` must do `sideEffect.add("mySet",1)` where the registered reducer is `Operator.sum`. Note that `Traverser.sideEffect(key,value)` will use `TraversalSideEffect.add()`. Thus, the below will increment "a". If no operator was provided, then the operator is assumed `Operator.assign` and the final result of "a" would be 1. `g.withSideEffect("a",0,sum).V().out().sideEffect(t -> t.sideEffect("a",1))` See link: https://issues.apache.org/jira/browse/TINKERPOP-1192[TINKERPOP-1192 ] ProfileStep Update and GraphTraversal API Change ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The `profile()`-step has been refactored into 2 steps – `ProfileStep` and `ProfileSideEffectStep`. Users who previously used the `profile()` in conjunction with `cap(TraversalMetrics.METRICS_KEY)` can now simply omit the cap step. Users who retrieved `TraversalMetrics` from the side-effects after iteration can still do so, but will need to specify a side-effect key when using the `profile()`. For example, `profile("myMetrics")`. See link: https://issues.apache.org/jira/browse/TINKERPOP-958[TINKERPOP-958 ] ``` VOTE +1. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1192 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-tinkerpop/pull/251.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #251 commit 8e807eec73a72b4bbbde3156e6aa8b237d099b42 Author: rjbriody <bob.briody@datastax.com> Date: 2016-02-24T20:08:25Z TINKERPOP-958 split .profile() into ProfileSideEffectStep and ProfileStep (which is a MapStep). commit 856bb564baad3a28754aa37a77803ab3d2f9b8a5 Author: rjbriody <bob.briody@datastax.com> Date: 2016-02-25T16:07:17Z Condense ProfileSideEffectTest into ProfileTest. commit 7374ee45cd918f0770df3966476e98fac10794ae Author: rjbriody <bob.briody@datastax.com> Date: 2016-02-26T14:52:20Z Fix computer issues for profile step. commit 86941ee5c6ff37f2bc4f88ea520a530e9372ba79 Author: rjbriody <bob.briody@datastax.com> Date: 2016-03-01T13:35:45Z Update traversal docs for Profile step changes. commit 06cdff86c21a9d2c20f528d9c620604dde1b6a07 Author: rjbriody <bob.briody@datastax.com> Date: 2016-03-01T13:39:34Z Update traversal docs for Profile step changes. commit 6f6c9a5698373291c51e9b12af86a5dc3e09ff0e Author: rjbriody <bob.briody@datastax.com> Date: 2016-03-01T13:45:38Z Revert "Update traversal docs for Profile step changes." This reverts commit 06cdff86c21a9d2c20f528d9c620604dde1b6a07. commit ebb6143e9dffbc0980dcb1445a18088f3e7fd4fa Author: rjbriody <bob.briody@datastax.com> Date: 2016-03-01T13:45:46Z Revert "Update traversal docs for Profile step changes." This reverts commit 86941ee5c6ff37f2bc4f88ea520a530e9372ba79. commit 50c56563157337e71f74a940b9f6c17718860dd3 Author: rjbriody <bob.briody@datastax.com> Date: 2016-03-01T13:49:22Z Update traversal docs for Profile step changes. commit 7013a9746cc04da32bb63fe0e6932a468c47ff0d Author: rjbriody <bob.briody@datastax.com> Date: 2016-03-01T13:50:37Z merge master commit 5f6ab602c2539f12c35c3be607f287dd689375af Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2016-03-02T21:23:20Z Created MemoryTraversalSideEffects which backs a TraversalSideEffects by GraphComputer.Memory. Everything works except AggregateStep and ProfileStep. Need more thinking. I found a fundamental bug in Step.clone() where SideEffects in the cloned traversal referenced the original traversal. It had to do with the fast the cloning of a step is like this – step.clone(), step.setTraversal(traversal). As such, for all TraversalParent steps, I had to Override setTraversal(traversal) and it is there where TraversalParent.integrateChild(...) is called. However, once that was done, everything with MemoryTraversalSideEffects just worked (except Aggregate and Profile — of course). This is a really sweet model. Removed VertexTraversalSideEffects which was the old model which stored sideEffects as traverser properties which were then processed and aggregated via MapReduce jobs. With MemoryTraversalSideEffects, the sideEffects are contained within the TraversalVertexProgram job — its so slick (and way more efficient). commit b63f62e7008614782bcd3c4ef5a8b4ea7e6eb830 Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2016-03-02T23:30:54Z through hell and back. TraversalSideEffects are clean and proper. Added LocalBarrier interface which tells GraphComputer not to aggregate the traversers to the master traversal, but instead to keep them distributed across the cluster, but block – that is, barrier. This is a way to force a sychronization without having to move data. Currently only AggregateStep implements LocalBarrier. However, other steps such as NoOpCollectingBarrierStep and SupplyingBarrierStep can use it. All test cases pass except for the ProfileTest stuff. Going to merge in the @rjbriody branch into this branch and get everything smooth and slick and then PR both from this branch. commit 08780c4b8ceaa4231b4a8421fea064c0f8122f84 Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2016-03-03T14:25:19Z Figured out why Groovy traversals were failing. I was overriding the SideEffects in ScriptTraversal. I needed to use the SideEffects from the parent traversal. Commented out the ProfileTests for now. Going to now try and merge @rjbriody work into this branch and then if all is gravy, PR both from here. commit e4ea17395478132e7f78bd53a9921ef803ab6a96 Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2016-03-03T14:28:44Z Merge branch ' TINKERPOP-958 ' of https://github.com/rjbriody/incubator-tinkerpop into TINKERPOP-1192 Conflicts: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/sideEffect/ProfileSideEffectStep.java gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/gryo/GryoMapper.java gremlin-groovy-test/src/main/java/org/apache/tinkerpop/gremlin/process/GroovyProcessComputerSuite.java gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java commit ff7f7d2b74a42d7c9574a0ebbb8cc4a6897195f5 Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2016-03-03T17:03:09Z added @rjbriody ProfileStep updates. commit 5ec7dd063087dfd8c7cc1894bf86fa82a80a75ba Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2016-03-03T19:11:02Z Finalized the TravarsalSideEffect API work. There is one breaking change that was needed – its small. However, the semantics of TraversalSideEffect now require all side-effects to be registered upfront via withSideEffect(). Users can no longer create a side-effect mid-traversal. This ensures that OLTP and OLAP behave identically. This occurs when people did something like — g.V().sideEffect {it.sideEffect('a',new Set())} .out().out().... that sideEffect will need to be created at g.withSideEffect('a',Set::new).... Likewise. All other methods work but are deprecated and assume Operator.assign if now registered Reducer exists — this will be okay for OLTP, but will lead to bad results in OLAP. In short, peoeople should really register reducers with their sideEffects – g.V.withSideEffect(a,Set::new,Operator.addAll)......Added a pretty insan-o nested, sideEffect usage test to GroupCountTest – OLTP and OLAP are happy. commit ff390381b9896a16c73a056f167e84f2beb8f1d6 Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2016-03-03T20:07:52Z added test to make sure declared SideEffect is ultimate returned class. commit f778b5b0ab13c3860c30e39eadd64dcdd7af1c18 Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2016-03-03T22:11:40Z Added some crazy tests to SideEffectTest which verify that registered operators and suppliers as well as mid-traversal side-effect calls behave as expected in both OLTP and OLAP. This required lots of tinkering with DefaultTraversalSideEffects to get the logic right around registeration. Removed a pointless TraversalSideEffectsTest we had. Doing all the complex logic in SideEffectTest. commit 731789637d2b47306bed3a2f3008c9bf063a0f3a Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2016-03-03T23:58:00Z fixed a bug in DefaultTraversalSideEffects.mergedInto(). Added THE MOST INSANE test case to PageRankTest. SideEffects are preserved across GraphComputer jobs. Insane. commit 02a551a5f544f8f3b2344631252c21c03a51f117 Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2016-03-04T02:14:18Z added some nice test cases that ensure cloning and compiling respect TraversalSideEffect propagation. Added test cases that verify that the new TraversalSideEffect API changes are correctly implemented by DefaultTraversalSideEffect. All in all, just more tests cases to give us confidence. commit 1c174ba5e5dfa025b41df541ba21b3637e17901e Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2016-03-04T02:31:34Z realized a very simple (and wildly efficient) optimization to TraversalVertexProgram around when to and not to return halted traversers to the master traversal. Okay. Done for the night.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user okram commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/251#issuecomment-192260836

        ```
        [INFO] ------------------------------------------------------------------------
        [INFO] Reactor Summary:
        [INFO]
        [INFO] Apache TinkerPop .................................. SUCCESS [4.845s]
        [INFO] Apache TinkerPop :: Gremlin Shaded ................ SUCCESS [2.346s]
        [INFO] Apache TinkerPop :: Gremlin Core .................. SUCCESS [41.846s]
        [INFO] Apache TinkerPop :: Gremlin Test .................. SUCCESS [11.712s]
        [INFO] Apache TinkerPop :: Gremlin Groovy ................ SUCCESS [42.031s]
        [INFO] Apache TinkerPop :: Gremlin Groovy Test ........... SUCCESS [6.604s]
        [INFO] Apache TinkerPop :: TinkerGraph Gremlin ........... SUCCESS [4:01.081s]
        [INFO] Apache TinkerPop :: Hadoop Gremlin ................ SUCCESS [5:18.703s]
        [INFO] Apache TinkerPop :: Spark Gremlin ................. SUCCESS [7:54.731s]
        [INFO] Apache TinkerPop :: Giraph Gremlin ................ SUCCESS [2:22:15.673s]
        [INFO] Apache TinkerPop :: Neo4j Gremlin ................. SUCCESS [18:03.868s]
        [INFO] Apache TinkerPop :: Gremlin Driver ................ SUCCESS [9.911s]
        [INFO] Apache TinkerPop :: Gremlin Server ................ SUCCESS [11:23.461s]
        [INFO] Apache TinkerPop :: Gremlin Console ............... SUCCESS [1:10.067s]
        [INFO] Apache TinkerPop :: Gremlin Archetype ............. SUCCESS [0.082s]
        [INFO] Apache TinkerPop :: Archetype - TinkerGraph ....... SUCCESS [6.484s]
        [INFO] Apache TinkerPop :: Archetype - Server ............ SUCCESS [10.505s]
        [INFO] ------------------------------------------------------------------------
        [INFO] BUILD SUCCESS
        [INFO] ------------------------------------------------------------------------
        [INFO] Total time: 3:12:24.450s
        [INFO] Finished at: Thu Mar 03 22:44:08 MST 2016
        [INFO] Final Memory: 99M/813M
        [INFO] ------------------------------------------------------------------------
        ~/software/tinkerpop/tinkerpop3$
        ```

        Show
        githubbot ASF GitHub Bot added a comment - Github user okram commented on the pull request: https://github.com/apache/incubator-tinkerpop/pull/251#issuecomment-192260836 ``` [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary: [INFO] [INFO] Apache TinkerPop .................................. SUCCESS [4.845s] [INFO] Apache TinkerPop :: Gremlin Shaded ................ SUCCESS [2.346s] [INFO] Apache TinkerPop :: Gremlin Core .................. SUCCESS [41.846s] [INFO] Apache TinkerPop :: Gremlin Test .................. SUCCESS [11.712s] [INFO] Apache TinkerPop :: Gremlin Groovy ................ SUCCESS [42.031s] [INFO] Apache TinkerPop :: Gremlin Groovy Test ........... SUCCESS [6.604s] [INFO] Apache TinkerPop :: TinkerGraph Gremlin ........... SUCCESS [4:01.081s] [INFO] Apache TinkerPop :: Hadoop Gremlin ................ SUCCESS [5:18.703s] [INFO] Apache TinkerPop :: Spark Gremlin ................. SUCCESS [7:54.731s] [INFO] Apache TinkerPop :: Giraph Gremlin ................ SUCCESS [2:22:15.673s] [INFO] Apache TinkerPop :: Neo4j Gremlin ................. SUCCESS [18:03.868s] [INFO] Apache TinkerPop :: Gremlin Driver ................ SUCCESS [9.911s] [INFO] Apache TinkerPop :: Gremlin Server ................ SUCCESS [11:23.461s] [INFO] Apache TinkerPop :: Gremlin Console ............... SUCCESS [1:10.067s] [INFO] Apache TinkerPop :: Gremlin Archetype ............. SUCCESS [0.082s] [INFO] Apache TinkerPop :: Archetype - TinkerGraph ....... SUCCESS [6.484s] [INFO] Apache TinkerPop :: Archetype - Server ............ SUCCESS [10.505s] [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 3:12:24.450s [INFO] Finished at: Thu Mar 03 22:44:08 MST 2016 [INFO] Final Memory: 99M/813M [INFO] ------------------------------------------------------------------------ ~/software/tinkerpop/tinkerpop3$ ```
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user dkuppitz commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/251#issuecomment-192268671

        Docs don't build properly (error in `the-traversal.asciidoc`).

        Show
        githubbot ASF GitHub Bot added a comment - Github user dkuppitz commented on the pull request: https://github.com/apache/incubator-tinkerpop/pull/251#issuecomment-192268671 Docs don't build properly (error in `the-traversal.asciidoc`).
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user okram commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/251#issuecomment-192307278

        Bug in docs fixed and I also did some more work making our docs (OLAP stuff) better.

        Published:
        http://tinkerpop.apache.org/docs/3.2.0-SNAPSHOT/reference/

        Show
        githubbot ASF GitHub Bot added a comment - Github user okram commented on the pull request: https://github.com/apache/incubator-tinkerpop/pull/251#issuecomment-192307278 Bug in docs fixed and I also did some more work making our docs (OLAP stuff) better. Published: http://tinkerpop.apache.org/docs/3.2.0-SNAPSHOT/reference/
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user dkuppitz commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/251#issuecomment-192343796

        Worked!

        VOTE: +1

        Show
        githubbot ASF GitHub Bot added a comment - Github user dkuppitz commented on the pull request: https://github.com/apache/incubator-tinkerpop/pull/251#issuecomment-192343796 Worked! VOTE: +1
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user spmallette commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/251#issuecomment-192346032

        good with `mvn clean install`

        VOTE +1

        Show
        githubbot ASF GitHub Bot added a comment - Github user spmallette commented on the pull request: https://github.com/apache/incubator-tinkerpop/pull/251#issuecomment-192346032 good with `mvn clean install` VOTE +1
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/incubator-tinkerpop/pull/251

        Show
        githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/incubator-tinkerpop/pull/251

          People

          • Assignee:
            okram Marko A. Rodriguez
            Reporter:
            okram Marko A. Rodriguez
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development