Uploaded image for project: 'TinkerPop'
  1. TinkerPop
  2. TINKERPOP-1192

TraversalSideEffects should support registered reducers (binary operators).

VotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Implemented
    • 3.1.1-incubating
    • 3.2.0-incubating
    • process
    • None

    Description

      TinkerPop 3.2.0-SNAPSHOT has made leaps and bounds towards completely aligning Gremlin OLTP and Gremlin OLAP. What has got me really excited is that there is such a strong conceptual alignment between the following components:

      	VertexProgram <=> Traversal
      	Iteration <=> Step
      	Messages <=> Traversers
      	MessageCombiner <=> TraverserSet ("bulking")
      	BSP <=> Barrier
      	Workers <=> Parallel Steps
      	Master <=> Sequential Steps
      	Memory <=> SideEffects
      

      TraversalVertexProgram is very clean – its lays atop the GraphComputer API in a natural, effortless way.

      However, there is one last pairing that needs some better alignment: GraphComputer Memory and Traversal SideEffects. A Memory slot has the notion of a key, a value, and a reducer (binary operator). A Traversal SideEffect as the notion of a key and a value. I think we should enable Traversal SideEffects to support registered reducers. If we do this, then there is perfect alignment between the two models and we won't have to have "if(onGraphComputer)"-type logic in our side-effect steps.

      Right now in GroupCountSideEffectStep we do this:

      public void sideEffect(final Traverser<S> traverser) {
        Map<E,Long> groupCountMap = this.getTraversal().getSideEffects().get(this.sideEffectKey);
        MapHelper.incr(traverser.get(), traverser.bulk(), groupCountMap)
      }
      

      We are explicitly getting the Map from the sideEffects and updating it. This model will not generally work in OLAP because groupCountMap is a distributed data structure and thus, local updates to a Map don't distribute. I have it working currently in master/, but at the cost of not being able to read the sideEffect, only write to it. To make TraversalSideEffects consistent across both OLTP and OLAP, I think we should express GroupCountSideEffectStep like this (*** analogously for GroupSideEffectStep, TreeSideEffectStep, etc.):

      public void sideEffect(final Traverser<S> traverser) {
        this.getTraversal().getSideEffects().add(this.sideEffectKey, Collections.singletonMap(traverser.get(), traverser.bulk());
      }
      

      Moreover, TraversalSideEffects should have the following method:

      	TraversalSideEffects.register(final String key, final Supplier<A> initialValue, final BinaryOperator<A> reducer)
      

      Note that we already have:

      https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/TraversalSideEffects.java#L88

      We can deprecate the current registerSupplier() in support of register(). Moreover, for backwards compatibility, BinaryOperator<A> reducer would simply be Operator.assign.
      https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Operator.java#L59-L62
      Thus, this would not be a breaking change and it will ensure a natural congruence between these two related computing structures – Memory and TraversalSideEffects.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            okram Marko A. Rodriguez
            okram Marko A. Rodriguez
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment