  1. TinkerPop
  2. TINKERPOP-866

GroupStep and Traversal-Based Reductions


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.1-incubating
    • Fix Version/s: 3.1.0-incubating
    • Component/s: process

    Description

      Right now GroupStep is defined as:

      public final class GroupStep<S, K, V, R> extends ReducingBarrierStep<S, Map<K, R>> implements MapReducer, TraversalParent {
          private Traversal.Admin<S, K> keyTraversal = null;
          private Traversal.Admin<S, V> valueTraversal = null;
          private Traversal.Admin<Collection<V>, R> reduceTraversal = null;
      ...
      

      Look at reduceTraversal. It takes a Collection<V> of "values" and reduces them to a "reduction" R. Why are we using Collection<V>? Why is this not:

      private Traversal.Admin<V, R> reduceTraversal = null;
      

      Now, when a new K is created (and a reduce is defined), we clone reduceTraversal. Thus, each key has its own reduceTraversal (identical clones) that operates in a stream-like fashion on V to yield R. This lets us remove the Collection<V> (a memory hog) and define GroupCountStep in terms of GroupStep without (?limited?) computational cost. HOWEVER, this changes the API, as people who did this:

      g.V.group.by(label()).by(outE().count()).by(sum(local))
      

      would now have to do this:

      g.V.group.by(label()).by(outE().count()).by(sum())
      

      It's very minor, given the speed-up we would gain and the ability to now do "groupCount" efficiently on arbitrary values, not just bulks (e.g. sacks).
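
      To make the difference concrete, here is a minimal plain-Java sketch of the two strategies. It does not use the TinkerPop API. Reducer, SumReducer, Vertex, groupBuffered, groupStreaming and sumAll are hypothetical names invented for illustration, and the Supplier only stands in for "clone reduceTraversal when a new K appears". Treat it as a sketch of the idea, not the actual GroupStep implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

public class GroupReductionSketch {

    /** Hypothetical stand-in for a cloned reduceTraversal: consumes V one at a time, yields R. */
    interface Reducer<V, R> {
        void accept(V value);
        R result();
    }

    /** Running sum, playing the role of sum() in the proposed API. */
    static final class SumReducer implements Reducer<Long, Long> {
        private long total = 0L;
        @Override public void accept(Long value) { total += value; }
        @Override public Long result() { return total; }
    }

    /** Hypothetical input element: a label plus a precomputed out-edge count. */
    static final class Vertex {
        final String label;
        final long outDegree;
        Vertex(String label, long outDegree) { this.label = label; this.outDegree = outDegree; }
    }

    /** Current shape: buffer a List<V> per key, then reduce each buffer at the end. */
    static <S, K, V, R> Map<K, R> groupBuffered(List<S> input, Function<S, K> key,
            Function<S, V> value, Function<List<V>, R> reduce) {
        Map<K, List<V>> buffers = new HashMap<>();
        for (S s : input) {
            buffers.computeIfAbsent(key.apply(s), k -> new ArrayList<>()).add(value.apply(s));
        }
        Map<K, R> result = new HashMap<>();
        buffers.forEach((k, vs) -> result.put(k, reduce.apply(vs)));
        return result;
    }

    /** Proposed shape: one fresh reducer per key; each V is folded in as it arrives. */
    static <S, K, V, R> Map<K, R> groupStreaming(List<S> input, Function<S, K> key,
            Function<S, V> value, Supplier<Reducer<V, R>> newReducer) {
        Map<K, Reducer<V, R>> reducers = new HashMap<>();
        for (S s : input) {
            // A new key gets its own reducer (the "clone"); no per-key collection is kept.
            reducers.computeIfAbsent(key.apply(s), k -> newReducer.get()).accept(value.apply(s));
        }
        Map<K, R> result = new HashMap<>();
        reducers.forEach((k, r) -> result.put(k, r.result()));
        return result;
    }

    /** Collection-based reduction, playing the role of sum(local). */
    static Long sumAll(List<Long> values) {
        long total = 0L;
        for (Long v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Vertex> vertices = Arrays.asList(
                new Vertex("person", 2), new Vertex("person", 1),
                new Vertex("software", 0), new Vertex("person", 3));
        Map<String, Long> buffered = groupBuffered(
                vertices, v -> v.label, v -> v.outDegree, GroupReductionSketch::sumAll);
        Map<String, Long> streamed = groupStreaming(
                vertices, v -> v.label, v -> v.outDegree, SumReducer::new);
        // Both strategies produce the same per-label sums.
        System.out.println(buffered.equals(streamed));
    }
}
```

      In this sketch the streaming variant holds one small accumulator per key instead of every value seen for that key. Swapping SumReducer for a reducer that simply counts its inputs would give groupCount-like behavior through the same code path.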

          People

            Assignee: okram Marko A. Rodriguez
            Reporter: okram Marko A. Rodriguez