Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Version: 3.0.1-incubating
Description
Right now GroupStep is defined as:
    public final class GroupStep<S, K, V, R> extends ReducingBarrierStep<S, Map<K, R>> implements MapReducer, TraversalParent {
        private Traversal.Admin<S, K> keyTraversal = null;
        private Traversal.Admin<S, V> valueTraversal = null;
        private Traversal.Admin<Collection<V>, R> reduceTraversal = null;
        ...
Look at reduceTraversal. It takes a Collection<V> of "values" and reduces them to a "reduction" R. Why are we using Collection<V>? Why is this not:
private Traversal.Admin<V, R> reduceTraversal = null;
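To make the memory argument concrete, here is a plain-Java sketch (not TinkerPop's actual implementation) of the two shapes. `groupBuffered` keeps every V for a key in a list and reduces the whole Collection<V> at the end, as a Traversal.Admin<Collection<V>, R> would; `groupStreaming` folds each V into a running per-key R, where the `seed` supplier stands in for giving each new key its own clone of reduceTraversal. The class and method names, the (label, out-degree) string pairs, and the use of java.util.function types are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical plain-Java sketch, NOT TinkerPop code: contrasts reducing a
// buffered Collection<V> per key with folding each V into a per-key R as it arrives.
class GroupSketch {

    // Current shape: buffer every value per key, then reduce each Collection<V> to R.
    static <T, K, V, R> Map<K, R> groupBuffered(Iterable<T> input,
                                                Function<T, K> key,
                                                Function<T, V> value,
                                                Function<Collection<V>, R> reduce) {
        Map<K, List<V>> buffers = new HashMap<>();
        for (T t : input) {
            buffers.computeIfAbsent(key.apply(t), k -> new ArrayList<>()).add(value.apply(t));
        }
        Map<K, R> result = new HashMap<>();
        buffers.forEach((k, values) -> result.put(k, reduce.apply(values)));
        return result;
    }

    // Proposed shape: each new key starts from a fresh seed (standing in for a
    // clone of reduceTraversal) and folds values one at a time, so no
    // Collection<V> is ever held in memory.
    static <T, K, V, R> Map<K, R> groupStreaming(Iterable<T> input,
                                                 Function<T, K> key,
                                                 Function<T, V> value,
                                                 Supplier<R> seed,
                                                 BiFunction<R, V, R> fold) {
        Map<K, R> result = new HashMap<>();
        for (T t : input) {
            K k = key.apply(t);
            R current = result.containsKey(k) ? result.get(k) : seed.get();
            result.put(k, fold.apply(current, value.apply(t)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input: (label, out-degree) pairs standing in for vertices.
        List<String[]> vertices = List.of(
            new String[]{"person", "2"}, new String[]{"software", "0"},
            new String[]{"person", "3"}, new String[]{"person", "1"});
        Function<String[], String> label = v -> v[0];
        Function<String[], Long> outDegree = v -> Long.parseLong(v[1]);

        // Analogous to by(sum(local)): sum a whole Collection<Long> per key.
        Map<String, Long> buffered = groupBuffered(vertices, label, outDegree,
            vs -> vs.stream().mapToLong(Long::longValue).sum());
        // Analogous to by(sum()): add each Long to a running per-key total.
        Map<String, Long> streamed = groupStreaming(vertices, label, outDegree,
            () -> 0L, Long::sum);

        System.out.println(buffered.equals(streamed)); // prints true
    }
}
```

Both versions produce person=6 and software=0 for this input; only the buffered one ever holds all of a key's values at once.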
With that signature, whenever a new K is encountered (and a reduce traversal is defined), we clone reduceTraversal. Each key thus gets its own reduceTraversal (an identical clone) that operates in a stream-like fashion on each V to yield R. This lets us remove the Collection<V> (a memory hog) and define GroupCountStep in terms of GroupStep at little or no extra computational cost. HOWEVER, this changes the API: people who wrote this:
g.V.group.by(label()).by(outE().count()).by(sum(local))
would now have to do this:
g.V.group.by(label()).by(outE().count()).by(sum())
It's a very minor change, given the speedup we would gain and the new ability to do "groupCount" efficiently on arbitrary values (e.g. sacks), not just bulks.
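A second sketch, again hypothetical plain Java rather than GroupStep itself, shows why per-key streaming reducers make groupCount a special case of group. A prototype reducer is copied the first time a key appears (mirroring the cloning of reduceTraversal), and each copy adds up whatever number an element contributes. The Reducer interface, SumReducer, and the Item record with its weight field are invented names; weight is assumed to carry 1 per element, a traverser bulk, or a numeric sack value. The record requires Java 16 or later.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java sketch, NOT TinkerPop code. A stateful reducer
// prototype is copied the first time a key is seen (mirroring "clone
// reduceTraversal"), and each copy consumes values one at a time.
class GroupCountSketch {

    // Hypothetical streaming reducer; copy() returns a fresh, empty clone.
    interface Reducer<V, R> {
        void add(V value);
        R result();
        Reducer<V, R> copy();
    }

    // Sums whatever each element contributes: 1 per element gives a plain count,
    // a bulk gives groupCount semantics, a sack value gives a weighted count.
    static final class SumReducer implements Reducer<Long, Long> {
        private long total = 0L;
        public void add(Long value) { total += value; }
        public Long result() { return total; }
        public Reducer<Long, Long> copy() { return new SumReducer(); }
    }

    // Hypothetical stand-in for a traverser: a grouping key plus a numeric weight.
    record Item(String key, long weight) {}

    static Map<String, Long> group(List<Item> items, Reducer<Long, Long> prototype) {
        Map<String, Reducer<Long, Long>> perKey = new HashMap<>();
        for (Item item : items) {
            // One independent reducer per key, created lazily from the prototype.
            perKey.computeIfAbsent(item.key(), k -> prototype.copy()).add(item.weight());
        }
        Map<String, Long> out = new HashMap<>();
        perKey.forEach((k, r) -> out.put(k, r.result()));
        return out;
    }

    public static void main(String[] args) {
        List<Item> unit = List.of(
            new Item("person", 1), new Item("software", 1), new Item("person", 1));
        List<Item> weighted = List.of(
            new Item("person", 5), new Item("software", 2), new Item("person", 3));
        System.out.println(group(unit, new SumReducer()));     // plain per-key counts
        System.out.println(group(weighted, new SumReducer())); // per-key weighted sums
    }
}
```

With weights of 1, the first call yields person=2 and software=1, a plain count; with weights 5, 2 and 3, the second yields person=8 and software=2. Swapping SumReducer for another Reducer changes the reduction without touching group().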