Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Fix Version/s: 3.1.0-incubating
Labels: None
Description
This works, but it's crazy to do for large data over non-random-access sources.
// g is a SparkGraphComputer traversal
gremlin> g.V().out().out()
==>v[3]
==>v[5]
gremlin>
Why is this crazy? Because for each vertex there is a graph.vertices(id) lookup which, for HadoopGraph, is a linear scan of the input format. That is nuts for massive graphs.
gremlin> g.V().out().out().toList().get(0).getClass()
==>class org.apache.tinkerpop.gremlin.hadoop.structure.HadoopVertex
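To make the cost concrete, here is a rough sketch (the attach helper is hypothetical, not TinkerPop's actual attachment code) of what each returned element effectively triggers:
import org.apache.tinkerpop.gremlin.structure.Graph
import org.apache.tinkerpop.gremlin.structure.Vertex

// Illustrative only: "attaching" a detached OLAP result means asking the graph
// for the element again by id.
Vertex attach(Graph graph, Object detachedId) {
    // HadoopGraph has no index to lean on, so this call streams the whole
    // input format to find one vertex -- a full linear pass per result.
    return graph.vertices(detachedId).next()
}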
In our docs, we should state that HadoopGraph should be used to generate reductions, not swathes of vertices. Or, if you need a vertex, don't get the vertex, get ONLY its ID.
gremlin> g.V().out().out().id()
==>3
==>5
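The reduction side of that advice might look like the following; the count of 2 simply follows from the two vertices shown above, and only the summary value comes back from the cluster:
gremlin> g.V().out().out().count()
==>2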
Finally, note that TraversalVertexProgram has a configuration option, gremlin.traversalVertexProgram.attachElements, that we never exposed to the user but should.
As we have it now, attachElements is always TRUE.
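Were it exposed, one would presumably flip it like any other gremlin.* key in the HadoopGraph properties file (hypothetical usage; the point of this issue is that there is currently no supported way to set it):
# hypothetical: keep OLAP results detached instead of re-attaching each one
gremlin.traversalVertexProgram.attachElements=false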