Description
The Memory object is not being written to disk by SparkGraphComputer unless it is being updated within a MapReduce job. That is no bueno. We should have the computed Memory be written out like so:
hdfs.ls("output")
==>~g
==>~memory
Moreover, ~g should be ~graph, but that is a different story...
Then:
hdfs.ls("output/~memory")
==>gremlin.traversalVertexProgram.haltedTraversals
==>a
==>x
Note that every GraphComputer job yields a ComputerResult, which is essentially a Pair<Graph,Memory>. The Graph reference denotes the adjacency list of vertices; if there are HALTED_TRAVERSERS, they live on those vertices. This is a distributed representation. Next, the Memory reference denotes data that is no longer "attached to the graph", such as maps, counts, sums, etc. In general, reduction barriers. This data is not tied to any one vertex anymore and thus exists at the "master traversal" via Memory. Thus, "graph is distributed/workers" and "memory is local/master." We need to make sure that the Memory data is serialized to disk appropriately for HadoopGraph-based implementations...
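As a rough illustration of the missing write path, here is a stdlib-only sketch. The class name MemorySerializer, the use of plain Java serialization, and local files standing in for HDFS are all assumptions for illustration; a real fix would go through Hadoop's FileSystem API and the job's configured output format. It just shows the shape of the proposal: one entry under output/~memory per master-side memory key.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical sketch: persist the master-side Memory (modeled here as a
// plain Map) under <output>/~memory, one file per memory key, mirroring
// the hdfs.ls("output/~memory") listing proposed above.
public class MemorySerializer {

    // Create a scratch "output" location (stands in for an HDFS output path).
    public static Path newTempOutput() {
        try {
            return Files.createTempDirectory("output");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write each memory key/value as its own file. Plain Java serialization
    // stands in for whatever Writable/SequenceFile format Hadoop would use.
    public static void write(Path output, Map<String, Object> memory) {
        try {
            Path memoryDir = output.resolve("~memory");
            Files.createDirectories(memoryDir);
            for (Map.Entry<String, Object> entry : memory.entrySet()) {
                try (ObjectOutputStream out = new ObjectOutputStream(
                        Files.newOutputStream(memoryDir.resolve(entry.getKey())))) {
                    out.writeObject(entry.getValue());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read one memory key back, e.g. in a later Gremlin Console session.
    public static Object read(Path output, String key) {
        try (ObjectInputStream in = new ObjectInputStream(
                Files.newInputStream(output.resolve("~memory").resolve(key)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this layout, a fresh console session could recover a reduction-barrier result by key alone, without replaying the job.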
Issue Links
- relates to TINKERPOP-1298 Save OLAP results to file (Open)