TinkerPop / TINKERPOP-1309

Memory output in HadoopGraph is too strongly tied to MapReduce and should be generalized.


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0-incubating
    • Fix Version/s: None
    • Component/s: hadoop, process

    Description

      The Memory object is not written to disk by SparkGraphComputer unless it is being updated within a MapReduce job. That is no bueno. We should really have the computed Memory written like so:

      hdfs.ls("output")
      ==>~g
      ==>~memory
      

      Moreover, ~g should be ~graph but that is a different story...

      Then:

      hdfs.ls("output/~memory")
      ==>gremlin.traversalVertexProgram.haltedTraversals
      ==>a
      ==>x
      
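      A rough sketch of how SparkGraphComputer could flush the computed Memory on job completion is below. The MemoryWriter helper, the one-file-per-key layout, and the use of plain Java serialization are all illustrative assumptions, not the actual implementation (which would presumably reuse the configured Gryo serialization):

      import java.io.IOException;
      import java.io.ObjectOutputStream;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.tinkerpop.gremlin.process.computer.Memory;

      public final class MemoryWriter {
          // Hypothetical helper: persist each memory key as <output>/~memory/<key>.
          public static void write(final Memory memory, final String outputLocation, final Configuration conf) throws IOException {
              final FileSystem fs = FileSystem.get(conf);
              final Path memoryPath = new Path(outputLocation, "~memory");
              fs.mkdirs(memoryPath);
              for (final String key : memory.keys()) {
                  // Assumes the memory values are java.io.Serializable; a real
                  // implementation would use the graph's configured serializer.
                  try (final FSDataOutputStream out = fs.create(new Path(memoryPath, key), true);
                       final ObjectOutputStream oos = new ObjectOutputStream(out)) {
                      oos.writeObject(memory.get(key));
                  }
              }
          }
      }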

      Note that every GraphComputer job yields a ComputerResult, which is basically a Pair<Graph,Memory>. The Graph reference denotes the adjacency list of vertices, and if any of those vertices have HALTED_TRAVERSERS, they are stored on the vertices themselves. This is a distributed representation. The Memory reference, on the other hand, denotes data that is no longer "attached to the graph": maps, counts, sums, etc. In general, reduction barriers. This data is not tied to any one vertex anymore and thus exists at the "master traversal" via Memory. In short, "graph is distributed/workers" and "memory is local/master." We need to make sure that the Memory data is serialized to disk appropriately for HadoopGraph-based implementations...
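      To make the Graph/Memory split concrete, here is how a ComputerResult is typically consumed (a hedged example: graph is assumed to be a HadoopGraph opened elsewhere, and PageRankVertexProgram merely stands in for any VertexProgram):

      import org.apache.tinkerpop.gremlin.process.computer.ComputerResult;
      import org.apache.tinkerpop.gremlin.process.computer.ranking.pagerank.PageRankVertexProgram;
      import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;

      final ComputerResult result = graph.compute(SparkGraphComputer.class)
              .program(PageRankVertexProgram.build().create(graph))
              .submit().get();
      result.graph();          // distributed: the adjacency list (plus any HALTED_TRAVERSERS on its vertices)
      result.memory().keys();  // local/master: the reduction-barrier data this ticket wants flushed to output/~memory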


            People

              Assignee: Unassigned
              Reporter: Marko A. Rodriguez (okram)
              Votes: 0
              Watchers: 1
