TinkerPop / TINKERPOP-1074

More contractual testing/specifications around Persist and ResultGraph.

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0-incubating
    • Fix Version/s: None
    • Component/s: process
    • Labels: None

    Description

      A ComputerResult references two objects: a graph and a memory. The graph is the resultant computed graph and the memory contains all the sideEffect data from the computation (if any).

      Right now, we have the following Persist options: NOTHING, VERTEX_PROPERTIES, EDGES. We also have the following ResultGraph options: ORIGINAL, NEW.

      • NOTHING + ORIGINAL = ComputerResult contains original graph reference.
      • NOTHING + NEW = ?? No test to force what this means! Should be EmptyGraph.instance().
      • VERTEX_PROPERTIES + ORIGINAL = ComputerResult contains original graph, but the computed vertex properties have been "saved" to it. (no contractual test cases here either!)
      • VERTEX_PROPERTIES + NEW = ComputerResult contains new graph with only vertices and their properties.
      • EDGES + NEW = ComputerResult contains new graph with vertices, edges, and their properties.
      • EDGES + ORIGINAL = ComputerResult contains original graph, but the computed vertex properties and edges have been "saved" to it. (no contractual test cases here either!)

      TinkerGraphComputer is the only system that supports all the above configuration combinations. Add test cases to GraphComputerTest that verify the behavior of all combinations.
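The six combinations above can be written down as a small table that a test could iterate over. The following is only a sketch under stated assumptions: `PersistResultGraphMatrix` and its nested enums are hypothetical stand-ins that mirror the option names in this ticket, not the TinkerPop classes, and the strings paraphrase the descriptions above rather than assert anything against a real graph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the ticket's option matrix; it does not depend on
// or call into TinkerPop, it only mirrors the enum names used above.
final class PersistResultGraphMatrix {
    enum Persist { NOTHING, VERTEX_PROPERTIES, EDGES }
    enum ResultGraph { ORIGINAL, NEW }

    /** What ComputerResult.graph() is described to hold for one combination. */
    static String expectedGraph(Persist persist, ResultGraph resultGraph) {
        boolean original = resultGraph == ResultGraph.ORIGINAL;
        switch (persist) {
            case NOTHING:
                return original
                        ? "original graph, unchanged"
                        : "empty graph (suggested: EmptyGraph.instance())";
            case VERTEX_PROPERTIES:
                return original
                        ? "original graph with computed vertex properties saved to it"
                        : "new graph with only vertices and their properties";
            case EDGES:
                return original
                        ? "original graph with computed vertex properties and edges saved to it"
                        : "new graph with vertices, edges, and their properties";
            default:
                throw new IllegalArgumentException("unknown Persist: " + persist);
        }
    }

    /** One row per combination, in declaration order of the two enums. */
    static List<String> allCombinations() {
        List<String> rows = new ArrayList<>();
        for (Persist p : Persist.values()) {
            for (ResultGraph r : ResultGraph.values()) {
                rows.add(p + " + " + r + " = " + expectedGraph(p, r));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        allCombinations().forEach(System.out::println);
    }
}
```

A real GraphComputerTest case would presumably loop over the same two enums and assert on the contents of the returned graph instead of on a description string.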

      HOWEVER !!!! ------ should we really respect ORIGINAL+PERSIST? Most providers will use BulkLoaderVertexProgram to write the computed graph back to the original graph. If there are TWO ways of doing this, this seems bad. In fact, the way that TinkerGraphComputer writes the computed graph back to the original graph is nearly identical to how BulkLoaderVertexProgram works. Thus, I'm wondering if we should simply get rid of the concept of ResultGraph and ONLY have Persist.

      • Persist.NOTHING: Returns the original graph in ComputerResult.
      • Persist.VERTEX_PROPERTIES: Returns a new graph with only vertices and properties.
      • Persist.EDGES: Returns a new graph with vertices, edges, and their properties.

      For in-memory graphs like TinkerGraph, "new graph" can mean the original graph with the GraphView overlay. Thus, it's not really a full copy of the original graph. Moreover, Persist.NOTHING just garbage collects the GraphView and thus returns the original graph.
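To make the Persist-only proposal concrete, here is a minimal sketch of how the result graph could be chosen from Persist alone. Everything in it is an assumption for illustration: `PersistOnlySketch` and `ToyGraph` are hypothetical, and the sketch copies data into a new object where TinkerGraph would, as noted above, more likely expose the original graph through a GraphView overlay.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the "only Persist, no ResultGraph" proposal.
final class PersistOnlySketch {
    enum Persist { NOTHING, VERTEX_PROPERTIES, EDGES }

    /** Toy graph: vertex id -> properties, plus edges as "out->in" strings. */
    static final class ToyGraph {
        final Map<String, Map<String, Object>> vertices = new HashMap<>();
        final List<String> edges = new ArrayList<>();
    }

    /**
     * NOTHING hands back the original reference; the other two options build
     * a new graph from the computed state, with or without its edges.
     */
    static ToyGraph resultGraph(Persist persist, ToyGraph original, ToyGraph computed) {
        if (persist == Persist.NOTHING) {
            return original;
        }
        ToyGraph result = new ToyGraph();
        for (Map.Entry<String, Map<String, Object>> v : computed.vertices.entrySet()) {
            result.vertices.put(v.getKey(), new HashMap<>(v.getValue()));
        }
        if (persist == Persist.EDGES) {
            result.edges.addAll(computed.edges);
        }
        return result;
    }

    public static void main(String[] args) {
        ToyGraph original = new ToyGraph();
        original.vertices.put("v1", new HashMap<>());
        ToyGraph computed = new ToyGraph();
        Map<String, Object> props = new HashMap<>();
        props.put("rank", 1.0);
        computed.vertices.put("v1", props);
        computed.edges.add("v1->v1");
        for (Persist p : Persist.values()) {
            ToyGraph g = resultGraph(p, original, computed);
            System.out.println(p + ": vertices=" + g.vertices + ", edges=" + g.edges);
        }
    }
}
```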

      ------------------

      Next, what does Persist mean for memory? Remember, ComputerResult also has a reference to sideEffect memory. What if you want to run a job and NOT persist the graph, but persist the memory only? I think we should ALWAYS assume memory persistence. For TinkerGraph, that means that ComputerResult.memory() has a HashMap of memory values. For Giraph/Spark, that means that the Storage will always have resultant sideEffect data in the output directory even if there is no graph.

      • NOTHING: persist memory and return the original graph.
      • VERTEX_PROPERTIES: persist memory and return new graph of just vertex properties.
      • EDGES: persist memory and return new graph of vertex properties and edges.
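The "always persist memory" rule can be sketched as a result object whose memory is filled in regardless of the Persist option. This is an illustration only: `AlwaysPersistMemorySketch` and `ToyResult` are hypothetical names, the graph is reduced to a description string, and a HashMap stands in for the memory values as in the TinkerGraph case mentioned above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: Persist only decides the graph, never the memory.
final class AlwaysPersistMemorySketch {
    enum Persist { NOTHING, VERTEX_PROPERTIES, EDGES }

    /** Toy counterpart of a computer result: a graph description plus memory. */
    static final class ToyResult {
        final String graph;
        final Map<String, Object> memory;

        ToyResult(String graph, Map<String, Object> memory) {
            this.graph = graph;
            // Defensive copy so later changes to the caller's map do not leak in.
            this.memory = Collections.unmodifiableMap(new HashMap<>(memory));
        }
    }

    static ToyResult finish(Persist persist, Map<String, Object> sideEffects) {
        String graph;
        switch (persist) {
            case NOTHING:
                graph = "original graph";
                break;
            case VERTEX_PROPERTIES:
                graph = "new graph of just vertex properties";
                break;
            case EDGES:
                graph = "new graph of vertex properties and edges";
                break;
            default:
                throw new IllegalArgumentException("unknown Persist: " + persist);
        }
        // The memory is attached in every branch, including NOTHING.
        return new ToyResult(graph, sideEffects);
    }

    public static void main(String[] args) {
        Map<String, Object> sideEffects = new HashMap<>();
        sideEffects.put("iterations", 30);
        for (Persist p : Persist.values()) {
            ToyResult r = finish(p, sideEffects);
            System.out.println(p + ": graph=" + r.graph + ", memory=" + r.memory);
        }
    }
}
```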

      Decisions, decisions, decisions....

          People

            Assignee: Unassigned
            Reporter: Marko A. Rodriguez (okram)