  1. TinkerPop
  2. TINKERPOP-570

[Proposal] Provide support for OLAP to OLTP to OLAP to OLTP

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.2-incubating
    • Fix Version/s: 3.2.0-incubating
    • Component/s: process
    • Labels:
      None

      Description

      I'm trying to figure out how we can, within a "single traversal", move between OLAP and OLTP at different sections of the traversal. E.g.

      [g.V.out.has('age',lt,25)]OLAP[out('parent').out('workPlace')]OLTP[out('coworkers').age.groupCount]OLAP
      

      Going from OLAP to OLTP is easy. We have solved that already, as OLAP queries return a Traversal<S,E> and thus can be further processed in OLTP. But what about going from OLTP back into OLAP? We need to be able to stream the OLTP results back into traversers on the vertices of the graph. For TinkerGraph this is easy; for Hadoop it would mean dynamically editing the disk format, which seems crazy. Is there a general pattern that works for all graphs? Finally, what about when the objects are NOT vertices/edges/etc.? See the next issue.

      Matthias Broecheler

        Activity

        okram Marko A. Rodriguez added a comment -

        This is a broad ticket that is solved in various smaller tickets.

        okram Marko A. Rodriguez added a comment -

        Given https://issues.apache.org/jira/browse/TINKERPOP-971, the solution to this ticket will fall out as a natural consequence. For example:

        g.V().out().order().by('age',decr).limit(10).out('knows').values('name')
        

        Right now, the query above will throw a ComputerVerificationException saying something along the lines of "mid-traversal barriers are not allowed in OLAP." That mid-traversal barrier is order().by(). However, with the concepts in TINKERPOP-971, the above traversal would compile for GraphComputer execution as:

        [TraversalVertexProgramStep(GraphStep,VerticesStep,OrderStep),RangeStep,VerticesStep,PropertiesStep]
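The compiled form above can be mimicked with a small, self-contained sketch. It does not use any TinkerPop classes: steps are plain strings, the `BARRIERS` set and the rule "wrap everything up to and including the first barrier in one OLAP step" are assumptions made for illustration, and `BarrierSplitSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class BarrierSplitSketch {
    // Assumption: for this sketch only OrderStep counts as a mid-traversal barrier.
    static final Set<String> BARRIERS = Set.of("OrderStep");

    // Wraps the steps up to and including the first barrier in a single
    // TraversalVertexProgramStep(...) entry; the remaining steps are kept as-is (OLTP).
    static List<String> compile(List<String> steps) {
        int cut = -1;
        for (int i = 0; i < steps.size(); i++) {
            if (BARRIERS.contains(steps.get(i))) {
                cut = i;
                break;
            }
        }
        if (cut < 0) {
            // No barrier found: the whole traversal stays in one OLAP step.
            return List.of("TraversalVertexProgramStep(" + String.join(",", steps) + ")");
        }
        List<String> compiled = new ArrayList<>();
        compiled.add("TraversalVertexProgramStep(" + String.join(",", steps.subList(0, cut + 1)) + ")");
        compiled.addAll(steps.subList(cut + 1, steps.size()));
        return compiled;
    }

    public static void main(String[] args) {
        System.out.println(compile(List.of(
            "GraphStep", "VerticesStep", "OrderStep", "RangeStep", "VerticesStep", "PropertiesStep")));
    }
}
```

Feeding it the step list of the example traversal yields one OLAP entry followed by the three OLTP steps, matching the shape shown above.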
        

        That is, it went OLAP-to-OLTP. How about OLTP-OLAP? Assume the following traversal:

        g.V().has("name","marko").out("knows").has('age',gt(30)).repeat(out()).times(10).limit(10).values('name')
        

        Given a ReasoningStrategy (another ticket), this could compile to:

        [GraphStep,HasStep,VerticesStep,HasStep,XXX,TraversalVertexProgramStep([RepeatStep([VerticesStep])]),ComputerResultStep,RangeStep,PropertiesStep]
        

        The problem we haven't solved is how to feed traversers into TraversalVertexProgram (the XXX above). Well, since TraversalVertexProgram is typed as <ComputerResult,ComputerResult>, we would need to do something like this:

        XXX {
          return new DefaultComputerResult {
            graph() { return this.getTraversal().getGraph() }
            memory() { return Map{{HALTED_TRAVERSERS, this.traversal.toSet()}} }
          }
        }
        

        Now, when TraversalVertexProgramStep is next()'d for the first time, it has these options:

        • It is the first step in the traversal and thus, simply calls its TraversalVertexProgram and returns ComputerResult (currently how things work).
        • It pulls a ComputerResult from the previous step and then calls the TraversalVertexProgram on the computerResult.graph().
        • It pulls a ComputerResult that has a graph and a memory with HALTED_TRAVERSERS. It calls the TraversalVertexProgram (but looks into the memory to see if the current vertex has a HALTED_TRAVERSER).

        ....something along those lines. I don't like introducing a new step. Perhaps TraversalVertexProgramStep could instead be typed as TraversalVertexProgramStep<Object,ComputerResult>, so that if the incoming object is a ComputerResult it does one thing, and otherwise it aggregates all the incoming objects and uses those as the starts for OLAP....
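The dispatch idea above might look roughly like the following self-contained sketch. Nothing here is TinkerPop API: `FakeComputerResult` is a hypothetical stand-in that only carries a set of halted traversers, and `startsFor` is an invented helper that decides which objects would seed the OLAP job.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class StartDispatchSketch {
    // Hypothetical stand-in for a ComputerResult whose memory holds HALTED_TRAVERSERS.
    static final class FakeComputerResult {
        final Set<Object> haltedTraversers;

        FakeComputerResult(Set<Object> haltedTraversers) {
            this.haltedTraversers = haltedTraversers;
        }
    }

    // Decides the OLAP starts from whatever the previous step emitted:
    // - nothing incoming: the step is first in the traversal, so there are no seeded starts;
    // - a FakeComputerResult: reuse its halted traversers;
    // - any other object: aggregate it as a start.
    static Set<Object> startsFor(List<Object> incoming) {
        Set<Object> starts = new LinkedHashSet<>();
        for (Object o : incoming) {
            if (o instanceof FakeComputerResult) {
                starts.addAll(((FakeComputerResult) o).haltedTraversers);
            } else {
                starts.add(o);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(startsFor(List.of("v1", "v2")));
        System.out.println(startsFor(List.of(new FakeComputerResult(Set.of("v3")))));
    }
}
```

Typing the input as `Object` keeps a single step responsible for both cases, which is the trade-off the comment weighs against introducing a separate XXX step.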

        Needs more thinking, but I think we are on the right track.

        mbroecheler Matthias Broecheler added a comment -

        I think that going from OLAP to OLTP is the most common use case, as in "rank all vertices by some centrality metric and then run a local traversal for the top 10". The first part is OLAP and the second OLTP.
        I can also see how you might go from OLTP to OLAP if a traversal has a high branching factor or is very deep and then you want to aggregate on that. However, in those cases I don't think you lose much performance by executing the whole thing as OLAP - it's definitely less efficient but not really a big deal imho.

        The OLAP to OLTP use case seems like a more useful transition if the OLAP traversal produces a small set of elements on which further local analysis is needed.
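As a rough illustration of the "rank, then explore locally" pattern, here is a self-contained sketch over an in-memory adjacency map. It is not TinkerPop code: out-degree stands in for a real centrality metric, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class RankThenLocalSketch {
    // "OLAP" phase: one full pass over every vertex, scoring each by out-degree.
    static Map<String, Integer> outDegree(Map<String, List<String>> adjacency) {
        Map<String, Integer> scores = new TreeMap<>();
        adjacency.forEach((vertex, outs) -> scores.put(vertex, outs.size()));
        return scores;
    }

    // Keeps the k best-scored vertices; ties are broken by vertex name.
    static List<String> topK(Map<String, Integer> scores, int k) {
        return scores.entrySet().stream()
            .sorted((a, b) -> {
                int byScore = Integer.compare(b.getValue(), a.getValue());
                return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
            })
            .limit(k)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    // "OLTP" phase: a local lookup that touches only the selected start vertices.
    static Map<String, List<String>> localNeighbors(Map<String, List<String>> adjacency, List<String> starts) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String vertex : starts) {
            result.put(vertex, adjacency.getOrDefault(vertex, List.of()));
        }
        return result;
    }
}
```

The global pass visits every vertex once, while the follow-up only touches the handful of vertices that survived the ranking, which is why the second half suits OLTP.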

        How is OLAP to OLTP currently solved? Is that something a graph strategy could automatically decide (i.e., how to split up the traversal)?


          People

          • Assignee:
            okram Marko A. Rodriguez
            Reporter:
            okram Marko A. Rodriguez
