[TINKERPOP-1564] Distributed OLTP Traversals and the Introduction of Partition Concept - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Later
Affects Version/s: 3.2.3
Fix Version/s: None
Component/s: driver, process, server
Labels:
None

Description

This proposal unifies OLTP and OLAP into a single framework that removes the need for OLAP GraphComputer by introducing distributed, data local processing to OLTP. In essence, this is a proposal for a step-by-step query routing framework within Traversal. This proposal can work across machines in a cluster, threads on a machine, or in a hierarchical fashion machines&threads. The example presented will discuss distribution across machines in a cluster as its the most complicated scenario.

Currently, an OLTP traversal executes at a particular machine (or thread) and pulls vertex/edge/etc. data to it accordingly in order to solve the traversal. In OLAP, the traversal is cloned and distributed to all machines in the cluster and traversals communicate with one another by sending Traversers (i.e. messages) between themselves ensuring data local processing. Given recent advancements in GremlinServer and RemoteTraversal, it is possible to add traverser routing to OLTP and thus, effect the computational paradigm of Gremlin OLAP in Gremlin OLTP with some added benefits not possible in Gremlin OLAP.

Assume a 4 machine cluster and the following traversal:

g.V(1).out(‘knows’).has(‘age’,gt(20)).out(‘likes’).values(‘name’)

Every time there is a "walk" (adjacency), it is possible that the Traverser is no longer accessing data local to the current machine. In order to do data local query routing, every adjacency would feed into a PartitionStep. The traversal above would be cloned (via Bytecode distribution) across the cluster where "sibling" PartitionSteps would have network access to one another using the same protocol of RemoteConnection though called PartitionConnection. Thus, given the 4 node cluster example, the above traversal would be overlaid as below. Note that partition() would not be a new step in the language, but simply provided here to show where PartitionStrategy would insert PartitionSteps into the traversal.

g.V(1).out(‘knows’).partition().has(‘age’,gt(20)).out(‘likes’).partition().values(‘name’).partition()
                       |                                           |                         ^
    __.out(‘knows’).partition().has(‘age’,gt(20)).out(‘likes’).partition().values(‘name’).partition()
                       |                                           |                         |
    __.out(‘knows’).partition().has(‘age’,gt(20)).out(‘likes’).partition().values(‘name’).partition()
                       |                                           |                         |
    __.out(‘knows’).partition().has(‘age’,gt(20)).out(‘likes’).partition().values(‘name’).partition()

The top traversal is called the "master traversal" and the other three "worker traversals." Note that this is identical to current Gremlin OLAP. Now, the master traversal would be the traversal that is .next()'d for results. So, when the "master traversal" is next()'d, g.V(1) will fetch v[1] and then its outgoing knows-adjacencies. These adjacent "reference vertices" would be fed into the first remote() and a "routing algorithm" would determine where in the cluster the particular vertex's data is. Thus, partition() (PartitionStep) serves as a router, pushing Traversers local to the data. Finally, note that the final PartitionSteps can only feed back to the "master traversal" for ultimate aggregation and return to the user.

TinkerPop currently has all the structures in place to make this possible:

1. Encapsulation of computational metadata via Traverser.
2. The ability to detach Traversers and migrate/serialize them via Traverser.detach() and Traverser.attach().
3. The concept of ReferenceElement so the traverser only carries with it enough information to re-attach at the remote site.
4. Bytecode and the ability to send Traversals across the cluster.
5. GremlinServer and Client/Cluster messaging protocol.

What does PartitionStep look like? Please see comments below

Here are the benefits of this model:

Gremlin OLTP is Gremlin OLAP. The semantics of Gremlin OLAP are exactly what is proposed here but with the added benefit that message passing happens at the partition/subgraph level, not the star vertex level.
There is no need for SparkGraphComputer as GremlinServer now plays the role of SparkServer. The added benefit, no pulling data from the graph database and re-representing it in an RDD or SequenceFile.
No longer are "local children traversals" the boundary for "OLAP." Local children can be processed beyond the star graph, but would require pulling data from a remote machine is necessary. However, given a good graph partitioning algorithm, local children will most likely NOT leave the subgraph partition and thus, will remain a local computation.
Failover is already built into the architecture. If a PartitionStep can not be accessed, but the machine's data is still available (perhaps via replication), then data will simply be pulled over the wire instead of traversers routed to the "dead node."
The infrastructure for side-effects and reducing barrier steps already implemented for Gremlin OLAP would automatically work for distributed Gremlin OLTP.
If the entire graph is hot in-memory across the cluster, then distributed in-memory graph computing is possible. Again, no more linear-scans over partitions like with Giraph/Spark/etc. (GraphComputer).
If transactions are worked out, then distributed OLTP Gremlin provides mutation capabilities (something currently not implemented for GraphComputer). That is addV, addE, drop, etc. just works. *Caveate, transactions in this environment across GremlinServer seems difficult.*

So thats that. This could very well be the future of Gremlin OLAP. The disjoint between OLAP and OLTP would go away, the codebase would be simplified, and the computational gains in terms of performance and expressivity would be great. This is a big deal idea.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

distributed-oltp.png
06/Dec/16 15:09
139 kB
Marko A. Rodriguez

Issue Links

is related to

TINKERPOP-1208 Support for MessageScope.Subgraph in GraphComputer

Closed

relates to

TINKERPOP-1250 Support Subgraph-Centric GraphComputer

Closed

TINKERPOP-1344 ReferenceYYYXXX for better message passing

Closed

TINKERPOP-1346 Gryo 4.0

Closed

Distributed OLTP Traversals and the Introduction of Partition Concept

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates