Details
-
Improvement
-
Status: Done
-
Major
-
Resolution: Done
-
0.2.1
-
None
-
None
Description
Background
S2Graph has been used HBase as default storage layer from the beginning, but supporting more storage backend other than HBase has been discussed.
Support for various storag backend is important, since users with different storage backend setup can try S2Graph as graph layer easily.
Problem Statement
To add new storage backend into S2Graph, one have to extends Storage class. Current implementation on Storage class has following logical components.
- Mutate: given KeyValue which is physical representation for Edge/Vertex, responsible to actually store them into physical storage backend using storage backend specific client.
- Fetch: given QueryRequest/Vertex/Edge, fetch KeyValue from storage backend, then translate them into Edge/Vertex.
- Write-Write Conflict Resolve: with storage backend that provide optimistic concurrency control, resolving write-write conflict on same SnapshotEdge with EdgeId to keep index edge consistent.
- Serialize/Deserializer: given common representation for KeyValue, SKeyValue, translate them to/from actual physical storage backend’s KeyValue. Mutate require Serialize to convert SKeyValue to storage layer’s specific KeyValue, and Fetch require Deserialize to convert from storage backend specific KeyValue to common SKeyValue
- Management: management storage backend’s resources such as client connection and tables.
Current implementation does all above in one single class Storage, and it makes adding new storage backend hard since it is not clear what need to be override for new storage backend.
Users who want to add new storage backend implementation need to go through lots of method which is not neccessery to be overrided, and it is also hard to test/debug new implementation.
Related Jiras
S2GRAPH-1 Add Redis storage(Key/Value storage) engine for S2Graph
S2GRAPH-166 Provide embeddable storage baackend using RocksDB.
Suggestion
Explicitly separate functionalities on Storage class and make each functionalities testable.
Overrall interface suggestion for Storage which orchestrate above functionalities.
abstract class Storage[Q](val graph: S2Graph, val config: Config) { /* Storage backend specific resource management */ val management: StorageManagement /* Physically store given KeyValue into backend storage. */ val mutator: StorageWritable /* * Given QueryRequest/Vertex/Edge, fetch KeyValue from storage * then convert them into Edge/Vertex */ val fetcher: StorageReadable[Q] /* * Serialize Edge/Vertex, to common KeyValue, SKeyValue that * can be stored aligned to backend storage's physical schema. * Also Deserialize storage backend's KeyValue to SKeyValue. */ val serDe: StorageSerDe /* * Common helper to translate SKeyValue to Edge/Vertex and vice versa. * Note that it require storage backend specific implementation for serialize/deserialize. */ lazy val io: StorageIO = new StorageIO(graph, serDe) /* * Common helper to resolve write-write conflict on snapshot edge with same EdgeId. * Note that it require storage backend specific implementations for * all of StorageWritable, StorageReadable, StorageSerDe, StorageIO */ lazy val conflictResolver: WriteWriteConflictResolver = new WriteWriteConflictResolver(graph, serDe, io, mutator, fetcher) }
By explicitly separate functionalities, each storage backend implementation explicitly knows what they need to implement. Also test can be easy. For example, one only need to initialize new implementation of StorageSerDe, not entire Storage to test if it’s serialize/deserialize works correctly.
Attachments
Issue Links
- links to