[TINKERPOP-1343] A more efficient StarGraph serialization representation. - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 3.2.0-incubating
Fix Version/s: None
Component/s: process
Labels:
- breaking

Description

StarGraph is used by the Hadoop GraphComputers and represents a vertex, its properties, its incident edges, and their properties. In essence, it is one "row of an adjacency list."

Here are some ideas on how to make the next version of the serialization format more efficient.

1. For all Element ids, we currently use kryo.readClassAndObject(...). This is bad because the class has to be written alongside every id. It would be better if the StarGraph had metadata like vertexIdClass, vertexPropertyIdClass, and edgeIdClass. The cost is that every vertex now serializes three classes, but the benefit is that every id class is known up front and we can use kryo.readObject(..., xxxIdClass).

2. Edges and VertexProperties are written out as [edgeLabel[edge[id, otherVertexId]]*]* and [propertyKey[vertexProperty[id, propertyValue]]*]*, respectively. Because all edges/vertex properties are grouped by label, each label string is written only once. However, we do NOT do this for edge properties or for vertex property properties. We simply write out the Map<Object,Map<String,Object>>, which is Map<EdgeId,Map<PropertyKey,PropertyValue>>, so every property key string is repeated for every edge that carries it. Since we have to choose between grouping by edgeId or by propertyKey, we should keep the grouping as it is, but create a "meta map" that allows us to represent all property keys in, e.g., an int space. Thus, Map<EdgeId,Map<PropertyKeyIntegerId,PropertyValue>>, where we also have a Map<PropertyKeyIntegerId,String> that is serialized with the StarGraph.
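
To make idea 1 concrete, here is a rough sketch of the size difference between tagging every id with its class and declaring the id class once up front. It uses plain java.io streams as a stand-in for Kryo, and the class and method names (IdClassHeaderSketch, writeClassPerId, writeClassOnce) are hypothetical. It also assumes all ids are Long and writes the full class name, whereas Kryo writes a compact id for registered classes, so the real saving would likely be smaller than what this shows.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical sketch, not TinkerPop or Kryo code.
final class IdClassHeaderSketch {

    // Analogous to writeClassAndObject/readClassAndObject:
    // the id class is written in front of every single id.
    static byte[] writeClassPerId(List<Long> ids) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(ids.size());
            for (Long id : ids) {
                out.writeUTF(Long.class.getName()); // repeated per id
                out.writeLong(id);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Analogous to storing an idClass in the StarGraph metadata and then
    // using writeObject/readObject: the class is written exactly once.
    static byte[] writeClassOnce(List<Long> ids) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(Long.class.getName()); // header, written once
            out.writeInt(ids.size());
            for (Long id : ids) {
                out.writeLong(id);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

The header costs a fixed amount per StarGraph, so the second form only wins once a star holds more than one id of a given kind, which is the common case for vertices with incident edges.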
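
The "meta map" from idea 2 could look roughly like the following. This is only a sketch of the data-structure transformation, not of the Gryo wire format: the names PropertyKeyTableSketch, Encoded, encode, and decode are hypothetical, and integer ids are simply assigned in first-seen order.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not TinkerPop code.
final class PropertyKeyTableSketch {

    // The two maps that would be serialized with the StarGraph.
    static final class Encoded {
        // Map<PropertyKeyIntegerId, String>
        final Map<Integer, String> keyTable = new LinkedHashMap<>();
        // Map<EdgeId, Map<PropertyKeyIntegerId, PropertyValue>>
        final Map<Object, Map<Integer, Object>> properties = new LinkedHashMap<>();
    }

    // Replace each property key string with a small int, recording the
    // int-to-string mapping once in keyTable.
    static Encoded encode(Map<Object, Map<String, Object>> edgeProperties) {
        Encoded encoded = new Encoded();
        Map<String, Integer> keyToId = new HashMap<>();
        for (Map.Entry<Object, Map<String, Object>> edge : edgeProperties.entrySet()) {
            Map<Integer, Object> compact = new LinkedHashMap<>();
            for (Map.Entry<String, Object> property : edge.getValue().entrySet()) {
                Integer keyId = keyToId.get(property.getKey());
                if (keyId == null) {
                    keyId = keyToId.size(); // next unused int
                    keyToId.put(property.getKey(), keyId);
                    encoded.keyTable.put(keyId, property.getKey());
                }
                compact.put(keyId, property.getValue());
            }
            encoded.properties.put(edge.getKey(), compact);
        }
        return encoded;
    }

    // Rebuild the original Map<EdgeId, Map<PropertyKey, PropertyValue>>.
    static Map<Object, Map<String, Object>> decode(Encoded encoded) {
        Map<Object, Map<String, Object>> result = new LinkedHashMap<>();
        for (Map.Entry<Object, Map<Integer, Object>> edge : encoded.properties.entrySet()) {
            Map<String, Object> expanded = new LinkedHashMap<>();
            for (Map.Entry<Integer, Object> property : edge.getValue().entrySet()) {
                expanded.put(encoded.keyTable.get(property.getKey()), property.getValue());
            }
            result.put(edge.getKey(), expanded);
        }
        return result;
    }
}
```

With this layout each distinct property key string appears once per StarGraph no matter how many edges carry it, while the per-edge grouping stays exactly as it is today.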

There are a few other tickets around optimizing StarGraph here:

https://issues.apache.org/jira/browse/TINKERPOP-1128 (making GraphFilters more efficient)

https://issues.apache.org/jira/browse/TINKERPOP-1122 (pointless bits and StarGraph should never auto-generate IDs as the ID space is distributed).

https://issues.apache.org/jira/browse/TINKERPOP-1287 (related to heap usage and clock cycles – not serialization).

Issue Links

relates to

- TINKERPOP-1122 StarGraph has a Long nextId. That is pointless and a waste of 64-bits. (Closed)
- TINKERPOP-1128 Change the Gryo serialization for StarGraph (Vertex, Properties, then Edges) (Closed)
- TINKERPOP-1287 StarGraph has an overdose of Stream usage. (Closed)

People

Assignee: Unassigned
Reporter: Marko A. Rodriguez
Votes: 0
Watchers: 3

Dates

Created: 17/Jun/16 20:32
Updated: 01/Mar/18 21:26
Resolved: 01/Mar/18 21:26