Details
Improvement
Status: Closed
Major
Resolution: Implemented
3.1.0-incubating
None
Description
@dkuppitz jokes that he can count the vertices in the Friendster adjacency list with "awk to the sed to the bash to the.." in under a minute. SparkGraphComputer on four blades takes ~5 minutes.
What's the dealio?
Imagine a world where SparkGraphComputerStrategy exists. It analyzes traversals and executes the common ones fast by breaking away from the VertexProgram API and going straight to the native API of Spark. Check it:
g.V().count() -> inputRDD.count()
...add an EmptyVertex.instance() manipulation to the respective InputFormats and you are just skipping through bytes, not manifesting objects at all. BAM. That would take 30 seconds on Friendster.
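To make the shortcut concrete, here is a minimal sketch in plain Java, with a `List<String>` standing in for the input RDD. The class name, the method name, and the one-record-per-vertex line format are assumptions for illustration only; nothing here is Spark or TinkerPop API. The point is that a vertex count only needs to count records and never has to parse one into a vertex object.

```java
import java.util.List;

// Plain-Java stand-in for the idea; no Spark or TinkerPop classes are used.
final class NativeVertexCountSketch {

    // Assumed input: one adjacency-list record per vertex, e.g. "1 2,3".
    // The record content is never parsed; only non-blank records are counted.
    static long countVertices(List<String> adjacencyRecords) {
        return adjacencyRecords.stream()
                .filter(record -> !record.trim().isEmpty())
                .count();
    }

    public static void main(String[] args) {
        List<String> records = List.of("1 2,3", "2 3", "3");
        System.out.println(countVertices(records)); // prints 3
    }
}
```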
g.V().outE('knows').count() -> inputRDD.flatMapToPair{edgeComponents}.filter{knows}.count()
Blazing fast.
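The same flatMap, filter, count shape can be sketched with Java streams in place of an RDD. The record format (`<vertexId>`, a tab, then comma-separated `<label>:<targetId>` entries) and all names below are hypothetical; a real implementation would read whatever the InputFormat produces.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Plain-Java stand-in for inputRDD.flatMapToPair{...}.filter{...}.count().
final class NativeEdgeCountSketch {

    // Assumed record format: "<vertexId>\t<label>:<targetId>,<label>:<targetId>,..."
    // Splits one record into its raw out-edge strings without building edge objects.
    static Stream<String> outEdgeComponents(String record) {
        String[] parts = record.split("\t", 2);
        if (parts.length < 2 || parts[1].isEmpty()) {
            return Stream.empty(); // vertex with no out-edges
        }
        return Arrays.stream(parts[1].split(","));
    }

    // Counts out-edges carrying the given label across all records.
    static long countOutEdges(List<String> records, String label) {
        String prefix = label + ":";
        return records.stream()
                .flatMap(NativeEdgeCountSketch::outEdgeComponents)
                .filter(edge -> edge.startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        List<String> records = List.of("1\tknows:2,created:3", "2\tknows:3", "3");
        System.out.println(countOutEdges(records, "knows")); // prints 2
    }
}
```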
...for all those standard patterns, we just do a "native" execution on the respective GraphComputer engine. We sidestep object creation, iteration phases, views, MapReduce jobs... However, we have to be smart and update the Memory so it looks as if the real VertexProgram executed: iteration, runtime, ~reducing, etc.
Genius.
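A rough sketch of how such a strategy might dispatch, again in plain Java. Everything here is hypothetical: `FakeMemory` is a stand-in for the GraphComputer Memory, the traversal is matched as a plain string rather than by inspecting steps, and the values written for iteration and `~reducing` are placeholders for whatever the real VertexProgram would have reported.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class NativeShortcutSketch {

    // Hypothetical stand-in for the GraphComputer Memory.
    static final class FakeMemory {
        final Map<String, Object> values = new HashMap<>();
        int iteration;
        long runtimeMillis;
    }

    // Returns the natively computed result when the traversal matches a known
    // pattern; returns empty so the caller can fall back to the VertexProgram path.
    static Optional<Long> tryNative(String traversal, List<String> records, FakeMemory memory) {
        if (!"g.V().count()".equals(traversal)) {
            return Optional.empty();
        }
        long start = System.nanoTime();
        long count = records.size(); // one record per vertex (assumption)
        // Make the Memory look as if the regular program had run (placeholder values).
        memory.iteration = 1;
        memory.runtimeMillis = (System.nanoTime() - start) / 1_000_000;
        memory.values.put("~reducing", count);
        return Optional.of(count);
    }

    public static void main(String[] args) {
        FakeMemory memory = new FakeMemory();
        Optional<Long> result = tryNative("g.V().count()", List.of("1 2", "2"), memory);
        System.out.println(result.isPresent() ? "native: " + result.get() : "fall back");
    }
}
```

Returning an empty `Optional` for unrecognized traversals keeps the shortcut strictly opt-in: anything the strategy does not understand still takes the normal path.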