TinkerPop / TINKERPOP-1163

GraphComputers can have TraversalStrategies.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 3.1.0-incubating
    • Fix Version/s: 3.2.0-incubating
    • Component/s: hadoop, process
    • Labels:
      None

      Description

      @dkuppitz jokes that he can count the vertices in the Friendster adjacency list with "awk to the sed to the bash to the.." in < 1 minute. SparkGraphComputer on four blades takes ~5 minutes.

      What's the dealio?

      Imagine a world where SparkGraphComputerStrategy exists. It analyzes traversals and executes them fast by breaking away from the VertexProgram API and going straight to the native API of Spark. Check it:

      g.V().count() -> inputRDD.count()
      

      ...add an EmptyVertex.instance() manipulation to the respective InputFormats and you are then just skipping through bytes, not manifesting objects at all. BAM. That would take 30 seconds on Friendster.

      g.V().outE('knows').count() -> inputRDD.flatMapToPair{edgeComponents}.filter{knows}.count()
      

      Blazing fast.
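
A minimal sketch of the pattern-matching idea, assuming nothing about the eventual implementation: a plain Python list stands in for inputRDD, and `Step`, `native_count`, and the sample `graph` are hypothetical names made up for illustration. A recognized traversal shape is answered directly from the input; anything else returns None so the caller can fall back to the regular VertexProgram path.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-in for the adjacency-list input:
# (vertex_id, [(edge_label, target_vertex_id), ...])
Adjacency = Tuple[int, List[Tuple[str, int]]]


@dataclass(frozen=True)
class Step:
    """One traversal step, e.g. Step("outE", ("knows",)). Illustrative only."""
    name: str
    args: Tuple[str, ...] = ()


def native_count(traversal: List[Step], input_rdd: List[Adjacency]) -> Optional[int]:
    """Answer known count patterns straight from the input, else return None."""
    names = [step.name for step in traversal]

    # g.V().count() -> inputRDD.count()
    if names == ["V", "count"] and not traversal[0].args:
        return len(input_rdd)

    # g.V().outE(labels...).count() -> flatten edges, filter by label, count
    if names == ["V", "outE", "count"] and not traversal[0].args:
        labels = set(traversal[1].args)
        return sum(
            1
            for _, edges in input_rdd
            for label, _ in edges
            if not labels or label in labels
        )

    # No pattern matched: the caller should run the normal VertexProgram.
    return None


# Tiny sample graph (made up): 3 vertices, 3 edges, 2 of them labeled "knows".
graph: List[Adjacency] = [
    (1, [("knows", 2), ("created", 3)]),
    (2, [("knows", 3)]),
    (3, []),
]

vertex_count = native_count([Step("V"), Step("count")], graph)
knows_count = native_count([Step("V"), Step("outE", ("knows",)), Step("count")], graph)
unmatched = native_count([Step("V"), Step("out"), Step("count")], graph)
```

On a real engine the two matched branches would presumably become the inputRDD calls shown above; the list comprehension here only mirrors their shape.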

      ...for all those standard patterns, we just do a "native" execution on the respective GraphComputer engine. We sidestep object creation, iteration phases, views, MapReduce jobs... However, we have to be smart and update the Memory so it looks as if the real VertexProgram executed: iteration, runtime, ~reducing, etc.
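
The bookkeeping side can be sketched the same way. This is only an illustration under stated assumptions: `SketchMemory` and `run_native` are hypothetical names, not TinkerPop classes, and reporting a single iteration is a placeholder for whatever the real VertexProgram would have reported.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class SketchMemory:
    """Hypothetical stand-in for the result memory a caller would inspect."""
    iteration: int = 0
    runtime_ms: int = 0
    values: Dict[str, Any] = field(default_factory=dict)


def run_native(shortcut: Callable[[], Any], result_key: str) -> SketchMemory:
    """Run a native shortcut and record it as if a VertexProgram had executed."""
    start = time.perf_counter()
    result = shortcut()
    elapsed_ms = int((time.perf_counter() - start) * 1000)

    memory = SketchMemory()
    memory.iteration = 1           # assumption: present the shortcut as one iteration
    memory.runtime_ms = elapsed_ms  # wall-clock time of the native execution
    memory.values[result_key] = result  # the reduced value, stored under the given key
    return memory


# Usage: wrap any native computation, here a trivial count over a made-up list.
memory = run_native(lambda: len([1, 2, 3]), "count")
```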

      Genius.

            People

            • Assignee: Marko A. Rodriguez (okram)
            • Reporter: Marko A. Rodriguez (okram)
            • Votes: 0
            • Watchers: 3
