  1. TinkerPop
  2. TINKERPOP-1163

GraphComputers can have TraversalStrategies.

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 3.1.0-incubating
    • Fix Version/s: 3.2.0-incubating
    • Component/s: hadoop, process
    • Labels: None

    Description

      @dkuppitz makes the joke that he can count the number of vertices in the Friendster adjacency list with "awk to the sed to the bash to the.." in < 1 minute. SparkGraphComputer with four blades takes ~5 minutes.

      What's the dealio?

      Imagine a world where SparkGraphComputerStrategy exists. It analyzes traversals and does fast executions, breaking away from the VertexProgram API and going straight to the native API of Spark. Check it:

      g.V().count() -> inputRDD.count()
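
To make the pattern-matching idea concrete, here is a minimal sketch in plain Python. Everything in it is a stand-in and an assumption: a traversal is modeled as a tuple of step names, `input_rdd` is an ordinary list, and `spark_strategy` is a hypothetical name, not the TinkerPop or Spark API.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical model: a traversal is just a tuple of step names.
Traversal = Tuple[str, ...]


def spark_strategy(traversal: Traversal) -> Optional[Callable[[Sequence], int]]:
    """Return a native shortcut for a recognized pattern, else None.

    None means "no shortcut known, fall back to the normal VertexProgram path".
    """
    if traversal == ("V", "count"):
        # g.V().count() -> inputRDD.count(); len() stands in for the native count.
        return lambda input_rdd: len(input_rdd)
    return None


if __name__ == "__main__":
    shortcut = spark_strategy(("V", "count"))
    print(shortcut([101, 102, 103]))               # 3
    print(spark_strategy(("V", "out", "count")))   # None
```

The point of returning `None` for unrecognized traversals is that the strategy only ever opts in to patterns it fully understands; everything else still runs the regular way.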
      

      ...add an EmptyVertex.instance() manipulation to the respective InputFormats and you are just skipping through bytes, not manifesting objects at all. BAM. That would take 30 seconds on Friendster.
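
A rough illustration of "skipping through bytes" without manifesting objects, as a plain Python sketch. It assumes a text adjacency list with one vertex record per line, which is an assumption about the input layout; `count_records` is a hypothetical helper and is not how the TinkerPop InputFormats work internally.

```python
import io
from typing import BinaryIO


def count_records(stream: BinaryIO, chunk_size: int = 1 << 16) -> int:
    """Count newline-delimited records without parsing any of them."""
    count = 0
    last_byte = b"\n"  # treat an empty stream as "ends on a record boundary"
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        count += chunk.count(b"\n")  # only look for record separators
        last_byte = chunk[-1:]
    if last_byte != b"\n":
        count += 1  # final record has no trailing newline
    return count


if __name__ == "__main__":
    data = io.BytesIO(b"1 2 3\n2 3\n3\n")
    print(count_records(data))  # 3
```

No vertex, edge, or property object is ever built here; the bytes are only scanned for separators.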

      g.V().outE('knows').count() --> inputRDD.flatMapToPair{edgeComponents}.filter{knows}.count()
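
The flatMap/filter/count chain can be sketched with plain Python iterators. The record shape `(vertex_id, [(edge_label, target_id), ...])` and the name `count_out_edges` are assumptions made for illustration; a real inputRDD holds different objects.

```python
from itertools import chain
from typing import Iterable, List, Tuple

# Hypothetical record shape: (vertex_id, [(edge_label, target_id), ...])
VertexRecord = Tuple[int, List[Tuple[str, int]]]


def count_out_edges(input_rdd: Iterable[VertexRecord], label: str) -> int:
    """Mimic flatMap{edgeComponents}.filter{label}.count() on an iterable."""
    # flatMap: one stream of (label, target) pairs across all vertices
    edge_components = chain.from_iterable(out_edges for _vid, out_edges in input_rdd)
    # filter + count: keep only edges with the requested label
    return sum(1 for edge_label, _target in edge_components if edge_label == label)


if __name__ == "__main__":
    records = [
        (1, [("knows", 2), ("created", 3)]),
        (2, [("knows", 3)]),
        (3, []),
    ]
    print(count_out_edges(records, "knows"))  # 2
```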
      

      Blazing fast.

      ...for all those standard patterns, we just do a "native" execution for the respective GraphComputer engine. We sidestep object creation, iteration phases, views, map reduce jobs... However, we have to be smart and update the Memory so it looks as if the real VertexProgram executed: iteration, runtime, ~reducing, etc.
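
One way to picture the Memory bookkeeping, as a hedged Python sketch. `MemorySketch` and `run_native_count` are hypothetical names, not the TinkerPop Memory interface, and the iteration count to report is left to the caller because this sketch makes no claim about what the real VertexProgram would have recorded.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Sequence


@dataclass
class MemorySketch:
    """Hypothetical stand-in for the computer's Memory after a job."""
    iteration: int = 0
    runtime_ms: int = 0
    values: Dict[str, int] = field(default_factory=dict)


def run_native_count(input_rdd: Sequence, reported_iterations: int,
                     memory_key: str = "count") -> MemorySketch:
    """Run the native shortcut, then fill Memory as if a VertexProgram ran."""
    start = time.perf_counter()
    result = len(input_rdd)  # stand-in for inputRDD.count()
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    return MemorySketch(
        iteration=reported_iterations,  # assumption: supplied by the caller
        runtime_ms=elapsed_ms,
        values={memory_key: result},    # the reduced value callers will read
    )


if __name__ == "__main__":
    memory = run_native_count([101, 102, 103], reported_iterations=1)
    print(memory.values["count"], memory.iteration)  # 3 1
```

Callers that only read the Memory afterwards should not be able to tell whether the shortcut or the full program produced it.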

      Genius.

          People

            Assignee: okram Marko A. Rodriguez
            Reporter: okram Marko A. Rodriguez
            Votes: 0
            Watchers: 3