Uploaded image for project: 'TinkerPop'
  1. TinkerPop
  2. TINKERPOP-1293

Implement GraphFilterStrategy as a default registration for GraphComputer

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 3.2.0-incubating
    • Fix Version/s: 3.2.1
    • Component/s: process
    • Labels:
      None

      Description

      GraphFilterStrategy would be a TraversalStrategy for GraphComputers. It will inspect the Traversal and decide the computer.vertices(...).edges(...) to use.

      Some rules:

      1. If the traversal is part of an OLAP chain, don't apply strategy.
      2. If the traversal is persist edges, then don't edges(...) or vertices(...).
      3. If the traversal is persist vertex properties, then don't do vertices(...).

      Shouldn't be too difficult. Basically looking for:

      1. VertexStep labels for edges.
      2. HasStep containers for vertices and edges.

      From there, we can get fancy with outE().has("stars",gt(4))-style edges(...). However, basic element-label based filtering should be easy.

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user okram opened a pull request:

          https://github.com/apache/incubator-tinkerpop/pull/310

          TINKERPOP-1293: Implement GraphFilterStrategy as a default registration for GraphComputer

          https://issues.apache.org/jira/browse/TINKERPOP-1293

          `GraphFilterStrategy` (traversal optimization strategy) will create a graph filter for the OLAP traversal if one has not already been provided. This is made possible by `GraphFilterStrategy` analyzing the traversal to determine which edge directions/labels are being accessed and generating a traversal that meets that criteria. Along with this, `GraphFilter` has been advanced more intelligence, more test cases, more JavaDoc, and more helper methods for `GraphComputer` providers.

          `GraphFilterStrategy` is a default `GraphComputer` strategy that is deactivated by `TinkerGraphComputer` as edge filters are expensive in `TinkerGraphComputer` given that the entire graph is already loaded into memory and any filtering requires labeled edges as being filtered (or not).

          `mvn clean install`, Spark integration tests passing, and Giraph integration tests passing thus far...

          VOTE +1.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1293

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/incubator-tinkerpop/pull/310.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #310



          Show
          githubbot ASF GitHub Bot added a comment - GitHub user okram opened a pull request: https://github.com/apache/incubator-tinkerpop/pull/310 TINKERPOP-1293 : Implement GraphFilterStrategy as a default registration for GraphComputer https://issues.apache.org/jira/browse/TINKERPOP-1293 `GraphFilterStrategy` (traversal optimization strategy) will create a graph filter for the OLAP traversal if one has not already been provided. This is made possible by `GraphFilterStrategy` analyzing the traversal to determine which edge directions/labels are being accessed and generating a traversal that meets that criteria. Along with this, `GraphFilter` has been advanced more intelligence, more test cases, more JavaDoc, and more helper methods for `GraphComputer` providers. `GraphFilterStrategy` is a default `GraphComputer` strategy that is deactivated by `TinkerGraphComputer` as edge filters are expensive in `TinkerGraphComputer` given that the entire graph is already loaded into memory and any filtering requires labeled edges as being filtered (or not). `mvn clean install`, Spark integration tests passing, and Giraph integration tests passing thus far... VOTE +1. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1293 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-tinkerpop/pull/310.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #310
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user okram commented on the pull request:

          https://github.com/apache/incubator-tinkerpop/pull/310#issuecomment-218266032

          CHANGELOG

          ```

          • `GraphFilter` helper methods are now more intelligent when determining edge direction/label legality.
          • Added `GraphFilterStrategy` to automatically construct `GraphFilters` via traversal introspection.
            ```

          UPDATE

          ```
          GraphFilter and GraphFilterStrategy
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

          `GraphFilter` has been significantly advanced where the determination of an edge direction/label legality is more stringent. Along with this, `GraphFilter.getLegallyPositiveEdgeLabels()` has been added as a helper method to make it easier for providers to know the space of labels being accessed by the traversal.

          Note that `GraphFilterStrategy` is a default `TraversalStrategy` registered with `GraphComputer.` If `GraphFilter` is expensive for the underlying `GraphComputer` implementation, it can be deactivated accordingly as is done for `TinkerGraphComputer`.

          [source,java]


          static

          { TraversalStrategies.GlobalCache.registerStrategies(TinkerGraphComputer.class, TraversalStrategies.GlobalCache.getStrategies(GraphComputer.class).clone().removeStrategies(GraphFilterStrategy.class)); }

          ```

          Show
          githubbot ASF GitHub Bot added a comment - Github user okram commented on the pull request: https://github.com/apache/incubator-tinkerpop/pull/310#issuecomment-218266032 CHANGELOG ``` `GraphFilter` helper methods are now more intelligent when determining edge direction/label legality. Added `GraphFilterStrategy` to automatically construct `GraphFilters` via traversal introspection. ``` UPDATE ``` GraphFilter and GraphFilterStrategy ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `GraphFilter` has been significantly advanced where the determination of an edge direction/label legality is more stringent. Along with this, `GraphFilter.getLegallyPositiveEdgeLabels()` has been added as a helper method to make it easier for providers to know the space of labels being accessed by the traversal. Note that `GraphFilterStrategy` is a default `TraversalStrategy` registered with `GraphComputer.` If `GraphFilter` is expensive for the underlying `GraphComputer` implementation, it can be deactivated accordingly as is done for `TinkerGraphComputer`. [source,java] static { TraversalStrategies.GlobalCache.registerStrategies(TinkerGraphComputer.class, TraversalStrategies.GlobalCache.getStrategies(GraphComputer.class).clone().removeStrategies(GraphFilterStrategy.class)); } ```
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user okram commented on the pull request:

          https://github.com/apache/incubator-tinkerpop/pull/310#issuecomment-218332211

          ```
          [INFO] ------------------------------------------------------------------------
          [INFO] Reactor Summary:
          [INFO]
          [INFO] Apache TinkerPop .................................. SUCCESS [5.532s]
          [INFO] Apache TinkerPop :: Gremlin Shaded ................ SUCCESS [2.355s]
          [INFO] Apache TinkerPop :: Gremlin Core .................. SUCCESS [44.087s]
          [INFO] Apache TinkerPop :: Gremlin Test .................. SUCCESS [11.803s]
          [INFO] Apache TinkerPop :: Gremlin Groovy ................ SUCCESS [40.912s]
          [INFO] Apache TinkerPop :: Gremlin Groovy Test ........... SUCCESS [7.801s]
          [INFO] Apache TinkerPop :: TinkerGraph Gremlin ........... SUCCESS [3:29.140s]
          [INFO] Apache TinkerPop :: Gremlin Benchmark ............. SUCCESS [6.440s]
          [INFO] Apache TinkerPop :: Hadoop Gremlin ................ SUCCESS [11:07.210s]
          [INFO] Apache TinkerPop :: Spark Gremlin ................. SUCCESS [6:18.138s]
          [INFO] Apache TinkerPop :: Giraph Gremlin ................ SUCCESS [2:43:57.096s]
          [INFO] Apache TinkerPop :: Neo4j Gremlin ................. SUCCESS [1:09:07.962s]
          [INFO] Apache TinkerPop :: Gremlin Driver ................ SUCCESS [12.497s]
          [INFO] Apache TinkerPop :: Gremlin Server ................ SUCCESS [13:22.778s]
          [INFO] Apache TinkerPop :: Gremlin Console ............... SUCCESS [1:26.498s]
          [INFO] Apache TinkerPop :: Gremlin Archetype ............. SUCCESS [0.161s]
          [INFO] Apache TinkerPop :: Archetype - TinkerGraph ....... SUCCESS [5.221s]
          [INFO] Apache TinkerPop :: Archetype - Server ............ SUCCESS [10.874s]
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD SUCCESS
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 4:31:17.006s
          [INFO] Finished at: Tue May 10 17:55:05 MDT 2016
          [INFO] Final Memory: 105M/796M
          [INFO] ------------------------------------------------------------------------
          ```

          Show
          githubbot ASF GitHub Bot added a comment - Github user okram commented on the pull request: https://github.com/apache/incubator-tinkerpop/pull/310#issuecomment-218332211 ``` [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary: [INFO] [INFO] Apache TinkerPop .................................. SUCCESS [5.532s] [INFO] Apache TinkerPop :: Gremlin Shaded ................ SUCCESS [2.355s] [INFO] Apache TinkerPop :: Gremlin Core .................. SUCCESS [44.087s] [INFO] Apache TinkerPop :: Gremlin Test .................. SUCCESS [11.803s] [INFO] Apache TinkerPop :: Gremlin Groovy ................ SUCCESS [40.912s] [INFO] Apache TinkerPop :: Gremlin Groovy Test ........... SUCCESS [7.801s] [INFO] Apache TinkerPop :: TinkerGraph Gremlin ........... SUCCESS [3:29.140s] [INFO] Apache TinkerPop :: Gremlin Benchmark ............. SUCCESS [6.440s] [INFO] Apache TinkerPop :: Hadoop Gremlin ................ SUCCESS [11:07.210s] [INFO] Apache TinkerPop :: Spark Gremlin ................. SUCCESS [6:18.138s] [INFO] Apache TinkerPop :: Giraph Gremlin ................ SUCCESS [2:43:57.096s] [INFO] Apache TinkerPop :: Neo4j Gremlin ................. SUCCESS [1:09:07.962s] [INFO] Apache TinkerPop :: Gremlin Driver ................ SUCCESS [12.497s] [INFO] Apache TinkerPop :: Gremlin Server ................ SUCCESS [13:22.778s] [INFO] Apache TinkerPop :: Gremlin Console ............... SUCCESS [1:26.498s] [INFO] Apache TinkerPop :: Gremlin Archetype ............. SUCCESS [0.161s] [INFO] Apache TinkerPop :: Archetype - TinkerGraph ....... SUCCESS [5.221s] [INFO] Apache TinkerPop :: Archetype - Server ............ SUCCESS [10.874s] [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 4:31:17.006s [INFO] Finished at: Tue May 10 17:55:05 MDT 2016 [INFO] Final Memory: 105M/796M [INFO] ------------------------------------------------------------------------ ```
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dkuppitz commented on the pull request:

          https://github.com/apache/incubator-tinkerpop/pull/310#issuecomment-218364465

          VOTE: +1

          Show
          githubbot ASF GitHub Bot added a comment - Github user dkuppitz commented on the pull request: https://github.com/apache/incubator-tinkerpop/pull/310#issuecomment-218364465 VOTE: +1
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user analytically commented on a diff in the pull request:

          https://github.com/apache/incubator-tinkerpop/pull/310#discussion_r62817317

          — Diff: tinkergraph-gremlin/src/main/java/org/apache/tinkerpop/gremlin/tinkergraph/process/computer/TinkerGraphComputer.java —
          @@ -54,6 +56,12 @@
          */
          public final class TinkerGraphComputer implements GraphComputer {

          + static {
          + // GraphFilters are expensive w/ TinkerGraphComputer as everything is already in memory
          — End diff –

          Would it be possible to explain this a bit more?

          Show
          githubbot ASF GitHub Bot added a comment - Github user analytically commented on a diff in the pull request: https://github.com/apache/incubator-tinkerpop/pull/310#discussion_r62817317 — Diff: tinkergraph-gremlin/src/main/java/org/apache/tinkerpop/gremlin/tinkergraph/process/computer/TinkerGraphComputer.java — @@ -54,6 +56,12 @@ */ public final class TinkerGraphComputer implements GraphComputer { + static { + // GraphFilters are expensive w/ TinkerGraphComputer as everything is already in memory — End diff – Would it be possible to explain this a bit more?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user okram commented on a diff in the pull request:

          https://github.com/apache/incubator-tinkerpop/pull/310#discussion_r62851083

          — Diff: tinkergraph-gremlin/src/main/java/org/apache/tinkerpop/gremlin/tinkergraph/process/computer/TinkerGraphComputer.java —
          @@ -54,6 +56,12 @@
          */
          public final class TinkerGraphComputer implements GraphComputer {

          + static {
          + // GraphFilters are expensive w/ TinkerGraphComputer as everything is already in memory
          — End diff –

          `TinkerGraphComputer` already has the entire graph loaded in memory. Thus, when you add a `GraphFilter`, what I'm doing is creating a `Set<Object>` of edge IDs that are allowed. Then, every `Vertex.edges()` is filtered by `edge.id() in Set<Object>`. Its lame. As such, edge-based `GraphFilters` are expensive in `TinkerGraphComputer`.

          Show
          githubbot ASF GitHub Bot added a comment - Github user okram commented on a diff in the pull request: https://github.com/apache/incubator-tinkerpop/pull/310#discussion_r62851083 — Diff: tinkergraph-gremlin/src/main/java/org/apache/tinkerpop/gremlin/tinkergraph/process/computer/TinkerGraphComputer.java — @@ -54,6 +56,12 @@ */ public final class TinkerGraphComputer implements GraphComputer { + static { + // GraphFilters are expensive w/ TinkerGraphComputer as everything is already in memory — End diff – `TinkerGraphComputer` already has the entire graph loaded in memory. Thus, when you add a `GraphFilter`, what I'm doing is creating a `Set<Object>` of edge IDs that are allowed. Then, every `Vertex.edges()` is filtered by `edge.id() in Set<Object>`. Its lame. As such, edge-based `GraphFilters` are expensive in `TinkerGraphComputer`.
          Hide
          spmallette stephen mallette added a comment -

          This was merged a while back but never closed (guess the automation didn't work right)

          Show
          spmallette stephen mallette added a comment - This was merged a while back but never closed (guess the automation didn't work right)

            People

            • Assignee:
              okram Marko A. Rodriguez
              Reporter:
              okram Marko A. Rodriguez
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development