  TinkerPop / TINKERPOP-962

Provide "vertex query" selectivity when importing data in OLAP.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0-incubating
    • Fix Version/s: 3.2.0-incubating
    • Component/s: process
    • Labels:

      Description

      Currently, when you do:

      graph.compute().program(PageRankVertexProgram).submit()
      

      We are pulling the entire graph into the OLAP engine. We should allow the user to limit the amount of data pulled in via a "vertex query"-style filter. For instance, we could add the following two new methods to GraphComputer.

      graph.compute().program(PageRankVertexProgram).vertices(hasLabel('person')).edges(OUT, hasLabel('knows','friend').has('weight',gt(0.8))).submit()
      

      The two methods would be defined as:

      public interface GraphComputer {
      ...
      GraphComputer vertices(final Traversal<Vertex,Vertex> vertexFilter);
      GraphComputer edges(final Direction direction, final Traversal<Edge,Edge> edgeFilter);
      }
      

      If the user does NOT provide a vertices() (or edges()) call, then the Traversal is assumed to be an IdentityTraversal. In terms of execution order, vertices() is evaluated first, and if it yields "false", edges() is not evaluated at all. Otherwise, edges() is evaluated against all of the vertex's incoming and outgoing edges. I don't really like the Direction parameter there, so perhaps it's just:

      GraphComputer edges(final Traversal<Vertex,Edge> edgeFilter)
      

      All edges that pass through the filter are then added to the OLAP vertex. Don't want both directions? Then it's outE('knows','friend').has('weight',gt(0.8)).
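      The proposed semantics (a missing filter behaves like an identity traversal; the vertex filter runs first, and only vertices that pass it have their incident edges tested against the edge filter) can be sketched without any TinkerPop dependency. In this minimal sketch, `VertexQueryFilterSketch`, `LoadedVertex`, `LoadedEdge`, and `load()` are hypothetical names, and plain `java.util.function.Predicate`s stand in for Gremlin traversals; it illustrates the execution order only and is not the eventual `GraphComputer` API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch of the proposed load-time filtering order.
// Plain Predicates stand in for Gremlin traversals; none of these types are TinkerPop classes.
final class VertexQueryFilterSketch {

    enum Direction { OUT, IN }

    static final class LoadedEdge {
        final Direction direction; // direction relative to the vertex that owns this edge
        final String label;
        final double weight;

        LoadedEdge(Direction direction, String label, double weight) {
            this.direction = direction;
            this.label = label;
            this.weight = weight;
        }
    }

    static final class LoadedVertex {
        final Object id;
        final String label;
        final List<LoadedEdge> edges; // incident edges, both incoming and outgoing

        LoadedVertex(Object id, String label, List<LoadedEdge> edges) {
            this.id = id;
            this.label = label;
            this.edges = edges;
        }
    }

    // No filter supplied: behave like an identity traversal and let everything through.
    static <T> Predicate<T> orIdentity(Predicate<T> filter) {
        return filter == null ? x -> true : filter;
    }

    // The vertex filter runs first; a rejected vertex is dropped and its edges are never examined.
    // Otherwise every incident edge is tested and only the passing ones are kept.
    static Optional<LoadedVertex> load(LoadedVertex vertex,
                                       Predicate<LoadedVertex> vertexFilter,
                                       Predicate<LoadedEdge> edgeFilter) {
        if (!orIdentity(vertexFilter).test(vertex))
            return Optional.empty();
        Predicate<LoadedEdge> keepEdge = orIdentity(edgeFilter);
        List<LoadedEdge> kept = new ArrayList<>();
        for (LoadedEdge edge : vertex.edges)
            if (keepEdge.test(edge))
                kept.add(edge);
        return Optional.of(new LoadedVertex(vertex.id, vertex.label, Collections.unmodifiableList(kept)));
    }
}
```

      Under these assumptions, `hasLabel('person')` plays the role of `v -> v.label.equals("person")`, and `outE('knows','friend').has('weight',gt(0.8))` plays the role of a predicate that keeps only `OUT` edges labeled `knows` or `friend` with `weight > 0.8`.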

        Activity

        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/incubator-tinkerpop/pull/210

        githubbot ASF GitHub Bot added a comment -

        Github user twilmes commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/210#issuecomment-180705708

        Tests and code look good.

        VOTE: +1

        githubbot ASF GitHub Bot added a comment -

        Github user dkuppitz commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/210#issuecomment-180660513

        • `mvn clean install`: passed
        • integration test (incl. Neo4j): passed

        VOTE: +1

        githubbot ASF GitHub Bot added a comment -

        Github user okram commented on a diff in the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/210#discussion_r52093310

        — Diff: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/GraphFilter.java —
        @@ -0,0 +1,193 @@
        +/*
        + * Licensed to the Apache Software Foundation (ASF) under one
        + * or more contributor license agreements. See the NOTICE file
        + * distributed with this work for additional information
        + * regarding copyright ownership. The ASF licenses this file
        + * to you under the Apache License, Version 2.0 (the
        + * "License"); you may not use this file except in compliance
        + * with the License. You may obtain a copy of the License at
        + *
        + * http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing,
        + * software distributed under the License is distributed on an
        + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
        + * KIND, either express or implied. See the License for the
        + * specific language governing permissions and limitations
        + * under the License.
        + */
        +
        +package org.apache.tinkerpop.gremlin.process.computer;
        +
        +import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
        +import org.apache.tinkerpop.gremlin.process.traversal.step.filter.RangeGlobalStep;
        +import org.apache.tinkerpop.gremlin.process.traversal.step.map.VertexStep;
        +import org.apache.tinkerpop.gremlin.process.traversal.util.TraversalHelper;
        +import org.apache.tinkerpop.gremlin.process.traversal.util.TraversalUtil;
        +import org.apache.tinkerpop.gremlin.structure.Direction;
        +import org.apache.tinkerpop.gremlin.structure.Edge;
        +import org.apache.tinkerpop.gremlin.structure.Vertex;
        +
        +import java.io.Serializable;
        +import java.util.Arrays;
        +import java.util.HashSet;
        +import java.util.Iterator;
        +import java.util.Set;
        +
        +/**
        + * GraphFilter is used by {@link GraphComputer} implementations to prune the source graph data being loaded into the OLAP system.
        + * There are two types of filters: a {@link Vertex} filter and an {@link Edge} filter.
        + * The vertex filter is a {@link Traversal} that can only check the id, label, and properties of the vertex.
        + * The edge filter is a {@link Traversal} that starts at the vertex are emits all legal incident edges.
        + * If no vertex filter is provided, then no vertices are filtered. If no edge filter is provided, then no edges are filtered.
        + * The use of a GraphFilter can greatly reduce the amount of data processed by the {@link GraphComputer}.
        + * For instance, for {@code g.V().count()}, there is no reason to load edges, and thus, the edge filter can be {@code bothE().limit(0)}.
        + *
        + * @author Marko A. Rodriguez (http://markorodriguez.com)
        + */
        +public final class GraphFilter implements Cloneable, Serializable {
        +
        +    public enum Legal {
        +        YES, NO, MAYBE;
        +
        +        public boolean positive() {
        +            return this != NO;
        +        }
        +
        +        public boolean negative() {
        +            return this == NO;
        +        }
        +    }
        +
        +    private Traversal.Admin<Vertex, Vertex> vertexFilter = null;
        +    private Traversal.Admin<Vertex, Edge> edgeFilter = null;
        +
        +    private boolean allowNoEdges = false;
        +    private Direction allowedEdgeDirection = Direction.BOTH;
        +    private Set<String> allowedEdgeLabels = new HashSet<>();
        +    //private boolean allowAllRemainingEdges = false;
        +
        +    public void setVertexFilter(final Traversal<Vertex, Vertex> vertexFilter) {
        +        if (!TraversalHelper.isLocalVertex(vertexFilter.asAdmin()))
        +            throw GraphComputer.Exceptions.vertexFilterAccessesIncidentEdges(vertexFilter);
        +        this.vertexFilter = vertexFilter.asAdmin().clone();
        +    }
        +
        +    public void setEdgeFilter(final Traversal<Vertex, Edge> edgeFilter) {
        +        if (!TraversalHelper.isLocalStarGraph(edgeFilter.asAdmin()))
        +            throw GraphComputer.Exceptions.edgeFilterAccessesAdjacentVertices(edgeFilter);
        +        this.edgeFilter = edgeFilter.asAdmin().clone();
        +        if (this.edgeFilter.getEndStep() instanceof RangeGlobalStep && 0 == ((RangeGlobalStep) this.edgeFilter.getEndStep()).getHighRange())
        +            this.allowNoEdges = true;
        +        else if (this.edgeFilter.getStartStep() instanceof VertexStep) {
        +            this.allowedEdgeLabels.clear();
        +            this.allowedEdgeLabels.addAll(Arrays.asList(((VertexStep) this.edgeFilter.getStartStep()).getEdgeLabels()));
        +            this.allowedEdgeDirection = ((VertexStep) this.edgeFilter.getStartStep()).getDirection();
        +            //this.allowAllRemainingEdges = 1 == this.edgeFilter.getSteps().size();
        +        }
        +    }
        +
        +    /*public void compileFilters() {
        — End diff –

        I was using `compileFilters()` at first, but then didn't need it. I left it commented out in case it's something that comes up again in the future. `GraphFilter` is very new, and over this 3.2.0 push I'm sure we will tweak it here and there. I just want people to know it's around.

        Regarding `allowAllRemainingEdges`: again, it's something I didn't need, but it can easily be added if we find a good use case.
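        The `setEdgeFilter()` method quoted in the diff above inspects the edge traversal so that easy cases can be decided without running it: a trailing `limit(0)` means no edges are needed at all, and a leading vertex step (such as `outE('knows')`) fixes an allowed direction and label set. Combined with the three-valued `Legal` enum, an edge can then be rejected outright (`NO`), accepted outright (`YES`), or left for the full traversal to decide (`MAYBE`). The sketch below models that idea with hypothetical `Step` descriptors instead of real TinkerPop `Step` classes. The `YES` case assumes the filter is nothing but the vertex step, which is what the commented-out `allowAllRemainingEdges` line hints at; `EdgeFilterAnalysisSketch` and `checkEdge()` are illustrative names, not necessarily what `GraphFilter` exposes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of GraphFilter's edge-filter analysis. Steps are simple
// descriptors here, not TinkerPop Step objects.
final class EdgeFilterAnalysisSketch {

    enum Direction { OUT, IN, BOTH }

    enum Legal {
        YES, NO, MAYBE;
        boolean positive() { return this != NO; }
        boolean negative() { return this == NO; }
    }

    static final class Step {
        final String kind;          // "vertex", "limit", or "other"
        final Direction direction;  // used when kind is "vertex"
        final String[] labels;      // used when kind is "vertex"; empty means any label
        final long high;            // used when kind is "limit"

        private Step(String kind, Direction direction, String[] labels, long high) {
            this.kind = kind;
            this.direction = direction;
            this.labels = labels;
            this.high = high;
        }

        static Step vertex(Direction direction, String... labels) { return new Step("vertex", direction, labels, -1); }
        static Step limit(long high) { return new Step("limit", null, new String[0], high); }
        static Step other() { return new Step("other", null, new String[0], -1); }
    }

    private boolean allowNoEdges = false;
    private Direction allowedDirection = Direction.BOTH;
    private final Set<String> allowedLabels = new HashSet<>();
    private boolean vertexStepOnly = false; // analogue of the commented-out allowAllRemainingEdges

    // Assumes a non-empty step list.
    EdgeFilterAnalysisSketch(List<Step> edgeFilter) {
        Step start = edgeFilter.get(0);
        Step end = edgeFilter.get(edgeFilter.size() - 1);
        if (end.kind.equals("limit") && end.high == 0) {
            allowNoEdges = true;                      // e.g. bothE().limit(0)
        } else if (start.kind.equals("vertex")) {
            allowedLabels.addAll(Arrays.asList(start.labels));
            allowedDirection = start.direction;
            vertexStepOnly = edgeFilter.size() == 1;  // e.g. a plain outE('knows')
        }
    }

    // direction is the edge's direction relative to the vertex being loaded (OUT or IN).
    Legal checkEdge(Direction direction, String label) {
        if (allowNoEdges) return Legal.NO;
        if (allowedDirection != Direction.BOTH && allowedDirection != direction) return Legal.NO;
        if (!allowedLabels.isEmpty() && !allowedLabels.contains(label)) return Legal.NO;
        return vertexStepOnly ? Legal.YES : Legal.MAYBE; // MAYBE: run the full edge traversal
    }
}
```

        The payoff of `MAYBE` is that bulk pruning can drop every `NO` edge cheaply, so the (comparatively expensive) traversal only has to run on the edges that remain.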

        githubbot ASF GitHub Bot added a comment -

        Github user twilmes commented on a diff in the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/210#discussion_r52088164

        — Diff: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/GraphFilter.java —
        + /*public void compileFilters() {
        — End diff –

        Did you mean to comment this out and remove the calls to it? I figure you meant to, but I recognized it from a previous commit.

        githubbot ASF GitHub Bot added a comment -

        Github user okram commented on the pull request:

        https://github.com/apache/incubator-tinkerpop/pull/210#issuecomment-180594111

        The difference between running with no `GraphFilter` and with a `GraphFilter`:

        ```
        Application ID          | Name                             | Cores | Memory per Node | Submitted Time      | User      | State    | Duration
        app-20160205091408-0001 | Apache TinkerPop's Spark-Gremlin | 80    | 10.0 GB         | 2016/02/05 09:14:08 | tinkerpop | FINISHED | 13 min
        app-20160205090220-0000 | Apache TinkerPop's Spark-Gremlin | 80    | 10.0 GB         | 2016/02/05 09:02:20 | tinkerpop | FINISHED | 6.8 min
        ```

        githubbot ASF GitHub Bot added a comment -

        GitHub user okram opened a pull request:

        https://github.com/apache/incubator-tinkerpop/pull/210

        TINKERPOP-962: Provide "vertex query" selectivity when importing data in OLAP.

        https://issues.apache.org/jira/browse/TINKERPOP-962

        (For TinkerPop 3.2.0 – Breaking Change for GraphComputer Implementations)

        This feature enables us to push a `GraphFilter` predicate down to the underlying OLAP graph system. For instance, if `g.V().count()` is executed by `SparkGraphComputer`, there is no reason to load any edges at all; we can simply push down a `GraphFilter` predicate that filters them out. Graph database providers like Titan can send up only the subset of the graph that the OLAP job requires, instead of filtering on the OLAP cluster machines. In the future, we will provide a `GraphFilterTraversalStrategy` that analyzes the traversal and automatically generates a `GraphFilter`, so that the user does not need to know which subsets of the full graph are actually accessed by the OLAP engine.

        This pull request yields a breaking change for graph system providers that have their own `GraphComputer` implementation. There are two new methods on `GraphComputer` and one new method on `GraphReader`.

        ```
        GraphComputer vertices(Traversal<Vertex,Vertex> vertexFilter)
        GraphComputer edges(Traversal<Vertex,Edge> edgeFilter)
        GraphReader.readVertex(InputStream inputStream, GraphFilter graphFilter)
        ```

        TinkerPop provides a `GraphFilter` object that does much of the heavy lifting, so at a minimum the graph system provider only needs to check the vertices and edges it loads with `GraphFilter.isLegal()`. Note that if the graph system provider relies on `GiraphGraphComputer` or `SparkGraphComputer`, no change is required on their part unless they want to apply the `GraphFilter` locally before sending their data to Giraph or Spark (an optimization that can be made at a later date without impacting users).

        Building this feature required a host of changes. When merged, `CHANGELOG.txt` will have the following new items:

        ```
        * Added `GraphFilter` to support filtering out vertices and edges that won't be touched by an OLAP job.
        * Added `GraphComputer.vertices()` and `GraphComputer.edges()` for `GraphFilter` construction (breaking).
        * `SparkGraphComputer`, `GiraphGraphComputer`, and `TinkerGraphComputer` all support `GraphFilter`.
        * Added `GraphComputerTest.shouldSupportGraphFilter()` which verifies all filtered graphs have the same topology.
        * Added `GraphFilterAware` interface to `hadoop-gremlin/` which tells the OLAP engine that the `InputFormat` handles filtering.
        * `GryoInputFormat` and `ScriptInputFormat` both implement `GraphFilterAware`.
        * Fixed a bug in `TraversalHelper.isLocalStarGraph()` which allowed certain illegal traversals to pass.
        * Added `TraversalHelper.isLocalVertex()` to verify that the traversal does not touch incident edges.
        * `GraphReader` IO interface now has `Optional<Vertex> readVertex(InputStream, GraphFilter)`. Default `UnsupportedOperationException`.
        * `GryoReader` does not materialize edges that will be filtered out and this greatly reduces GC and load times.
        * Created custom `Serializers` for `SparkGraphComputer` message-passing classes which reduce graph sizes significantly.
        ```
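        The `GraphFilterAware` entry above describes a division of labor: if the input source declares that it already applies the filter while reading, the OLAP engine can skip its own filtering pass; otherwise the engine filters after loading. A minimal sketch of that handshake follows, assuming vertices are modeled as plain strings; `FilterAwareLoadingSketch`, `FilterAware`, `VertexSource`, and `loadAll()` are hypothetical names, and this is not the `hadoop-gremlin` interface itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of a filter-aware input handshake; not hadoop-gremlin's GraphFilterAware.
final class FilterAwareLoadingSketch {

    // Marker: the source applies the vertex filter itself while reading.
    interface FilterAware {
        void setVertexFilter(Predicate<String> vertexFilter);
    }

    interface VertexSource {
        List<String> read();
    }

    // A naive source that returns everything it holds.
    static final class PlainSource implements VertexSource {
        final List<String> vertices;
        PlainSource(List<String> vertices) { this.vertices = vertices; }
        public List<String> read() { return new ArrayList<>(vertices); }
    }

    // A smarter source that skips illegal vertices during the read itself.
    static final class FilteringSource implements VertexSource, FilterAware {
        final List<String> vertices;
        Predicate<String> filter = v -> true; // identity until a filter is pushed down
        FilteringSource(List<String> vertices) { this.vertices = vertices; }
        public void setVertexFilter(Predicate<String> vertexFilter) { this.filter = vertexFilter; }
        public List<String> read() {
            List<String> out = new ArrayList<>();
            for (String v : vertices)
                if (filter.test(v)) out.add(v);
            return out;
        }
    }

    // Push the filter down when the source supports it; otherwise filter after loading.
    static List<String> loadAll(VertexSource source, Predicate<String> vertexFilter) {
        if (source instanceof FilterAware) {
            ((FilterAware) source).setVertexFilter(vertexFilter);
            return source.read();
        }
        List<String> loaded = new ArrayList<>();
        for (String v : source.read())
            if (vertexFilter.test(v)) loaded.add(v);
        return loaded;
    }
}
```

        Whichever path is taken, the loaded result should be identical, which is the same property `GraphComputerTest.shouldSupportGraphFilter()` is described as checking across filtered graphs.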

        Ran `mvn clean install` and integration tests. Passed.

        VOTE +1.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-962

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/incubator-tinkerpop/pull/210.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #210


        commit 873174e8218aef31f2220928ab16463aeda650cd
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-01T16:29:14Z

        Started working on GraphComputer.vertices() and GraphComputer.edges(). Have it working (untested) for SparkGraphComputer. The same pattern will flow over to GiraphGraphComputer. There are some issues regarding semantics in TinkerGraphComputer. Will bring up with a [DISCUSS].

        commit 3b3e008ce03d1f63610b92ff79886376d9dc55f7
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-01T19:38:42Z

        GraphComputerTest now verifies that graph filters work – GraphComputer.vertices() and GraphComputer.edges(). SparkGraphComputer implements graph filters correctly. TinkerGraph and Giraph throw UnsupportedOperationException at this point (i.e. TODO). Had to add remove() methods to many of the inner Iterator anonymous classes in IteratorUtils and MultiIterator. Basically, they just call remove() on the wrapped iterator, so this is cleanly backwards compatible. Added a GraphFilterAware interface that will allow InputFormats to say whether or not they do vertex/edge filtering on graph load. Nothing is connected to that yet, but GryoInputFormat (and smart providers) will be able to leverage this interface. Still a work in progress...

        commit 3485d8454855938fd7c0c24d5c3f9c3eb6ab308a
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-01T21:45:48Z

        Created a CommonFileInputFormat abstract class that both GryoInputFormat and ScriptInputFormat now extend. It handles all vertex/edge filter construction and has helper methods for filtering the StarVertex prior to being fully loaded by the InputFormat. This is really nice as we can now tweak vertex loading to a pretty intense degree especially with GryoInputFormat (e.g. once properties are loaded, check vertex filter and thus, don't even deserialize the edges). How it is right now, the full Vertex is materialized, then validated before the InputFormat will nextKeyValue().

        commit 77732ddd5f60bbd65a445390e590da34bea1db2f
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-01T21:59:04Z

        tweaks to filtered boolean check.

        commit 64c684065143b75697ccac755b9dfbf943c8c54c
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-01T23:46:07Z

        GiraphGraphComputer now has support for vertexFilters and edgeFilters. Consolidated a bunch of code to make it easy for future InputFormats to be GraphFilterAware. Will most likely make a filterMap so variables are bundled nicely.

        commit bc417dbf01fee817aa325ee8e4b582fef8ab6788
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-02T00:44:19Z

        created a GraphFilter container object that makes storing and applying filters easy. Very clean model. GraphFilter will next contain stuff like inferences on the filters so easy push-down predicates are available to the graph system provider.

        commit d0ac65277702b703c1ab2257adcbf67b0699b959
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-02T15:55:14Z

        GraphFilter is now a really cool class. It is part of gremlin-core/computer and provides access to the GraphComputer vertices() and edges() load filters. It also provides direct support for filtering StarVertex vertices (as most OLAP systems will leverage StarVertex). Its StarVertex support is nice in that GraphFilter analyzes the edgeFilter and can do bulk dropEdges() to prune the StarVertex fast. Whatever it can't do in bulk, it then handles by running the edgeFilter over the remaining edges. GraphComputerTest.shouldSupportGraphFilter() ensures that the graph is properly pruned. I have some ideas about pushing GraphFilter down to the StarVertex deserializer, but will need @spmallette's help on that. If we can do that, then we can get some BLAZING speeds for highly pruned OLAP operations.
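The "bulk drop first, then per-edge filter" strategy can be sketched with a toy star vertex that groups its edges by direction and label. Then the parts of the filter that can be analyzed (a direction and a label set, as an outE('knows') filter would imply) remove whole groups at once. Only the unanalyzable remainder (such as has('weight', gt(0.8))) runs edge by edge. All types here (SketchDirection, SketchEdge, SketchStarVertex) are hypothetical simplifications, not TinkerPop's StarGraph classes or the real dropEdges() signature.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

enum SketchDirection { OUT, IN }

final class SketchEdge {
    final SketchDirection direction;
    final String label;
    final double weight;

    SketchEdge(final SketchDirection direction, final String label, final double weight) {
        this.direction = direction;
        this.label = label;
        this.weight = weight;
    }
}

final class SketchStarVertex {
    // Edges grouped by direction, then label, so whole groups can be dropped in one call.
    private final Map<SketchDirection, Map<String, List<SketchEdge>>> edges = new EnumMap<>(SketchDirection.class);

    void addEdge(final SketchEdge edge) {
        edges.computeIfAbsent(edge.direction, d -> new HashMap<>())
             .computeIfAbsent(edge.label, l -> new ArrayList<>())
             .add(edge);
    }

    int edgeCount() {
        int count = 0;
        for (final Map<String, List<SketchEdge>> byLabel : edges.values())
            for (final List<SketchEdge> group : byLabel.values()) count += group.size();
        return count;
    }

    void applyEdgeFilter(final SketchDirection direction, final Set<String> labels,
                         final Predicate<SketchEdge> residual) {
        edges.keySet().removeIf(d -> d != direction);                 // bulk drop: other direction
        final Map<String, List<SketchEdge>> byLabel = edges.get(direction);
        if (byLabel == null) return;
        byLabel.keySet().removeIf(label -> !labels.contains(label)); // bulk drop: other labels
        for (final List<SketchEdge> group : byLabel.values())
            group.removeIf(residual.negate());                         // per-edge residual check
    }

    // Usage: keep only outgoing "knows" edges with weight > 0.8.
    static int demo() {
        final SketchStarVertex v = new SketchStarVertex();
        v.addEdge(new SketchEdge(SketchDirection.OUT, "knows", 0.9));
        v.addEdge(new SketchEdge(SketchDirection.OUT, "knows", 0.5));
        v.addEdge(new SketchEdge(SketchDirection.OUT, "created", 1.0));
        v.addEdge(new SketchEdge(SketchDirection.IN, "knows", 0.95));
        v.applyEdgeFilter(SketchDirection.OUT, Collections.singleton("knows"), e -> e.weight > 0.8);
        return v.edgeCount();
    }
}
```

The payoff of the grouping is that the residual predicate never sees the dropped groups. For selective filters, most edges are discarded without being inspected individually.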

        commit eee16c9354602a49e7dfb7738f2ce4d9fe36152c
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-02T19:17:59Z

        TinkerGraph now supports GraphComputer GraphFilter. Sort of an elegant solution that makes use of tagging elements that are legal or not. As of right now, the full test suite passes (integration too). GraphFilter works – this is going to be huge for speeding up OLAP times.
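The tagging approach can be sketched on a toy adjacency list. The filters are evaluated once, and the sketch records which vertex and edge ids are legal. Every read during the job then consults those tags instead of building a pruned copy. Everything below is a hypothetical simplification, not TinkerGraph's implementation: the TaggedGraphView name, the label-only vertex filter, and treating an edge as a candidate only when its out-vertex is legal.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical in-memory graph; a sketch of the tagging idea only.
final class TaggedGraphView {
    static final class Edge {
        final int id;
        final int outVertex;
        final int inVertex;
        final String label;

        Edge(final int id, final int outVertex, final int inVertex, final String label) {
            this.id = id;
            this.outVertex = outVertex;
            this.inVertex = inVertex;
            this.label = label;
        }
    }

    private final Map<Integer, String> vertexLabels = new LinkedHashMap<>();
    private final List<Edge> edges = new ArrayList<>();
    // Tags for the current job; the stored graph itself is never copied or mutated.
    private final Set<Integer> legalVertices = new HashSet<>();
    private final Set<Integer> legalEdges = new HashSet<>();

    void addVertex(final int id, final String label) { vertexLabels.put(id, label); }

    void addEdge(final int id, final int outVertex, final int inVertex, final String label) {
        edges.add(new Edge(id, outVertex, inVertex, label));
    }

    int vertexCount() { return vertexLabels.size(); }

    // Evaluate both filters once and remember which elements passed.
    void tag(final Predicate<String> vertexLabelFilter, final Predicate<Edge> edgeFilter) {
        legalVertices.clear();
        legalEdges.clear();
        for (final Map.Entry<Integer, String> vertex : vertexLabels.entrySet())
            if (vertexLabelFilter.test(vertex.getValue())) legalVertices.add(vertex.getKey());
        for (final Edge edge : edges)
            if (legalVertices.contains(edge.outVertex) && edgeFilter.test(edge)) legalEdges.add(edge.id);
    }

    // Reads during the job consult the tags instead of a pruned copy.
    List<Integer> outNeighbors(final int vertexId) {
        final List<Integer> result = new ArrayList<>();
        if (!legalVertices.contains(vertexId)) return result;
        for (final Edge edge : edges)
            if (edge.outVertex == vertexId && legalEdges.contains(edge.id)) result.add(edge.inVertex);
        return result;
    }

    static TaggedGraphView demo() {
        final TaggedGraphView graph = new TaggedGraphView();
        graph.addVertex(1, "person");
        graph.addVertex(2, "person");
        graph.addVertex(3, "software");
        graph.addEdge(10, 1, 2, "knows");
        graph.addEdge(11, 1, 3, "created");
        graph.addEdge(12, 3, 1, "createdBy");
        graph.tag("person"::equals, edge -> edge.label.equals("knows"));
        return graph;
    }
}
```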

        commit 7ad48f20586ec58b1fea7018fa8f37ec8c95c9b9
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-02T19:44:57Z

        added a MapReduce test. We now verify that GraphFilter works for both VertexProgram+MapReduce and MapReduce only. TinkerGraph and Spark integration tests pass.

        commit 72e388c4a1eadb6654a422988857006ed27b6158
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-02T19:53:54Z

        added nice GraphFilter.legalVertex() and GraphFilter.legalEdges() methods so that the provider doesn't have to be smart about how to apply the underlying filter traversal.

        commit e4cf925b496ee250f7dca48d094e8b93816ca075
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-02T20:24:45Z

        Added a state-based test case to GraphFilter. About to run this thing on the Blade cluster against Friendster to see how well we do now.

        commit 7023987a0b5154646a4d77e0b2b3506e850ed3d2
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-02T20:35:27Z

        Forgot to add vertices() and edges() to the ComputerTraversalEngine.Builder. I can't wait for this model to go away in favor of a fluent TraversalSource.

        commit 6cfb1f22f43fa82be10d04fc28e86e8f3db9d28e
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-02T21:24:38Z

        found a bug in TraversalUtil.isLocalStarGraph(). Added TraversalUtil.isLocalVertex() (for only checking properties, no edge access). Added JavaDoc to new GraphComputer methods. Added verification that the provided traversals don't leave their respective boundaries.
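The kind of boundary check described here can be sketched by modelling a traversal as a tree of steps, each tagged with what it reads. A vertex filter counts as "local" only if no step, at any nesting depth, reads incident edges or walks to another vertex. The LocalityCheck, Access, and Step types are hypothetical. Real TinkerPop steps are far richer, and this is not how TraversalUtil.isLocalVertex() is actually implemented.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical model of a traversal as a tree of steps.
final class LocalityCheck {
    private LocalityCheck() {}

    enum Access { ID, LABEL, PROPERTY, PREDICATE, INCIDENT_EDGES, ADJACENT_VERTICES }

    static final class Step {
        final Access access;
        final List<List<Step>> children; // nested traversals, e.g. the argument of where(...)

        Step(final Access access, final List<List<Step>> children) {
            this.access = access;
            this.children = children;
        }

        static Step of(final Access access) {
            return new Step(access, Collections.<List<Step>>emptyList());
        }
    }

    // Local = evaluable from the vertex's own id, label, and properties.
    static boolean isLocalVertex(final List<Step> traversal) {
        for (final Step step : traversal) {
            if (step.access == Access.INCIDENT_EDGES || step.access == Access.ADJACENT_VERTICES) return false;
            for (final List<Step> child : step.children)
                if (!isLocalVertex(child)) return false; // nested traversals must be local too
        }
        return true;
    }
}
```

Under this model, hasLabel('person').has('age', gt(30)) maps to [LABEL, PROPERTY] and passes. A where(outE()) maps to a PREDICATE step whose child contains INCIDENT_EDGES, so it is rejected.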

        commit f7ad5c4f6a7b197cebb86fa22d4c263ce6b3365b
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-02T21:49:22Z

        Added standard GraphComputer.Exceptions for GraphFilter and verify Exceptions are thrown correctly in GraphComputerTest. Tweaks to JavaDoc.

        commit b824d0c0994276e3714dc59341aa24526127eafe
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-03T18:03:23Z

        Created specialized serializers for common classes in Spark to avoid the overhead of JavaSerialization.

        commit 4afe29a80fb15f965924297fafda942adeb36b06
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-03T18:09:46Z

        forgot a Serialization that popped up when taking things to the cluster.

        commit 1c9a31c4c3d3f09c829d135363ad7ebff6590c8d
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-03T19:49:53Z

        Learned about ExternalizableSerializer, which makes registration of Kryo serializers a lot simpler. Ran this code on the cluster: what took 25 minutes now takes 6.8 minutes.
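Kryo ships an ExternalizableSerializer that, as far as I know, delegates to a class's own writeExternal()/readExternal(). That is why a class only needs to implement java.io.Externalizable once rather than having a dedicated Kryo serializer written for it. The sketch below shows such a class. RankMessage and its fields are hypothetical message-passing data, not a TinkerPop class. To stay self-contained, the round trip calls the two methods directly over JDK object streams instead of going through Kryo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical message class; name and fields are illustrative only.
final class RankMessage implements Externalizable {
    long targetId;
    double rank;

    public RankMessage() {} // Externalizable requires a public no-arg constructor

    RankMessage(final long targetId, final double rank) {
        this.targetId = targetId;
        this.rank = rank;
    }

    @Override
    public void writeExternal(final ObjectOutput out) throws IOException {
        // Only the raw field values are written: no per-field names or type descriptors.
        out.writeLong(targetId);
        out.writeDouble(rank);
    }

    @Override
    public void readExternal(final ObjectInput in) throws IOException {
        targetId = in.readLong();
        rank = in.readDouble();
    }

    // Round trip through the class's own methods, standing in for a delegating serializer.
    static RankMessage roundTrip(final RankMessage message) {
        try {
            final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                message.writeExternal(out);
            }
            final RankMessage copy = new RankMessage();
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.readExternal(in);
            }
            return copy;
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```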

        commit 097e09a39a151e6dbb8ebb268bc1792baac8765a
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-03T20:48:31Z

        minor nothings.

        commit 001a13dec5d3bb7ffa269fa2e392947d5c600a5e
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-03T21:28:59Z

        Merge branch 'master' into TINKERPOP-962

        commit 569496f671f4e532fc459cee54da3e6e62522ac1
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-04T14:36:31Z

        Merge branch 'master' into TINKERPOP-962

        commit 07f7a8c614493de4bd13d2e75292609c5ee7183c
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-04T17:42:24Z

        Moved GraphFilterTest to gremlin-groovy/ so I can use reflection and not have to make internal variables protected for testing purposes. Optional<Vertex> GraphReader.readVertex(InputStream,GraphFilter) now exists at the interface level with an UnsupportedOperationException default. GryoReader can now read vertices from a GraphFilter-perspective and only materialize those vertices/edges that are legal. Should be fairly trivial to add to GraphSONReader.

        commit b3d3116e5f287e61d44993af3e709c7d04bf77ac
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-04T20:00:00Z

        was using null to represent a filtered vertex. went with Optional throughout so the API is consistent.

        commit a28b1fdc673bb6a11b741d306e3706efc4510592
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-04T20:23:12Z

        method rename. pointless twiddling.

        commit 25e5b24049ef22d1bb64ae652d6ff5cba4786451
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-04T20:51:12Z

        ensure that the context is closed after the test suite has completed.

        commit ed18cd9382ee1e2db7f4618a72e9d28ed6b2fb2a
        Author: Marko A. Rodriguez <okrammarko@gmail.com>
        Date: 2016-02-04T22:51:08Z

        OMG, the most insane bug for the last two hours. Painful......


        okram Marko A. Rodriguez added a comment -

        This has been tested on a blade cluster. g.V().count() now takes 6.8 minutes instead of 25 minutes on Friendster.

        okram Marko A. Rodriguez added a comment -

        So in the TINKERPOP-962 branch, I've gone with the edgeFilter being Traversal<Vertex,Edge>. It works really nicely for StarVertex (Giraph and Spark) and I know it will work well for Titan because they will just compile that down to a "vertex query." However, I want to see how well it works for TinkerGraph before committing to it.

        SIDENOTE: I'm thinking that the API should be the following, though I don't know how well the type inference will work, especially with Groovy:

        GraphComputer.filter(vertexFilter)
        GraphComputer.filter(edgeFilter)
        GraphComputer.filter(vertexFilter,edgeFilter) 
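One Java-specific wrinkle with overloading filter(...) this way: filter(Traversal<Vertex,Vertex>) and filter(Traversal<Vertex,Edge>) have the same erasure, so javac rejects declaring both on one interface. The builder below is a hypothetical sketch that keeps the two filters under distinct names, in line with the vertices()/edges() naming from the ticket description. It applies them in the order given there: the vertex filter first, then the edge filter only for vertices that pass. FVertex, FEdge, FilterSpec, and the use of java.util.function types in place of Traversal are all assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical stand-ins for Vertex and Edge (only outgoing edges are modelled).
final class FVertex {
    final String label;
    final List<FEdge> outEdges = new ArrayList<>();

    FVertex(final String label) { this.label = label; }
}

final class FEdge {
    final String label;
    final double weight;

    FEdge(final String label, final double weight) {
        this.label = label;
        this.weight = weight;
    }
}

final class FilterSpec {
    // filter(Traversal<Vertex,Vertex>) next to filter(Traversal<Vertex,Edge>) would not
    // compile (same erasure), so the two filters get distinct method names here.
    private Predicate<FVertex> vertexFilter = v -> true;                 // identity when unset
    private Function<FVertex, List<FEdge>> edgeFilter = v -> v.outEdges; // keep all edges when unset

    FilterSpec vertices(final Predicate<FVertex> filter) {
        this.vertexFilter = filter;
        return this;
    }

    FilterSpec edges(final Function<FVertex, List<FEdge>> filter) {
        this.edgeFilter = filter;
        return this;
    }

    // Vertex filter first; the edge filter is only evaluated for vertices that pass.
    Optional<List<FEdge>> load(final FVertex vertex) {
        if (!vertexFilter.test(vertex)) return Optional.empty();
        return Optional.of(edgeFilter.apply(vertex));
    }
}
```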
        
        okram Marko A. Rodriguez added a comment -

        This would be a lot easier and more memory efficient if the submitted Traversal-filters could only analyze vertex properties/labels/ids for vertices() and edge properties/labels/ids for edges(). Perhaps we make that a hard constraint? I'm already thinking that for providers that want to use this, if the vertex filter is outE("know").count().is(gt(10)) then it's basically a full graph load :|.
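One way to make that constraint hard by construction, rather than checking traversals at runtime, is to hand the vertex filter a view that simply has no edge accessors. Under that design, an edge-counting filter like the one in this comment cannot even be written, so evaluating a vertex filter never forces edges to be loaded. VertexView and LocalVertexFilters below are hypothetical types for illustration, not TinkerPop API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical read-only view exposing only what a "local" vertex filter may inspect.
final class VertexView {
    final long id;
    final String label;
    private final Map<String, Object> properties;

    VertexView(final long id, final String label, final Map<String, Object> properties) {
        this.id = id;
        this.label = label;
        this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
    }

    Object property(final String key) { return properties.get(key); }
}

final class LocalVertexFilters {
    private LocalVertexFilters() {}

    // The predicate's input type has no edge accessors, so any filter that compiles
    // against it can be evaluated before a single edge is read.
    static Predicate<VertexView> personOlderThan(final int age) {
        return v -> "person".equals(v.label)
                && v.property("age") instanceof Integer
                && (Integer) v.property("age") > age;
    }
}
```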

        okram Marko A. Rodriguez added a comment -

        Ah. Forgot to mention Matthias Broecheler in the original ticket description. He wanted this and he might have some other ideas on the specification above.


          People

          • Assignee:
            okram Marko A. Rodriguez
            Reporter:
            okram Marko A. Rodriguez
          • Votes:
            1
            Watchers:
            3
