TinkerPop / TINKERPOP-1166

Add Memory.reduce() as an option to Memory implementations.

    Details

      Description

      Currently Memory supports incr, and, or, ... These are great and are what people will typically use. However, we should also provide the generalization, which is simply Memory.reduce. In this situation, incr, or, and, etc. are just specializations of Memory.reduce.

      How would it work?

      When memory is initialized in a VertexProgram, it would look like this:

      memory.set("myReduction", new MyReducingFunction(0));
      

      Then ReducingFunction would look like this:

      import java.util.function.BinaryOperator;
      
      public abstract class ReducingFunction<A> implements BinaryOperator<A> {
      
        public abstract A getInitialValue();
      
        public abstract A apply(final A first, final A second);
      }
      

      Easy peasy. Note that both Spark and Giraph support such function-based reduction in their respective "memory engines." Adding this functionality to TinkerGraphComputer will, of course, be easy too.
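A minimal, self-contained sketch of the idea, assuming a reducer is simply a `BinaryOperator` (because `apply` combines two values) that also carries an initial value passed to its constructor, as in `new MyReducingFunction(0)`. `SumReducer`, `OrReducer`, and `Reductions.reduceAll` are hypothetical names for illustration, not TinkerPop API. It shows `incr` and `or` falling out as two instances of the same reduction, folded sequentially here instead of across workers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical generalization: a reducer is a binary operator plus an initial value.
abstract class ReducingFunction<A> implements BinaryOperator<A> {
    private final A initialValue;

    protected ReducingFunction(final A initialValue) {
        this.initialValue = initialValue;
    }

    public A getInitialValue() {
        return initialValue;
    }
}

// incr(...) expressed as a reduction: combine by addition.
final class SumReducer extends ReducingFunction<Long> {
    SumReducer(final long initialValue) { super(initialValue); }

    @Override public Long apply(final Long first, final Long second) { return first + second; }
}

// or(...) expressed as a reduction: combine by logical OR.
final class OrReducer extends ReducingFunction<Boolean> {
    OrReducer(final boolean initialValue) { super(initialValue); }

    @Override public Boolean apply(final Boolean first, final Boolean second) { return first || second; }
}

final class Reductions {
    // Folds the values contributed by (simulated) workers into a single memory value.
    static <A> A reduceAll(final ReducingFunction<A> reducer, final List<A> contributions) {
        A result = reducer.getInitialValue();
        for (final A value : contributions) {
            result = reducer.apply(result, value);
        }
        return result;
    }
}
```

For example, `Reductions.reduceAll(new SumReducer(0L), Arrays.asList(3L, 4L, 5L))` folds to `12`. Because the reducer is associative, a real engine could apply it to partial results in any grouping.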

      Why do this? For two reasons:

      1. We get extra flexibility in Memory.
      2. https://issues.apache.org/jira/browse/TINKERPOP-1164


          Activity

          okram Marko A. Rodriguez added a comment (edited)

          Jotted this out in a notebook, and this feels like the right way to do this.

          Memory.merge(String, Merge)
          
          public class SumMerge<T extends Number> implements Merge<T> {
            public SumMerge<T> merge(final SumMerge<T> other);
            public T get();
          }
          

          Both CountGlobalStep and SumGlobalStep would use the same SumMerge class. However, the merge for CountGlobalStep is just traverser.bulk(), while for SumGlobalStep it's traverser.get() * traverser.bulk().
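A rough sketch of how one `SumMerge` could serve both steps, differing only in what each traverser contributes. The `Merge` interface shape, the immutable `merge` that returns a new instance, and the `SimpleTraverser` stand-in (a long value plus a bulk) are assumptions for illustration; the comment's generic `SumMerge<T extends Number>` is simplified to `long` here.

```java
// Hypothetical stand-in for a traverser: the value it holds plus its bulk (multiplicity).
final class SimpleTraverser {
    final long value;
    final long bulk;

    SimpleTraverser(final long value, final long bulk) {
        this.value = value;
        this.bulk = bulk;
    }
}

// Hypothetical merge contract: combine with another partial result, then read it with get().
interface Merge<M extends Merge<M, T>, T> {
    M merge(M other);

    T get();
}

// One immutable partial sum; count and sum differ only in how a traverser seeds it.
final class SumMerge implements Merge<SumMerge, Long> {
    private final long total;

    SumMerge(final long total) { this.total = total; }

    // CountGlobalStep: each traverser contributes its bulk.
    static SumMerge forCount(final SimpleTraverser t) { return new SumMerge(t.bulk); }

    // SumGlobalStep: each traverser contributes value * bulk.
    static SumMerge forSum(final SimpleTraverser t) { return new SumMerge(t.value * t.bulk); }

    @Override public SumMerge merge(final SumMerge other) { return new SumMerge(this.total + other.total); }

    @Override public Long get() { return total; }
}
```

With traversers `(10, bulk 2)` and `(5, bulk 3)`, the count merge yields `5` and the sum merge yields `10*2 + 5*3 = 35`, using the same class.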

          Next, we can start to slide this into GraphComputer and start to push out MapReduce (maybe). Check it:

          Let's say we have another interface called VertexMerge that extends Merge and adds this method:

          public Merge<T> initial(final Vertex vertex);
          
          graph.compute(SparkGraphComputer).program(MyVertexProgram).merge(MyMerge).merge(...).merge(...)
          

          The GraphComputer.merge(VertexMerge) simply gets its initial value by first processing the current Vertex. It can also access the edges of the vertex, which is something our current MapReduce model doesn't support! That's it. At that point, this is identical to MapReduce, EXCEPT that in MapReduce, if you ONLY do a Map with no Reduce, you still have output splits distributed across the cluster, whereas in this model that would be VERY BAD to do without some sort of filtering, or else you will merge a massive list onto a single machine.
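To make `VertexMerge.initial(Vertex)` concrete, here is a hedged, sequential sketch in which each vertex seeds a partial result from its own star graph (here, its outgoing edge count) and the partials are merged into one value. `StarVertex`, `OutDegreeMerge`, and `mergeAll` are hypothetical; a real `GraphComputer` would call `initial` on workers and merge partials across the cluster. Since every partial ends up on one machine, the merged value must stay small, which is exactly the filtering concern above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a vertex with access to its outgoing edges.
final class StarVertex {
    final String id;
    final List<String> outEdgeTargets;

    StarVertex(final String id, final List<String> outEdgeTargets) {
        this.id = id;
        this.outEdgeTargets = outEdgeTargets;
    }
}

// Hypothetical VertexMerge: seed from a vertex (edges included), then combine partials.
final class OutDegreeMerge {
    private final long edges;

    OutDegreeMerge(final long edges) { this.edges = edges; }

    // initial(vertex): the vertex's own contribution, computed from its edges.
    static OutDegreeMerge initial(final StarVertex vertex) {
        return new OutDegreeMerge(vertex.outEdgeTargets.size());
    }

    OutDegreeMerge merge(final OutDegreeMerge other) {
        return new OutDegreeMerge(edges + other.edges);
    }

    long get() { return edges; }

    // Sequential stand-in for the distributed merge of all per-vertex partials.
    static long mergeAll(final List<StarVertex> vertices) {
        OutDegreeMerge result = new OutDegreeMerge(0);
        for (final StarVertex v : vertices) {
            result = result.merge(initial(v));
        }
        return result.get();
    }
}
```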

          This is all very simple to do, and I believe it is easier to grok than the MapReduce extension we added because it's all part of the VertexProgram execution and not some auxiliary appendage.

          okram Marko A. Rodriguez added a comment (edited)

          More thinking on this. I think we might need to make this a breaking change for both providers and users. It's not too brutal, though.

          VertexProgram.getComputeKeys() -> Set<ComputeKey>
          VertexProgram.getMemoryKeys() -> Set<MemoryKey>
          ComputeKey [String key, Persistence.STATIC/TRANSIENT]
          MemoryKey [String key, BinaryOperator reducer, Persistence.STATIC/TRANSIENT]
          

          This will solve TINKERPOP-1162 too.
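A minimal sketch of the proposed key types, assuming the bracketed shapes above with the reducer typed as a `BinaryOperator` (it combines two values). `Persistence`, `Key`, `ComputeKey`, `MemoryKey`, and `KeyCleanup.dropTransient` are illustrative stand-ins, not the final TinkerPop classes. The sketch shows how a `TRANSIENT` key's value would be stripped from the final result, which is the data-reduction point behind TINKERPOP-1162.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// STATIC values survive the job; TRANSIENT values are scratch data removed at the end.
enum Persistence { STATIC, TRANSIENT }

// Common view of both key kinds: a name plus its persistence.
interface Key {
    String name();

    Persistence persistence();
}

// Hypothetical vertex compute key: a vertex property name plus its persistence.
final class ComputeKey implements Key {
    private final String key;
    private final Persistence persistence;

    ComputeKey(final String key, final Persistence persistence) {
        this.key = key;
        this.persistence = persistence;
    }

    @Override public String name() { return key; }

    @Override public Persistence persistence() { return persistence; }
}

// Hypothetical memory key: a name, the reducer that combines values, and its persistence.
final class MemoryKey<A> implements Key {
    private final String key;
    private final BinaryOperator<A> reducer;
    private final Persistence persistence;

    MemoryKey(final String key, final BinaryOperator<A> reducer, final Persistence persistence) {
        this.key = key;
        this.reducer = reducer;
        this.persistence = persistence;
    }

    BinaryOperator<A> reducer() { return reducer; }

    @Override public String name() { return key; }

    @Override public Persistence persistence() { return persistence; }
}

final class KeyCleanup {
    // Returns a copy of the final values (vertex properties or memory) without TRANSIENT keys.
    static Map<String, Object> dropTransient(final Map<String, Object> values, final List<? extends Key> keys) {
        final Map<String, Object> result = new HashMap<>(values);
        for (final Key k : keys) {
            if (k.persistence() == Persistence.TRANSIENT) {
                result.remove(k.name());
            }
        }
        return result;
    }
}
```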

          Next, in VertexProgram.setup(Memory memory) user will do:

          memory.register("vote.to.halt", true, {a,b -> a || b}) // we can have OrOperator.instance() all nice and static.
          memory.register("globalCount", 0, {a,b -> a + b}) 
          

          Then, when someone wants to update a memory item in the VertexProgram.execute()-method, they do:

          memory.add("vote.to.halt", false)
          memory.add("globalCount", 134345)
          

          Finally, in the VertexProgram.terminate(Memory memory) method, memory can be hard set (like how it is now).

          memory.set("vote.to.halt",true)
          memory.set("globalCount",0)
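The register/add/set lifecycle can be sketched as a toy, single-process memory. `PhasedMemory` and its `Phase` enum are hypothetical; the phase checks encode the rule described above (`set()` in setup/terminate, `add()` in execute), and contributions are folded sequentially rather than reduced across distributed workers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical single-JVM memory illustrating the proposed register/add/set lifecycle.
final class PhasedMemory {
    enum Phase { SETUP, EXECUTE, TERMINATE }

    private final Map<String, Object> values = new HashMap<>();
    private final Map<String, BinaryOperator<Object>> reducers = new HashMap<>();
    private Phase phase = Phase.SETUP;

    void setPhase(final Phase phase) { this.phase = phase; }

    // setup(): declare a key with its initial value and the reducer used by add().
    @SuppressWarnings("unchecked")
    <A> void register(final String key, final A initial, final BinaryOperator<A> reducer) {
        values.put(key, initial);
        reducers.put(key, (BinaryOperator<Object>) (BinaryOperator<?>) reducer);
    }

    // execute(): fold a contribution into the current value with the key's reducer.
    void add(final String key, final Object value) {
        if (phase != Phase.EXECUTE) {
            throw new IllegalStateException("add() is only allowed in execute()");
        }
        final BinaryOperator<Object> reducer = reducers.get(key);
        if (reducer == null) {
            throw new IllegalArgumentException("Unregistered memory key: " + key);
        }
        values.put(key, reducer.apply(values.get(key), value));
    }

    // setup()/terminate(): hard-set a value, bypassing the reducer.
    void set(final String key, final Object value) {
        if (phase == Phase.EXECUTE) {
            throw new IllegalStateException("set() is not allowed in execute()");
        }
        values.put(key, value);
    }

    Object get(final String key) { return values.get(key); }
}
```

Usage mirrors the comment: register `"globalCount"` with `0L` and `Long::sum` during setup, call `add("globalCount", 134345L)` during execute, and reset it with `set(...)` in terminate.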
          

          There is nothing "crazy" here; it's just an API change. Why this won't be toooooo bad:

          • Users typically don't write VertexPrograms so I bet not many exist. And if they do, they are built by "power users" who will appreciate why this is better.
          • GraphComputer providers are few and far between (it seems most just use SparkGraphComputer). However, for those that do exist, the code is still "the same"; it's primarily an API rework. Not tooooooooo crazy, but yeah, they will have to do something.

          The benefits:

          1. We get transient keys finally settled. This is huge for data reduction. (see TINKERPOP-1162)
          2. We are on our way to gutting the MapReduce infrastructure. (perhaps a Deprecation in 3.3.0)
          3. We get super efficient/fast ReducingBarrierStep OLAP implementations. (see TINKERPOP-1164)

          To add to (3), it's crazy because now Traversal/SideEffects and GraphComputer/Memory are in one-to-one correspondence. It's very elegant.

          okram Marko A. Rodriguez added a comment

          The new reducing model took the Friendster g.V().count() from 6.8 minutes to 5.6 minutes, with 3.4 of those minutes spent loading the graph (ScriptInputFormat). The gain comes from count(), group(), groupCount(), min(), max(), etc. no longer having subsequent MapReduce jobs, and thus there is one fewer scan of the full graph. Another benefit of this model is "parallel reductions": groupCount("m")....count() is not two subsequent MapReduce jobs, but is embedded in the VertexProgram execution.
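A sequential sketch of the "parallel reductions" point: rather than one scan per reducing step, a single pass over the vertices feeds several independently keyed reducers at once (here a count and a group-count by label). `OnePassReductions` and its fields are hypothetical and only model the idea; they are not `TraversalVertexProgram` internals.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of two reducing steps sharing one scan, each with its own memory slot.
final class OnePassReductions {
    long count = 0L;                                      // slot for count()
    final Map<String, Long> groupCount = new HashMap<>(); // slot for groupCount("m"), keyed by label

    // A single pass feeds both reducers, so no second scan of the data is needed.
    void run(final List<String> vertexLabels) {
        for (final String label : vertexLabels) {
            count += 1;
            groupCount.merge(label, 1L, Long::sum);
        }
    }
}
```

Keeping each reduction under its own key is what lets several of them coexist in one job; in the MapReduce model, each would have been a separate job with its own scan.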

          githubbot ASF GitHub Bot added a comment

          GitHub user okram opened a pull request:

          https://github.com/apache/incubator-tinkerpop/pull/243

          TINKERPOP-1166, TINKERPOP-1164, TINKERPOP-1057, TINKERPOP-1162

          https://issues.apache.org/jira/browse/TINKERPOP-1166
          https://issues.apache.org/jira/browse/TINKERPOP-1164
          https://issues.apache.org/jira/browse/TINKERPOP-1057
          https://issues.apache.org/jira/browse/TINKERPOP-1162

          This PR combines all of these tickets because, in order to solve any one of them, I needed to make sure that all of the tickets' respective solutions played well with one another. In summary, GraphComputer `Memory` has taken center stage as a replacement for `MapReduce`. The benefit is that `Memory` can compute/reduce in parallel with a VertexProgram's execution run. This has the following benefits:

          • Reductions happen within the vertex program and thus no subsequent rescan of the full graph is needed to generate side-effects. For example, `g.V().count()` is one pass through the graph, not two.
          • Because reductions happen within the vertex program, barrier steps are no longer the cut-off point for an OLAP traversal. An OLAP traversal can now have multiple reducing barrier steps within it (e.g. `max()`, `groupCount()`, `min()`, `group()`, etc.). It's all one job.
          • Vertex compute keys can be marked transient and are automagically removed from the resultant graph. This is good for removing "scratch data" (e.g. PageRankVertexProgram.EDGE_COUNTS).
          • Memory compute keys can be marked transient and are automagically removed from the resultant memory. This is good for removing "scratch data" (e.g. VOTE_TO_HALT).
          • Memory compute keys can be declared to NOT broadcast and thus not be sent to the workers on each iteration. Workers can still send data for such keys, but they cannot read it.
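A hedged sketch of the broadcast rule: workers may always contribute to a key, but may only read keys that are broadcast, and only broadcast keys would be shipped to workers each iteration. `BroadcastAwareMemory`, `workerGet`, and `workerAddLong` are hypothetical, and throwing `IllegalArgumentException` on a non-broadcast read is an assumption of this sketch, not documented TinkerPop behavior.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical memory that distinguishes broadcast keys from write-only keys.
final class BroadcastAwareMemory {
    private final Map<String, Object> values = new HashMap<>();
    private final Set<String> broadcastKeys = new HashSet<>();

    void declare(final String key, final Object initial, final boolean broadcast) {
        values.put(key, initial);
        if (broadcast) {
            broadcastKeys.add(key);
        }
    }

    // Master-side view: every key is readable.
    Object masterGet(final String key) { return values.get(key); }

    // Worker-side view: non-broadcast keys were never sent, so they cannot be read.
    Object workerGet(final String key) {
        if (!broadcastKeys.contains(key)) {
            throw new IllegalArgumentException("Memory key is not broadcast to workers: " + key);
        }
        return values.get(key);
    }

    // Workers can still send data for any key (summation used here for simplicity).
    void workerAddLong(final String key, final long delta) {
        values.put(key, (Long) values.get(key) + delta);
    }

    // Only the broadcast subset would be serialized to workers on each iteration.
    Map<String, Object> snapshotForWorkers() {
        final Map<String, Object> snapshot = new HashMap<>();
        for (final String key : broadcastKeys) {
            snapshot.put(key, values.get(key));
        }
        return snapshot;
    }
}
```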

          Finally, one of the major side-effect benefits of this work is that numerous traversals that were considered "illegal" by `ComputerVerificationStrategy` are no longer illegal. The only types of traversals that are illegal in OLAP are those that have `by()`-modulators that go beyond the local star graph or are path-based (`select()` and `path()`) and go beyond the element id in their `by()`-modulations.

          This work creates breaking changes for both users (trivial) and providers (intense). However, for providers, it's only those providers that have their own custom `GraphComputer` implementation. If they use `SparkGraphComputer` or `GiraphGraphComputer`, no work is required of them.

          CHANGELOG

          ```

          • Added `MemoryComputeKey` for specification of `Memory` keys in `VertexProgram`. (breaking)
          • Added `VertexComputeKey` for specification of vertex compute properties in `VertexProgram`. (breaking)
          • Added `and`, `or`, and `addAll` to `Operator`.
          • `Memory` API changed to support setting and adding values for reduction. (breaking)
          • `Memory` keys can be marked as broadcast and only those values are sent to workers on each iteration.
          • `Memory` keys can be marked transient and thus deleted at the end of the OLAP job.
          • Vertex compute keys can be marked transient and thus deleted at the end of the OLAP job.
          • `VertexProgram` API changed to support `MemoryComputeKey` and `VertexComputeKey`. (breaking)
          • `TraversalVertexProgram` able to execute OLAP and OLTP traversal sections dynamically within the same job.
          • Removed `FinalGet` interface as all post processing of reductions should be handled by the reducing step explicitly. (breaking)
          • Greatly simplified all `ReducingBarrierStep` implementations as they no longer require `MapReduce` in OLAP.
          • Nearly all steps in OLAP that used `MapReduce` now use `Memory` to do their reductions which expands the list of legal traversals.
          • `GroupStep` simplified with `GroupHelper.GroupMap` no longer being needed. Related to the removal of `FinalGet`.
          • OLAP side-effects that are no longer generated by `MapReduce` are simply stored in `ComputerResult.Memory` w/ no disk persistence needed. (breaking)
            ```
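A minimal sketch of an enum of reusable reducers in the spirit of the `Operator` additions listed above (`and`, `or`, `addAll`). `ReducerOp` and its constants are illustrative and are not TinkerPop's `Operator` source. An enum is used because, as a later commit in this PR notes, enum constants keep serialization simple while still allowing new operators to be added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.function.BinaryOperator;

// Hypothetical reducer catalog: each constant is a stateless BinaryOperator.
enum ReducerOp implements BinaryOperator<Object> {
    sum {
        @Override public Object apply(final Object a, final Object b) {
            return ((Number) a).longValue() + ((Number) b).longValue();
        }
    },
    and {
        @Override public Object apply(final Object a, final Object b) { return (Boolean) a && (Boolean) b; }
    },
    or {
        @Override public Object apply(final Object a, final Object b) { return (Boolean) a || (Boolean) b; }
    },
    // addAll: concatenate two collections into a new list without mutating either input.
    addAll {
        @Override public Object apply(final Object a, final Object b) {
            final Collection<Object> result = new ArrayList<>((Collection<?>) a);
            result.addAll((Collection<?>) b);
            return result;
        }
    }
}
```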

          UPGRADE-PROVIDERS

          ```
          GraphComputer updates to semantics and API
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

          Providers that have a custom `GraphComputer` implementation will have a lot to handle. Note that if the graph system simply uses `SparkGraphComputer` or `GiraphGraphComputer` provided by TinkerPop, then no updates are required.

          `Memory` updates:

          • Any `BinaryOperator` can be used for reduction and is made explicit in the `MemoryComputeKey`.
          • `MemoryComputeKeys` can be marked transient and must be removed from the resultant `ComputerResult.memory()`.
          • `MemoryComputeKeys` can be specified to not broadcast and thus, must not be available to workers to read in `VertexProgram.execute()`.
          • The `Memory` API has been changed. No more `incr()`, `and()`, etc. Now it's just `set()` (setup/terminate) and `add()` (execute).

          See TINKERPOP-1166: https://issues.apache.org/jira/browse/TINKERPOP-1166

          Operational semantic test cases have been added to `GraphComputerTest` to ensure that all the above behaviors are implemented correctly.

          Providers that have a custom `ReducingBarrierStep` implementation will need to adjust their implementation slightly to accommodate a new API that reflects the `Memory` updates above. This should be a simple change. Note that `FinalGet` no longer exists and such post-reduction processing is handled by the reducing step.

          See TINKERPOP-1164: https://issues.apache.org/jira/browse/TINKERPOP-1164

          ```
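With `FinalGet` gone, a reducing step keeps a mergeable intermediate value and converts it into its output at the end. Here is a sketch for a mean, assuming a (sum, count) pair as the intermediate, because averages of partial averages cannot be merged correctly. `MeanReduction`, its `REDUCER`, and its `generateFinalReduction()` method are illustrative; a commit in this PR mentions a `generateFinalReduction()` hook on reducing steps, but this is not its actual signature.

```java
import java.util.function.BinaryOperator;

// Hypothetical intermediate for mean(): keep sum and count so partials merge exactly.
final class MeanReduction {
    final double sum;
    final long count;

    MeanReduction(final double sum, final long count) {
        this.sum = sum;
        this.count = count;
    }

    // Seed from one value with its bulk (multiplicity).
    static MeanReduction of(final double value, final long bulk) {
        return new MeanReduction(value * bulk, bulk);
    }

    // The associative reducer a memory key would use to combine worker partials.
    static final BinaryOperator<MeanReduction> REDUCER =
            (a, b) -> new MeanReduction(a.sum + b.sum, a.count + b.count);

    // Post-reduction step owned by the reducing step itself (NaN when nothing was reduced).
    double generateFinalReduction() {
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

For partials `{2, 4}` and three copies of `6`, the merged result is `24 / 5 = 4.8`, whereas averaging the two partial means (3 and 6) would wrongly give 4.5.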

          UPGRADE-USERS

          ```
          VertexProgram and MemoryComputeKey and VertexComputeKey
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

          Users that have custom `VertexProgram` implementations will need to change their implementations to support the new `VertexComputeKey` and `MemoryComputeKey` classes. In the `VertexPrograms` provided by TinkerPop, these changes were trivial, taking less than 5 minutes to make the updates.

          • `VertexProgram.getVertexComputeKeys()` returns a `Set<VertexComputeKey>`, no longer a `Set<String>`. Use `VertexComputeKey.of(String key, boolean transient)` to generate a `VertexComputeKey`. Transient keys were not supported in the past, so to make the implementation semantically equivalent, the boolean transient should be false.
          • `VertexProgram.getMemoryComputeKeys()` returns a `Set<MemoryComputeKey>`, no longer a `Set<String>`. Use `MemoryComputeKey.of(String key, BinaryOperator reducer, boolean broadcast, boolean transient)` to generate a `MemoryComputeKey`. Broadcasting and transients were not supported in the past, so to make the implementation semantically equivalent, the boolean broadcast should be true and the boolean transient should be false.

          See TINKERPOP-1162: https://issues.apache.org/jira/browse/TINKERPOP-1162

          SparkGraphComputer and GiraphGraphComputer Persistence
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

          Most of the `MapReduce`-based steps in `TraversalVertexProgram` have been removed and replaced using the new `Memory`-reduction model. `MapReduce` jobs always created a persistence footprint (e.g. in HDFS). Memory data was never persisted to HDFS. As such, this data is no longer accessible on disk. For instance, there is no more `~reducing`, `~traversers`, or specially named side-effects such as those from `groupCount('m')`. The data is still accessible via `ComputerResult.memory()`; it simply does not have a corresponding on-disk representation.
          ```
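A before/after sketch of the user-side migration, using minimal stand-ins that follow the factory signatures listed in the upgrade notes (`VertexComputeKey.of(String, boolean)` and `MemoryComputeKey.of(String, BinaryOperator, boolean, boolean)`). The stand-in classes, the `MyProgramKeys` holder, and the key names `rank` and `edgeTotal` are hypothetical; with real TinkerPop you would import its classes instead. The chosen flags reproduce the old semantics: not transient, and broadcast. (`transient` is a Java keyword, hence the `isTransient` names.)

```java
import java.util.Collections;
import java.util.Set;
import java.util.function.BinaryOperator;

// Hypothetical stand-in mirroring VertexComputeKey.of(String key, boolean transient).
final class VertexComputeKey {
    final String key;
    final boolean isTransient;

    private VertexComputeKey(final String key, final boolean isTransient) {
        this.key = key;
        this.isTransient = isTransient;
    }

    static VertexComputeKey of(final String key, final boolean isTransient) {
        return new VertexComputeKey(key, isTransient);
    }
}

// Hypothetical stand-in mirroring MemoryComputeKey.of(key, reducer, broadcast, transient).
final class MemoryComputeKey {
    final String key;
    final BinaryOperator<?> reducer;
    final boolean isBroadcast;
    final boolean isTransient;

    private MemoryComputeKey(final String key, final BinaryOperator<?> reducer,
                             final boolean isBroadcast, final boolean isTransient) {
        this.key = key;
        this.reducer = reducer;
        this.isBroadcast = isBroadcast;
        this.isTransient = isTransient;
    }

    static MemoryComputeKey of(final String key, final BinaryOperator<?> reducer,
                               final boolean isBroadcast, final boolean isTransient) {
        return new MemoryComputeKey(key, reducer, isBroadcast, isTransient);
    }
}

// Hypothetical program keys. Before the change, both methods returned a Set<String> of names.
final class MyProgramKeys {
    // transient = false keeps the vertex property in the result graph, as before.
    static Set<VertexComputeKey> getVertexComputeKeys() {
        return Collections.singleton(VertexComputeKey.of("rank", false));
    }

    // broadcast = true and transient = false reproduce the old Memory behavior.
    static Set<MemoryComputeKey> getMemoryComputeKeys() {
        final BinaryOperator<Long> sum = Long::sum;
        return Collections.singleton(MemoryComputeKey.of("edgeTotal", sum, true, false));
    }
}
```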

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1166

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/incubator-tinkerpop/pull/243.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #243


          commit b50a43ce781572a1610fa3e31b5132205796af67
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-19T22:48:50Z

          Migrated over to the proposed Memory model of using registered BinaryOperator reducers. It was really easy to change so that's good. All test cases pass for TinkerGraphComputer, one fails in SparkGraphComputer, and I have some NullPointer serialization issue with GiraphGraphComputer that I will fix later.

          commit 0ae584cae51ab15eef7de776cf8c049b64ace852
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-19T23:45:18Z

          couldn't help myself. once I walked away from the computer I realized how to make GiraphMemory work. Got it working and test cases passing. Only one test fails in SparkGraphComputer. I will handle THAT next week.

          commit cf79c2255028d3a2a7dabb4198030b50bfb65417
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-20T02:35:52Z

          minor tweak.

          commit 7bf8213a5e27026f9a378a0eb166f3a67038321f
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-20T17:03:25Z

          fixed the SparkMemory issue from last night. Wow. I was really expecting this ticket to take me all of next week. Knocked it out before the week even began. Bow to me all you peons.

          commit 706759c1dbf9df6bf9210913001658ca9f9ff513
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-22T16:50:06Z

          added GraphComputerTest.shouldSupportTransientKeys(). Ensured that only Memory.set() is allowed in setup()/terminate() and Memory.add() in execute(). Fixed up SparkGraphComputer, TinkerGraphComputer, and GiraphGraphComputer to respect the new MemoryComputeKey semantics and transient key semantics.

          commit b03c1adc466fa48fffcfeb7e17b71a243f66b76f
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-22T17:38:50Z

          EDGE_COUNT and VOTE_STRENGTH in PageRankVertexProgram and PeerPressureVertexProgram are now transient and the respective property keys are private static. Extended GraphComputerTest.shouldSupportTransientKeys() with a MapReduce job that ensures that the transient vertex properties are not accessible during MapReduce.

          commit 65bbdaa336b4034421f7e2599bb3f2c307aa4773
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-22T21:41:02Z

          Have all the ReducingBarrierSteps no longer implement MapReduce and use MemoryComputeKeys for their reduction. GroupStep, FoldStep, and GroupStepV3d0 are not complete yet. Having the darndest time with GroupStep; once I get it, then the others will follow from it. Pushing to save work.

          commit 3389122591a788ceb566f9bdf5a1ba6b72789faa
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-22T22:15:21Z

          Got GroupStep working, but it's a hack unfortunately. Pushing to save work.

          commit 9f8252b816432249a0667e654b682293b77ec3c1
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-23T00:21:48Z

          Fixed a bug in FoldStep, got GroupStep working perfectly (both OLTP and OLAP, but there is still one awkward hack). Need to spend some more cycles on GroupStep and then once I get that clean, clean, clean, I will map that pattern over to GroupStepV3d0 and that will be that.

          commit 90debb4fed3bffa5833543566620bcf0a30ae7f4
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-23T01:15:09Z

          some more work on GroupStep ---- converging on final solution.

          commit 9416e9498a363e8ca1ef69df6cc0d046762e43e3
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-23T17:25:36Z

          New Reducing-based Memory-model implemented. A few kooky things emerged because of this and will discuss in UPGRADE docs. However, all-in-all this is a much nicer model which will lead to significant performance improvements (still need to benchmark and test). I don't think we will ever deprecate the MapReduce infrastructure. The Memory model is not as flexible and efficient (for certain jobs) as MapReduce.

          commit d77bf58c8e60f52f180af21ef18091c3cb143c3b
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-23T20:40:04Z

          Solved the GroupStep-hack. The whole notion of a FinalGet is bad. Every ReducingBarrier step now implements generateFinalReduction() with the default being the identity function. For GroupStep, MeanStep, and GroupStepV3d0, they do the final respective reduction. Also, GroupStep is a bit more organized – only one BinaryOperator needed.

          commit 14d0d48435640572523d942d56b9a7a6dd128969
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-23T23:21:02Z

          The next big thing in the new MemoryComputeKey model: broadcasting. It is possible to state that a MemoryKey will NOT be read by workers and thus there is no need to send the data to the workers on each iteration. Added GraphComputerTest.shouldSupportBroadcastKeys(). SparkGraphComputer supports this natively, TinkerGraphComputer simply hides the data when trying to be accessed by workers, and GiraphGraphComputer (like TinkerGraphComputer) but I will be able to at least clear the data immediately once it's sent (future work). Cleaned up GroupStep a bit. Have a consistent naming convention for workers vs. master: inExecute. All XXXMemory implementations use it so it's easier to see how they all relate to each other. Added a GraphComputerTest to make sure exceptions are correct around get(), add(), set() for various situations; found a couple of inconsistencies that are now fixed up.

          commit 90e862d76ee23d8e03f1b954be4276cd09f707ed
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-23T23:21:35Z

          Merge branch 'master' into TINKERPOP-1166

          commit e6adbecfc401c4f41705216fecfac74ff04b5c8e
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-24T00:37:23Z

          Needed a no-args constructor for FoldBiOperator in GiraphGraphComputer.

          commit aa79390f7afd78091755784753ed67a311f1d7da
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-24T20:52:29Z

          Okay. So MapReduce in TraversalVertexProgram is going away fast. GroupSideEffectStep and GroupStep are now one step: GroupStep. Likewise for GroupCountStep. groupCount() is simply groupCount().cap(). This makes the code a lot simpler and easier to optimize everything in one spot. With this direction, TraversalVertexProgram will be able to do OLAP ... then in terminate(), do reductions. If those reductions yield traversers, terminate() == false, and we distribute messages again. Thus, we will have OLTP->OLAP->OLTP->OLAP all possible within a single TraversalVertexProgram. The idea is that the master worker (terminate()) will do OLTP processing until it needs to go back to OLAP (if necessary). There is still lots more work to do. This push is to save work.

          commit 980525cd2d7ee656858f68703635972c740486fa
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-24T22:13:09Z

          StoreStep no longer uses MapReduce. All that is left is AggregateStep and ProfileStep.

          commit 2f3b2ac971ade400b4f2bfb794b451a612ed0f69
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-25T15:29:23Z

          out of my 24 hole. @dkuppitz, I was wrong: you can't have XXXStep and XXXSideEffectStep as one and the same. I learned exactly what I learned about a year ago by trying. However, note that all GroupCountXXXStep and GroupXXXStep are no longer MapReduce-based, but GraphComputer.Memory-based, and because of their alignment they share lots of the same code. All that is left is TreeStep, ProfileStep, and AggregateStep to convert to the GraphComputer.Memory model.

          commit aae8910388873f7a749b3df330fca05ce7b5d2eb
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-25T15:53:09Z

          Got rid of the FinalGet concept. This won't work moving forward. Each step is responsible for its final reduction via GraphComputing.generateFinalReduction(). TreeStep and TreeSideEffectStep now both use the GraphComputer.Memory model. All that is left is AggregateStep and ProfileStep.

          commit d1812f806cd360ce5716a620413a28938ef80ab6
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-25T17:40:40Z

          RangeGlobalStep is now computing in TraversalVertexProgram. Will slowly dissect away the end-steps of TraverserMapReduce such that we have a pure OLTP/OLAP model within TraversalVertexProgram.

          commit e6d9d6d2581dd3ef4c32d336df0eb7a8989bbaba
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-25T17:53:18Z

          TailGlobalStep is no longer an OLAP end step. CollectingBarriers are up next – Aggregate, Order, Dedup.

          commit c4b9e741421d2af895bdf3c37166f952209d72d4
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-25T18:37:44Z

          All the ReducingBarrierSteps inherit their MemoryComputeKey from abstract ReducingBarrierStep. Their key is now step.getId(), no longer ReducingBarrierStep.REDUCING, as you can now have multiple reducers in the same OLAP job. Updated ComputerResultStep accordingly. Next up will be such that ComputerResultStep does not do ANY introspection into TraversalVertexProgramStep; it simply pulls ~traversers from GraphComputer.Memory. And that's all there is to it.

          commit 6950b198aff0d1fb4842d972bc145a6eeae539b2
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-25T21:46:23Z

          Gremlin OLAP can now do all the same traversals as Gremlin OLTP. In fact, a Gremlin OLAP job is an undulation between distributed traversers and localized traversers. It's really neat; it's like it 'breathes'. Fan out, reduce, fan out, reduce.... I still have to convert over OrderGlobalStep so it's not done done, but yeah. Saving work.

          commit 32dd068291e45179648340745c882df459a8115a
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-26T00:04:55Z

          Okay. Here is the mother lode. OrderGlobalStep, DedupGlobalStep, etc. can now exist anywhere in an OLAP traversal. There are some loose ends that still need to be cleaned up (as well as some major code reorg and compression), but this is the stuff. This has been a long time coming. This new GraphComputer.Memory model is sooooo much more efficient and gives us much more expressivity. Stoked.

          commit 066cc218bb90a27f1ee016b4db3016a630f00f63
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-26T01:17:16Z

          limit(1) is now compiled into the TraversalVertexProgramStep job. Forgot to update this test.

          commit 42b7254505563eaa1925639446cbb55ad708c7b7
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-26T15:45:14Z

          I totally forgot about @dkuppitz's Operator.java work. I gutted lots of rewritten operators and now just use Operator. This means fewer classes to register with GryoMapper, phew. It was getting insane. Also, in the future, if we want to make more optimal implementations, we can just add stuff like Operator.sumLong() and it doesn't affect serialization because Operator is an enum.

          commit efc1739f183ff8f30be54b53c393e3ba8d03ced9
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-26T16:33:33Z

          tada – fixed my groupCount()...groupCount()... bug. Added a crazy asynchronous traversal to groupCount() that demonstrates that the timing is correct for OLAP to OLTP to OLAP conversion. I went through all the test cases that have Ignore over them and removed lots of them. Some didn't even make sense why we had them Ignored. There are only a few Ignore(COMPUTER) and they make sense... (not related to this PR, but in general issues with serialization or whatnot). This new OLAP work is sooooo slammin.

          commit 958791792b3e48b7c983fb5a54ae21671e1154f6
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2016-02-26T17:11:38Z

          renamed Operator.add() to Operator.addAll() as it's clearer, especially since Operator.sum() exists.
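          A minimal sketch of the reduction model these commits describe. Each memory key is registered with a reducer. `set()` overwrites a value, as in setup()/terminate(). `add()` folds a partial value into the current one using that key's reducer, as in execute(). Everything here is hypothetical: `ReducerOp`, `ToyMemory`, `register()`, and `runExample()` are illustrative names, not TinkerPop's `Memory`, `MemoryComputeKey`, or `Operator` API. The sketch is single-threaded and ignores broadcast/transient keys and distribution. Reducers are enum constants, following the Operator.java commit's point that a new reducer becomes a new constant rather than another class to register with the serializer. `ADD_ALL` (collection concatenation) is kept separate from numeric `SUM`, which is the distinction behind the add() to addAll() rename.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical sketch: ReducerOp/ToyMemory are illustrative names only,
// NOT TinkerPop's Operator, Memory, or MemoryComputeKey API.
public class ToyMemoryDemo {

    // Reducers as enum constants: a new reducer is a new constant, not a new class.
    enum ReducerOp implements BinaryOperator<Object> {
        SUM {
            @Override
            public Object apply(Object a, Object b) {
                return ((Number) a).longValue() + ((Number) b).longValue();
            }
        },
        AND {
            @Override
            public Object apply(Object a, Object b) {
                return (Boolean) a && (Boolean) b;
            }
        },
        // Collection concatenation, kept distinct from the numeric SUM.
        ADD_ALL {
            @Override
            @SuppressWarnings("unchecked")
            public Object apply(Object a, Object b) {
                List<Object> merged = new ArrayList<>((Collection<Object>) a);
                merged.addAll((Collection<Object>) b);
                return merged;
            }
        }
    }

    // Single-threaded stand-in for a distributed memory: one reducer per key.
    static final class ToyMemory {
        private final Map<String, BinaryOperator<Object>> reducers = new HashMap<>();
        private final Map<String, Object> values = new HashMap<>();

        void register(String key, BinaryOperator<Object> reducer) {
            reducers.put(key, reducer);
        }

        // Overwrites the value (the role set() plays in setup()/terminate()).
        void set(String key, Object value) {
            reducerFor(key); // rejects unregistered keys
            values.put(key, value);
        }

        // Folds a partial value in with the key's reducer (the role add() plays in execute()).
        void add(String key, Object value) {
            values.merge(key, value, reducerFor(key));
        }

        Object get(String key) {
            return values.get(key);
        }

        private BinaryOperator<Object> reducerFor(String key) {
            BinaryOperator<Object> reducer = reducers.get(key);
            if (reducer == null) {
                throw new IllegalArgumentException("Unregistered memory key: " + key);
            }
            return reducer;
        }
    }

    static ToyMemory runExample() {
        ToyMemory memory = new ToyMemory();
        memory.register("count", ReducerOp.SUM);
        memory.register("voteToHalt", ReducerOp.AND);
        memory.register("names", ReducerOp.ADD_ALL);

        // "setup()": initialize every key.
        memory.set("count", 0L);
        memory.set("voteToHalt", true);
        memory.set("names", new ArrayList<>());

        // "execute()": each simulated vertex contributes a partial value.
        for (String vertex : new String[] {"marko", "vadas", "lop"}) {
            memory.add("count", 1L);
            memory.add("voteToHalt", !vertex.equals("lop"));
            memory.add("names", List.of(vertex));
        }
        return memory;
    }

    public static void main(String[] args) {
        ToyMemory memory = runExample();
        System.out.println(memory.get("count"));      // 3
        System.out.println(memory.get("voteToHalt")); // false
        System.out.println(memory.get("names"));      // [marko, vadas, lop]
    }
}
```

          Running `main` prints the summed count `3`, the AND-reduced halt vote `false` (one simulated vertex voted not to halt), and the concatenated list `[marko, vadas, lop]`. The reducers are associative, so a real engine could combine partial results from many workers in any grouping. This toy applies them sequentially.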


          githubbot ASF GitHub Bot added a comment -

          Github user dylanht commented on the pull request:

          https://github.com/apache/incubator-tinkerpop/pull/243#issuecomment-189493204

          When I tried to build this, Maven complained about the now-capitalized "D" in the GroupSideEffectStepV3D0Test.java filename:

          gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/step/sideEffect/GroupSideEffectStepV3D0Test.java

          Changing it back to lower case fixed it and the build goes through.

          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the pull request:

          https://github.com/apache/incubator-tinkerpop/pull/243#issuecomment-189525508

          If that naming is the problem with that test, it should get fixed on the tp31 branch, as it's failing AppVeyor over there:

          https://issues.apache.org/jira/browse/TINKERPOP-1179

          githubbot ASF GitHub Bot added a comment -

          Github user okram commented on the pull request:

          https://github.com/apache/incubator-tinkerpop/pull/243#issuecomment-189525554

          @dylanht – dah. Sometimes git does that. I did a class rename and it accidentally made it "D". Then I renamed it to "d" and it still always gets fuggled. @spmallette – you have had this problem before. What is the fix?

          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the pull request:

          https://github.com/apache/incubator-tinkerpop/pull/243#issuecomment-189528302

          I don't remember offhand; I'll have to take a look. I will fix it on the tp31 side, though, so you'll probably want to rebase your branch when that happens.

          githubbot ASF GitHub Bot added a comment -

          Github user okram commented on the pull request:

          https://github.com/apache/incubator-tinkerpop/pull/243#issuecomment-190388717

          ```
          [INFO] ------------------------------------------------------------------------
          [INFO] Reactor Summary:
          [INFO]
          [INFO] Apache TinkerPop .................................. SUCCESS [3.658s]
          [INFO] Apache TinkerPop :: Gremlin Shaded ................ SUCCESS [2.210s]
          [INFO] Apache TinkerPop :: Gremlin Core .................. SUCCESS [35.088s]
          [INFO] Apache TinkerPop :: Gremlin Test .................. SUCCESS [10.526s]
          [INFO] Apache TinkerPop :: Gremlin Groovy ................ SUCCESS [38.056s]
          [INFO] Apache TinkerPop :: Gremlin Groovy Test ........... SUCCESS [5.694s]
          [INFO] Apache TinkerPop :: TinkerGraph Gremlin ........... SUCCESS [3:08.700s]
          [INFO] Apache TinkerPop :: Hadoop Gremlin ................ SUCCESS [4:58.574s]
          [INFO] Apache TinkerPop :: Spark Gremlin ................. SUCCESS [6:52.110s]
          [INFO] Apache TinkerPop :: Giraph Gremlin ................ SUCCESS [2:18:15.161s]
          [INFO] Apache TinkerPop :: Neo4j Gremlin ................. SUCCESS [18:09.462s]
          [INFO] Apache TinkerPop :: Gremlin Driver ................ SUCCESS [8.172s]
          [INFO] Apache TinkerPop :: Gremlin Server ................ SUCCESS [11:11.713s]
          [INFO] Apache TinkerPop :: Gremlin Console ............... SUCCESS [1:08.896s]
          [INFO] Apache TinkerPop :: Gremlin Archetype ............. SUCCESS [0.083s]
          [INFO] Apache TinkerPop :: Archetype - TinkerGraph ....... SUCCESS [5.695s]
          [INFO] Apache TinkerPop :: Archetype - Server ............ SUCCESS [10.165s]
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD SUCCESS
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 3:05:44.437s
          [INFO] Finished at: Mon Feb 29 14:01:04 MST 2016
          [INFO] Final Memory: 97M/810M
          [INFO] ------------------------------------------------------------------------
          ```

          VOTE +1.

          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the pull request:

          https://github.com/apache/incubator-tinkerpop/pull/243#issuecomment-190429840

          Note the commit above that fixes the badly named test. It's easy to fix with `git mv`:

          ```text
          $ git mv -f gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/step/sideEffect/GroupSideEffectStepV3D0Test.java gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/step/sideEffect/GroupSideEffectStepV3d0Test.java
          ```

          githubbot ASF GitHub Bot added a comment -

          Github user dkuppitz commented on the pull request:

          https://github.com/apache/incubator-tinkerpop/pull/243#issuecomment-190480886

          Travis is happy and the docs built properly on my local machine.

          VOTE: +1

          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the pull request:

          https://github.com/apache/incubator-tinkerpop/pull/243#issuecomment-190674706

          VOTE +1 - full build with integration tests

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/incubator-tinkerpop/pull/243


            People

            • Assignee:
              okram Marko A. Rodriguez
              Reporter:
              okram Marko A. Rodriguez
            • Votes:
              0
              Watchers:
              2
