TinkerPop / TINKERPOP-2980

LocalStep's "object-local" behavior is not clearly described in the doc


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.6.5
    • Fix Version/s: None
    • Component/s: documentation
    • Labels: None

    Description

      LocalStep is supposed to handle solutions locally, but what it actually does is unclear from the documentation.

      What LocalStep actually does is:

      • It processes the incoming TraverserSet as it is (traversers are kept bulked, without being split).
      • So when the previous Step's output contains identical elements, and as long as they are bulked into a TraverserSet, they are processed in an "object-local" manner.
      • How is a TraverserSet bulked? It relies on LazyBarrierStrategy, which inserts a NoOpBarrierStep that handles the bulking.
      • Alternatively, we can explicitly add a barrier() step to make the bulking happen.
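
The mechanism in the list above can be sketched with a toy model. This is not TinkerPop code: `barrier`, `unbulked` and `local_count` are hypothetical helpers written only for illustration, and a traverser is modeled as a plain `(object, bulk)` pair.

```python
from collections import Counter

def barrier(stream):
    """Model of a barrier: merge equal objects into (object, bulk) pairs."""
    bulks = Counter(stream)  # keys keep first-seen order
    return list(bulks.items())

def unbulked(stream):
    """No barrier: every object travels as its own traverser with bulk 1."""
    return [(obj, 1) for obj in stream]

def local_count(traversers):
    """Model of local(count()): one count per incoming traverser, equal to its bulk."""
    return [bulk for _obj, bulk in traversers]

stream = ["v3", "v3", "v2", "v3"]
print(local_count(barrier(stream)))   # [3, 1]        "object-local"
print(local_count(unbulked(stream)))  # [1, 1, 1, 1]  "solution-local"
```

In this model the only thing that decides between the two behaviors is whether something merged the traversers before they reached local().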

      This creates discrepancies that users may not easily notice. As an illustration, this is the regular "object-local" behavior:

      gremlin> g.V().in().out()
      ==>v[3]
      ==>v[3]
      ==>v[3]
      ==>v[2]
      ==>v[2]
      ==>v[2]
      ==>v[4]
      ==>v[4]
      ==>v[4]
      ==>v[5]
      ==>v[5]
      ==>v[3]
      ==>v[3]
      ==>v[3]
      
      gremlin> g.V().in().out().local(count())
      ==>6
      ==>3
      ==>3
      ==>2

      You can see that the same objects (vertices) are processed locally. However, there are cases where it does not work this way.

      For example, you can disable the strategy:

      gremlin> g.withoutStrategies(LazyBarrierStrategy.class).V().in().out().local(count())
      ==>1
      ==>1
      ==>1
      ==>1
      ==>1
      ==>1
      ==>1
      ==>1
      ==>1
      ==>1
      ==>1
      ==>1
      ==>1
      ==>1

      Then we see "solution-local" behavior: each single solution is processed locally. Likewise, there are cases where LazyBarrierStrategy does not kick in:

      gremlin> g.V(1,1,1).local(count())
      ==>1
      ==>1
      ==>1

      This dependence on LazyBarrierStrategy is not apparent to users. Furthermore, graph providers are free to drop any of TinkerPop's strategies, so if LazyBarrierStrategy is dropped, local always works in a solution-local manner.
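
To make the dependency explicit, here is a small sketch of the g.V(1,1,1) case under the model described in this issue. It is a simplification, not TinkerPop's implementation: `local_count` is a hypothetical helper, and the two boolean flags stand in for "the strategy inserted a barrier" and "the query contains an explicit barrier()".

```python
from collections import Counter
from itertools import product

def local_count(stream, bulked):
    """Per-traverser count(): bulk sizes if a barrier merged the stream, else all 1s."""
    if bulked:
        return list(Counter(stream).values())
    return [1] * len(stream)

stream = ["v1", "v1", "v1"]  # models the three traversers of g.V(1,1,1)

# Enumerate every combination of the two (assumed) sources of bulking.
for strategy_barrier, explicit_barrier in product([False, True], repeat=2):
    bulked = strategy_barrier or explicit_barrier
    print(strategy_barrier, explicit_barrier, local_count(stream, bulked))
```

Under this model, only the row where neither flag is set yields three separate counts of 1. An explicit barrier gives the bulked result regardless of what the provider does with the strategy.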

      The documentation says that users may use map or flatMap instead. This can work, but many users may already be relying on local for "solution-local" behavior without noticing. There are also subtle differences among the three.

      (1) map emits only one solution per incoming input, while working in a solution-local manner:

      gremlin> g.V().map(out()).path()
      ==>[v[1],v[3]]
      ==>[v[4],v[5]]
      ==>[v[6],v[3]]

      (2) flatMap can stream all solutions and is solution-local, but, unlike local, it leaves only the last element in the Path:

      gremlin> g.V().flatMap(out().out()).path()
      ==>[v[1],v[5]]
      ==>[v[1],v[3]]

      This behavior of flatMap is not documented, but there are use cases where users intentionally rely on flatMap for it.
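
The differences in (1) and (2) can be summarized with a toy model. The adjacency table `OUT` is an assumption chosen to be consistent with the outputs quoted above, and `map_step`, `flat_map_step` and `local_step` are hypothetical helpers, not TinkerPop APIs. Bulking is ignored here; only result cardinality and Path contents are modeled.

```python
# Out-edges of a small assumed graph, ordered to match the outputs quoted above.
OUT = {1: [3, 2, 4], 2: [], 3: [], 4: [5, 3], 5: [], 6: [3]}

def child(start, hops):
    """Run out() `hops` times from `start`; return the full child paths."""
    paths = [[start]]
    for _ in range(hops):
        paths = [p + [n] for p in paths for n in OUT[p[-1]]]
    return paths

def map_step(starts, hops):
    """map(): at most one result per input; path keeps only start and result."""
    results = []
    for s in starts:
        paths = child(s, hops)
        if paths:
            results.append([s, paths[0][-1]])
    return results

def flat_map_step(starts, hops):
    """flatMap(): all results per input; path keeps only start and result."""
    return [[s, p[-1]] for s in starts for p in child(s, hops)]

def local_step(starts, hops):
    """local(): all results per input; path keeps the intermediate hops too."""
    return [p for s in starts for p in child(s, hops)]

V = sorted(OUT)
print(map_step(V, 1))       # [[1, 3], [4, 5], [6, 3]]
print(flat_map_step(V, 2))  # [[1, 5], [1, 3]]
print(local_step(V, 2))     # [[1, 4, 5], [1, 4, 3]]
```

The first two printed lists correspond to the map and flatMap console outputs above. The third shows what the Path would look like if the intermediate vertex were retained, which is the difference from local that this issue describes.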

      So while the documentation recommends using these two instead of local, in some cases it is not easy to migrate. At this point, I think:

      • We should clarify in the doc:
        • What barrier() / NoOpBarrierStep does and how it affects local()
        • How LazyBarrierStrategy is related to barrier() / NoOpBarrierStep
        • What the differences are between map, flatMap and local, including Path handling

      Also, instead of describing local() as being for internal use when implementing a Strategy, we should tell users that they can use it as long as they understand how it works and what they are doing with it.

          People

            Assignee: Unassigned
            Reporter: Norio Akagi (redtree1112)