Description
A by() is deemed "unproductive" when it does not produce a value (i.e. where the hasNext() is false). As of 3.5.x you can get an exception or a null in those cases:
gremlin> g.V().group().by('age') // null behavior ==>[null:[v[3],v[5]],32:[v[4]],35:[v[6]],27:[v[2]],29:[v[1]]] gremlin> g.V().aggregate('a').by(out()).cap('a') // exception behavior The provided traverser does not map to a value: v[2]->[VertexStep(OUT,vertex)] Type ':help' or ':h' for help. Display stack trace? [yN]n
The by(String) behavior for this introduce in 3.5.x has become a bit of a special case around by() and as we are slowly removing exceptions for cleaner behavior in Gremlin, there clearly needs to be a more consistent approach taken here. Here are some problems with how things are now:
1. by(String) is a bit too much of a special case and is now inconsistent with the other forms of by() which continue to throw runtime exceptions (which isn’t desirable either).
2. By propagating a null from an unproductive by(), it becomes impossible to distinguish between a null property value (from an existing property) and a property that is simply missing.
3. Traversals that use simplePath() or cyclicPath() with unproductive by() modulators might lead to confusing results where generated null values create an unexpected equality. Of course, this may be considered an orthogonal issue, as it might also be possible to change or parameterize these steps to handle null differently.
4. The dedup() step will return the first by() modulator that returns null which might be unexpected.
To bring some consistent behavior to this situation an unproductive by() will simply filter results for all steps. How that filtering is applied is specific to the step itself but generally speaking the filtering will either:
1. Filter the traverser or otherwise,
2. Ignore the traverser for purpose of constructing a side-effect.
As this sort of filtering might make Gremlin harder to debug, TINKERPOP-2634 will improve the ability to debug traversals, via DebugStrategy, which in and of itself is a well requested feature. The DebugStrategy would let users know about unproductive by() modulators by simply throwing an exception as it does in 3.4.x. It will accomplish this by converting every by() modulator to this pattern: coalesce(<modulator>, fail()) where fail() would be a new step that kills the traversal by raising an exception (a feature that has long been asked for). In addition to fail(), there could also be more of a soft warning in the form of a log or trace if that is desired. This pattern could be implemented as a GraphTraversal or a special case implementation of Traversal (like, ValueTraversal), but the key point is that there would now be a way to be aware of unproductive by() filters while developing your application.
This change in behavior does not have to be a breaking change. The introduction of a new ProductiveByStrategy could unify all {by()}} behavior to produce a null. This strategy would wrap by() modulators in coalesce(<by>, constant(null)) and be installed by default. It could even be made configurable to take the keys to which it would apply leaving Gremlin in a more optimized state in such cases. The default behavior without this strategy would be changed to filter. For 3.6.0, the ProductiveByStrategy could be removed as a default strategy with the option for users to add it back in to maintain the 3.5.x functionality.