Type: New Feature
Affects Version/s: None
Fix Version/s: None
It would be nice to eliminate all memory bounds on queries. Similar to JENA-44, this would involve modifying every QueryIterator that maintains an unbounded collection of Bindings.
The ones I've identified (let me know if I've missed any):
Probably one of the more complicated implementations. I think it can be done with a DistinctDataBag.
Can be implemented trivially using DistinctDataBag, but would lose streaming capability. We could stream just until the first spill, which would be a little more difficult but not bad. If we wanted streaming even after spilling, we would need an on-disk hash table or B-tree (which could get expensive for maybe limited benefit; do you really need streaming after 10,000 results?).
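To make the "stream until the first spill" idea concrete, here is a minimal sketch using only the JDK. It does not use ARQ's DistinctDataBag; `SpillingDistinct`, `distinct`, `writeRun` and `mergeRuns` are hypothetical names, and single-line strings stand in for serialized Bindings. It assumes the set of already-emitted items (at most `threshold` of them) can stay in memory, so that everything arriving after the spill only has to be de-duplicated through sorted runs on disk.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;
import java.util.function.Consumer;

// Hypothetical sketch, not ARQ code: strings stand in for Bindings.
final class SpillingDistinct {

    /** Sends each distinct item of input to out exactly once. */
    static void distinct(Iterator<String> input, int threshold, Consumer<String> out) {
        // Phase 1: streaming. A new item is emitted as soon as it is first seen.
        Set<String> emitted = new HashSet<>();
        while (emitted.size() < threshold && input.hasNext()) {
            String item = input.next();
            if (emitted.add(item)) out.accept(item);
        }
        if (!input.hasNext()) return; // input ended before a spill was needed

        // Phase 2: spilled. Remaining items go to sorted, de-duplicated runs on disk.
        List<Path> runs = new ArrayList<>();
        try {
            TreeSet<String> chunk = new TreeSet<>();
            while (input.hasNext()) {
                String item = input.next();
                if (emitted.contains(item)) continue; // already streamed out
                chunk.add(item);
                if (chunk.size() >= threshold) { runs.add(writeRun(chunk)); chunk.clear(); }
            }
            if (!chunk.isEmpty()) runs.add(writeRun(chunk));
            // Phase 3: merge the runs, dropping duplicates that span several runs.
            mergeRuns(runs, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (Path run : runs) {
                try { Files.deleteIfExists(run); } catch (IOException ignored) { }
            }
        }
    }

    private static Path writeRun(SortedSet<String> chunk) throws IOException {
        Path file = Files.createTempFile("distinct-run", ".txt");
        try (BufferedWriter w = Files.newBufferedWriter(file)) {
            for (String s : chunk) { w.write(s); w.newLine(); }
        }
        return file;
    }

    private static void mergeRuns(List<Path> runs, Consumer<String> out) throws IOException {
        // Heap of (current line, reader it came from), ordered by line.
        Comparator<Map.Entry<String, BufferedReader>> byLine = Map.Entry.comparingByKey();
        PriorityQueue<Map.Entry<String, BufferedReader>> heap = new PriorityQueue<>(byLine);
        List<BufferedReader> readers = new ArrayList<>();
        try {
            for (Path run : runs) {
                BufferedReader r = Files.newBufferedReader(run);
                readers.add(r);
                String first = r.readLine();
                if (first != null) heap.add(new AbstractMap.SimpleEntry<>(first, r));
            }
            String last = null;
            while (!heap.isEmpty()) {
                Map.Entry<String, BufferedReader> head = heap.poll();
                if (!head.getKey().equals(last)) { last = head.getKey(); out.accept(last); }
                String next = head.getValue().readLine();
                if (next != null) heap.add(new AbstractMap.SimpleEntry<>(next, head.getValue()));
            }
        } finally {
            for (BufferedReader r : readers) r.close();
        }
    }
}
```

In this sketch the results produced after the spill come out in sorted order rather than arrival order, and nothing is emitted between the spill and the end of the input. That is the trade-off described above: streaming for the first `threshold` distinct results, batch behaviour afterwards.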
Only appears to be used by QueryIterService. Simple implementation using DefaultDataBag.
Does not match DataBag's assumption of completing all writes before iterating. But it isn't used anywhere, so maybe we remove it?
Both of these materialize the RHS into a collection. Can be implemented with DefaultDataBag. As an aside, is materializing necessary for all queries? What if the RHS is cheap (e.g. a single TriplePattern)?
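As a rough illustration of materializing the RHS into a spillable collection, here is a JDK-only sketch. It is not the real DefaultDataBag: `SpillingBag` and its methods are hypothetical, and single-line strings again stand in for serialized Bindings. It enforces the "finish all writes before iterating" assumption mentioned above and allows repeated scans, which a join would need when it re-reads the RHS for each LHS row.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not ARQ code: strings stand in for Bindings.
final class SpillingBag implements Iterable<String>, AutoCloseable {
    private final int threshold;                 // max items held in memory
    private final List<String> memory = new ArrayList<>();
    private Path spillFile;                      // created lazily on first spill
    private BufferedWriter spillOut;
    private boolean reading = false;

    SpillingBag(int threshold) { this.threshold = threshold; }

    void add(String item) {
        if (reading) throw new IllegalStateException("all writes must finish before iterating");
        memory.add(item);
        if (memory.size() >= threshold) spill();
    }

    boolean hasSpilled() { return spillFile != null; }

    private void spill() {
        try {
            if (spillFile == null) {
                spillFile = Files.createTempFile("bag", ".txt");
                spillOut = Files.newBufferedWriter(spillFile);
            }
            for (String s : memory) { spillOut.write(s); spillOut.newLine(); }
            memory.clear();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Each call starts a fresh scan: spilled items first, then the in-memory tail. */
    @Override
    public Iterator<String> iterator() {
        reading = true;
        if (spillFile == null) return memory.iterator();
        try {
            spillOut.flush();
            BufferedReader in = Files.newBufferedReader(spillFile);
            Iterator<String> tail = memory.iterator();
            return new Iterator<String>() {
                private String line = readLine(in);
                @Override public boolean hasNext() { return line != null || tail.hasNext(); }
                @Override public String next() {
                    if (line == null) return tail.next();
                    String current = line;
                    line = readLine(in);
                    return current;
                }
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one line and closes the reader at end of file.
    // A scan abandoned halfway leaves its reader open; acceptable for a sketch.
    private static String readLine(BufferedReader in) {
        try {
            String line = in.readLine();
            if (line == null) in.close();
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            if (spillOut != null) spillOut.close();
            if (spillFile != null) Files.deleteIfExists(spillFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A cheap RHS never reaches the threshold in this sketch, so it stays entirely in memory and no file is created; only a large RHS pays the disk cost.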
Both materialize RHS. Are they used anywhere? I was under the impression that ARQ only considered left-deep plans with indexed joins on the RHS TriplePatterns.
I'm not sure how this is handled. Are these materialized somewhere?