Apache Jena / JENA-119

Eliminate memory bounds during query execution


Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Done
    • None
    • None
    • ARQ
    • None

    Description

      It would be nice to eliminate all memory bounds on queries, so that execution is never limited by available heap. Similar to JENA-44, this would involve modifying all of the QueryIterator objects that maintain unbounded collections of Bindings.

      The ones I've identified (let me know if I've missed any):

      + QueryIterSort
      Complete!

      + QueryIterGroup
      Probably one of the more complicated implementations. I think it can be done with a DistinctDataBag.

      + QueryIterDistinct
      Can be implemented trivially using DistinctDataBag, but that would lose streaming capability. We could keep streaming just until the first spill, which would be a little more difficult but not bad. If we wanted streaming even after spilling, we would need an on-disk hash table or B-tree (which could get expensive for possibly limited benefit; do you really need streaming after 10,000 results?).

      + QueryIteratorCopy
      Only appears to be used by QueryIterService. Simple implementation using DefaultDataBag.

      + QueryIteratorCaching
      Does not match DataBag's assumption of completing all writes before iterating. But it isn't used anywhere, so maybe we remove it?

      + QueryIterDiff
      + QueryIterMinus
      Both of these materialize the RHS into a collection. Can be implemented with DefaultDataBag. As an aside, is this necessary for all queries? What if the RHS is cheap (e.g. a single TriplePattern)?

      + QueryIterJoin
      + QueryIterLeftJoin
      Both materialize RHS. Are they used anywhere? I was under the impression that ARQ only considered left-deep plans with indexed joins on the RHS TriplePatterns.

      + SubQueries
      I'm not sure how this is handled. Are these materialized somewhere?
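
      To make the spill-to-disk idea behind several of the items above concrete, here is a minimal sketch in plain Java. It is not Jena's DataBag implementation: the class name SpillingBag, the threshold parameter, and the use of newline-free Strings as a stand-in for Bindings are all assumptions made for illustration. It buffers items in memory, writes the buffer to a temporary file once the threshold is reached, and, like DataBag, assumes all writes are complete before iteration starts.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a spilling bag; items are Strings without newlines. */
final class SpillingBag implements Closeable {
    private final int threshold;                            // max items held in memory
    private final List<String> memory = new ArrayList<>();  // current in-memory buffer
    private final List<Path> spillFiles = new ArrayList<>(); // one temp file per spill

    SpillingBag(int threshold) {
        this.threshold = threshold;
    }

    /** Adds an item, spilling the buffer to disk when it reaches the threshold. */
    void add(String item) throws IOException {
        memory.add(item);
        if (memory.size() >= threshold) {
            Path file = Files.createTempFile("bag", ".spill");
            Files.write(file, memory, StandardCharsets.UTF_8); // one item per line
            spillFiles.add(file);
            memory.clear();
        }
    }

    int spillCount() {
        return spillFiles.size();
    }

    /** Streams spilled items first, then the in-memory tail. Call only after all adds. */
    Iterator<String> iterator() {
        return new Iterator<String>() {
            private int fileIndex = 0;
            private BufferedReader reader;
            private String pending; // next spilled line, if already read
            private final Iterator<String> tail = memory.iterator();

            // Returns the next line from the spill files, or null when they are exhausted.
            private String readSpilled() {
                try {
                    while (true) {
                        if (reader == null) {
                            if (fileIndex >= spillFiles.size()) return null;
                            reader = Files.newBufferedReader(
                                    spillFiles.get(fileIndex++), StandardCharsets.UTF_8);
                        }
                        String line = reader.readLine();
                        if (line != null) return line;
                        reader.close();
                        reader = null;
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() {
                if (pending == null) pending = readSpilled();
                return pending != null || tail.hasNext();
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                if (pending != null) {
                    String result = pending;
                    pending = null;
                    return result;
                }
                return tail.next();
            }
        };
    }

    /** Deletes the temporary spill files. */
    @Override
    public void close() throws IOException {
        for (Path file : spillFiles) Files.deleteIfExists(file);
        spillFiles.clear();
        memory.clear();
    }
}
```

      With a threshold of 3, adding seven items spills twice and leaves one item in memory; iterating then returns all seven in insertion order. An iterator that materializes its RHS (as QueryIterDiff and QueryIterMinus do) could fill such a bag instead of an in-memory list, at the cost of re-reading from disk on each pass.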

          People

            sallen Stephen Allen