Apache Jena
  1. Apache Jena
  2. JENA-109

Optimise ORDER BY + OFFSET + LIMIT queries

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ARQ

      Description

      The benefits of JENA-89 gets lost as soon as someone uses OFFSET, even for low values.

      Maybe we can assume users will not hit 'next page' too many times.
      We can increase the TOPN_LIMIT_THRESHOLD in TransformTopN to 100000 and apply the same TopN optimization we discussed in JENA-89 even when we have OFFSET (when OFFSET + LIMIT < TOPN_LIMIT_THRESHOLD).
      In QueryIterTopN we simply return IteratorArray.create(y, offset, offset+limit) instead of IteratorArray.create.

      This way we can avoid total sort for a few number of small pages (all within the 100000 threshold).

        Issue Links

          Activity

          Hide
          Paolo Castagna added a comment -

          > Setting the window to 100000 interacts with JENA-44, which gives parallelism in the sort. I suggest a more cautious limit of a few hundred items.

          Ok.

          Even if the interaction is one exclude the other rather than causing problems or wrong results.
          Maybe, the window for JENA-89 and JENA-109 should be configurable via ARQ symbols as it will be the threshold for JENA-44. This way, users can tune their systems as they wish. Not an excuse to come up with sensible and reasonable defaults though.

          > There's no need to change (TopN). [...] put a slice over the "top"

          Ok, it's much simpler this way. I'll do that.

          Show
          Paolo Castagna added a comment - > Setting the window to 100000 interacts with JENA-44 , which gives parallelism in the sort. I suggest a more cautious limit of a few hundred items. Ok. Even if the interaction is one exclude the other rather than causing problems or wrong results. Maybe, the window for JENA-89 and JENA-109 should be configurable via ARQ symbols as it will be the threshold for JENA-44 . This way, users can tune their systems as they wish. Not an excuse to come up with sensible and reasonable defaults though. > There's no need to change (TopN). [...] put a slice over the "top" Ok, it's much simpler this way. I'll do that.
          Hide
          Andy Seaborne added a comment -

          It's a reasonable thing to add.

          Setting the window to 100000 interacts with JENA-44, which gives parallelism in the sort. I suggest a more cautious limit of a few hundred items. Some performance comparision can then be done.

          About the patch:

          There's no need to change (TopN). Instead, rewrite the query to have a slice and

          SELECT ..{ } ORDER BY ... OFFSET X LIMIT N

          (slice X _
          (top (N ...)
          (...)
          ))

          that is, put a slice over the "top" and apply the top to N+X. This keeps (top) focused on what it does - a fixed length sort/limit.

          i.e.

          SELECT *
          { SELECT ... { } ORDER BY ... LIMIT N+X }
          OFFSET X

          There's no need for an outer LIMIT (which would be N)

          Show
          Andy Seaborne added a comment - It's a reasonable thing to add. Setting the window to 100000 interacts with JENA-44 , which gives parallelism in the sort. I suggest a more cautious limit of a few hundred items. Some performance comparision can then be done. About the patch: There's no need to change (TopN). Instead, rewrite the query to have a slice and SELECT ..{ } ORDER BY ... OFFSET X LIMIT N (slice X _ (top (N ...) (...) )) that is, put a slice over the "top" and apply the top to N+X. This keeps (top) focused on what it does - a fixed length sort/limit. i.e. SELECT * { SELECT ... { } ORDER BY ... LIMIT N+X } OFFSET X There's no need for an outer LIMIT (which would be N)
          Hide
          Paolo Castagna added a comment -

          A little bit more testing is necessary, however this shows my proposal.

          What do you think?

          Show
          Paolo Castagna added a comment - A little bit more testing is necessary, however this shows my proposal. What do you think?

            People

            • Assignee:
              Paolo Castagna
              Reporter:
              Paolo Castagna
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development