[JENA-89] Avoid a total sort for ORDER BY + LIMIT queries - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Done
Affects Version/s: None
Fix Version/s: Jena 2.11.0
Component/s: ARQ
Labels:
- arq
- optimization
- scalability
- sparql

Description

For SPARQL queries that combine ORDER BY with LIMIT, ARQ currently sorts the entire result set and then returns only the first N results, where N is the value given by LIMIT.
As an alternative, discussed on jena-dev [1], we can use a PriorityQueue [2] (which is based on a priority heap) to keep only the best N results seen so far and so avoid a total sort.
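
The heap-based idea can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the code from the attached patches: the class name TopN and the method topN are hypothetical, and the sketch works on an arbitrary Iterable rather than on ARQ bindings. It keeps at most n items in a java.util.PriorityQueue whose head is the worst item kept so far, so each incoming item costs O(log n) at most and memory stays bounded by n.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopN {
    /**
     * Returns the first n items of the input in 'order', as ORDER BY + LIMIT n would,
     * without sorting the whole input. Hypothetical helper, not an ARQ class.
     */
    public static <T> List<T> topN(Iterable<T> input, int n, Comparator<T> order) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Reversed comparator: the head of the heap is the WORST item kept so far.
        PriorityQueue<T> heap = new PriorityQueue<>(n, order.reversed());
        for (T item : input) {
            if (heap.size() < n) {
                heap.add(item);
            } else if (order.compare(item, heap.peek()) < 0) {
                // The new item beats the current worst: evict the worst, keep the new one.
                heap.poll();
                heap.add(item);
            }
        }
        // Only the n survivors are sorted, not the full input.
        List<T> result = new ArrayList<>(heap);
        result.sort(order);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(5, 1, 4, 2, 3);
        System.out.println(topN(rows, 3, Comparator.<Integer>naturalOrder())); // prints [1, 2, 3]
    }
}
```

For N input rows and a limit of n, this does O(N log n) comparisons instead of the O(N log N) of a total sort, which is where the benefit for small LIMIT values would come from.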

ARQ's algebra package already contains an OpTopN [3] operator. The OpExecutor [4] will need to use a new QueryIterTopN instead of QueryIterSort + QueryIterSlice. A new TransformOrderByLimit, to be used by Optimize, is also necessary.
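
The shape of such a rewrite can be illustrated with a toy algebra. The sketch below deliberately does not use Jena's classes: Node, Pattern, Order, Slice, and TopN are hypothetical stand-ins for the corresponding algebra operators, and the threshold parameter is an assumption about how one might restrict the rewrite to small limits. It only shows the pattern match: a slice with no offset directly over an order becomes a single top-N node.

```java
public class OrderByLimitRewrite {
    /** Toy algebra node; a stand-in for an algebra operator, not Jena's Op. */
    public interface Node {}

    public static final class Pattern implements Node {
        public final String text;
        public Pattern(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    public static final class Order implements Node {
        public final Node sub;
        public final String key;
        public Order(Node sub, String key) { this.sub = sub; this.key = key; }
        @Override public String toString() { return "(order " + key + " " + sub + ")"; }
    }

    public static final class Slice implements Node {
        public final Node sub;
        public final long offset;
        public final long limit;
        public Slice(Node sub, long offset, long limit) {
            this.sub = sub; this.offset = offset; this.limit = limit;
        }
        @Override public String toString() { return "(slice " + offset + " " + limit + " " + sub + ")"; }
    }

    public static final class TopN implements Node {
        public final Node sub;
        public final String key;
        public final long limit;
        public TopN(Node sub, String key, long limit) {
            this.sub = sub; this.key = key; this.limit = limit;
        }
        @Override public String toString() { return "(top " + limit + " " + key + " " + sub + ")"; }
    }

    /**
     * Rewrites slice(order(x)) to top(x) when there is no offset and the limit
     * is at most 'threshold' (an assumed cut-off); otherwise returns the node unchanged.
     */
    public static Node rewrite(Node node, long threshold) {
        if (node instanceof Slice) {
            Slice s = (Slice) node;
            if (s.offset == 0 && s.limit >= 0 && s.limit <= threshold && s.sub instanceof Order) {
                Order o = (Order) s.sub;
                return new TopN(o.sub, o.key, s.limit);
            }
        }
        return node;
    }
}
```

In this toy form, a query shaped like (slice 0 10 (order ?o (bgp ?s ?p ?o))) would be rewritten to (top 10 ?o (bgp ?s ?p ?o)), while a slice with a non-zero offset or a limit above the threshold is left alone.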

ORDER BY + LIMIT queries are typically used to construct the first page when results are paginated. Subsequent pages are then fetched with ORDER BY + OFFSET + LIMIT queries, although users often stop at the first page. Ideally, we could cache the sorted results of the ORDER BY and serve each OFFSET/LIMIT from that cache. However, the improvement described by this issue is limited to the ORDER BY + LIMIT case, for which a priority heap is a good enough solution.
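
For contrast, the caching idea that this issue leaves out of scope could look roughly like the sketch below. It is only an illustration under assumptions: SortedPageCache and page are hypothetical names, the rows are held fully in memory, and nothing here reflects how ARQ handles OFFSET. The sort is paid once, and every page is then a cheap view over the sorted list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortedPageCache<T> {
    private final List<T> sorted;

    /** Sorts a copy of the rows once; later page requests reuse the sorted copy. */
    public SortedPageCache(List<T> rows, Comparator<T> order) {
        List<T> copy = new ArrayList<>(rows);
        copy.sort(order);
        this.sorted = Collections.unmodifiableList(copy);
    }

    /** Returns the rows that ORDER BY + OFFSET offset + LIMIT limit would select. */
    public List<T> page(long offset, long limit) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must be non-negative");
        }
        // Clamp both ends to the list size so out-of-range pages are simply empty.
        int from = (int) Math.min(offset, sorted.size());
        int to = from + (int) Math.min(limit, sorted.size() - from);
        return sorted.subList(from, to);
    }
}
```

The trade-off is the opposite of the heap approach: the cache pays for a total sort and for holding every row, which only makes sense if several pages are actually requested.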

Hopefully, this will improve the scalability of ORDER BY + LIMIT queries when the LIMIT value is small.

[1] http://markmail.org/thread/5d2gtazkoxsa2ayv
[2] http://download.oracle.com/javase/6/docs/api/java/util/PriorityQueue.html
[3] https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/algebra/op/OpTopN.java
[4] https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/engine/main/OpExecutor.java

Attachments

- ARQ_JENA-89_r1156212.patch (11/Aug/11 08:37, 12 kB, Paolo Castagna)
- JENA-89-TopTests.patch (13/Aug/11 16:52, 33 kB, Andy Seaborne)
- JENA-89-Top-FixHeapCollation.patch (13/Aug/11 16:56, 6 kB, Andy Seaborne)

Issue Links

is related to
- JENA-90 Use OpReduce instead of OpDistinct for DISTINCT + ORDER BY queries (Closed)

relates to
- JENA-109 Optimise ORDER BY + OFFSET + LIMIT queries (Closed)
- JENA-111 Improving TopN optimization in case of an intermediate OpModifier (Closed)

People

Assignee: Paolo Castagna

Reporter: Paolo Castagna

Votes: 0

Watchers: 1

Dates

Created: 08/Aug/11 08:37

Updated: 01/Feb/15 19:11

Resolved: 01/Feb/15 19:11