Details
- Type: New Feature
- Status: Reopened
- Priority: Minor
- Resolution: Unresolved
- Lucene Fields: New
Description
We are currently using Lucene at my company for our main product.
Our search functionality is quite basic: the results are always sorted by a predefined field, and the user can only choose the sort order (ascending/descending).
I am currently investigating using the index sorting feature together with EarlyTerminatingSortingCollector.
It is quite a shame that searching a sorted index in reverse order does not benefit from any optimization, and I was wondering whether it could be made faster by creating a special "ReverseSortingCollector" for this purpose.
I am aware that postings lists are designed to always be iterated in the same (ascending) order, so this is not about early-terminating the search, but rather about filtering out unneeded documents more efficiently.
If a segment is sorted in the reverse order of the search sort, we only have to delegate collection of the last matched documents.
Here is a quick code sample:
public class ReverseSortingCollector extends FilterCollector {

  /** Sort used to sort the search results */
  protected final Sort sort;

  /** Number of documents to collect in each segment */
  protected final int numDocsToCollect;

  [...]

  private List<FlushData> flushList = new ArrayList<>();

  private static final class FlushData {
    // ring buffer holding the last collected doc IDs
    int[] buffer;
    // write cursor; once the buffer is full, index % buffer.length is the oldest element
    int index;
    LeafCollector leafCollector;

    FlushData(int[] buffer, LeafCollector leafCollector) {
      this.buffer = buffer;
      this.leafCollector = leafCollector;
    }
  }

  @Override
  public LeafCollector getLeafCollector(LeafReaderContext context) throws IOException {
    // flush previous data, if any
    flush();
    LeafReader reader = context.reader();
    Sort segmentSort = reader.getIndexSort();
    if (isReverseOrder(sort, segmentSort)) {
      // the segment is sorted in the reverse order of the search sort
      int[] buffer = new int[numDocsToCollect];
      Arrays.fill(buffer, -1);
      FlushData flushData = new FlushData(buffer, in.getLeafCollector(context));
      flushList.add(flushData);
      return new LeafCollector() {
        @Override
        public void setScorer(Scorer scorer) throws IOException {
        }

        @Override
        public void collect(int doc) throws IOException {
          // remember the last `numDocsToCollect` documents that matched
          buffer[flushData.index % buffer.length] = doc;
          flushData.index++;
        }
      };
    } else {
      return in.getLeafCollector(context);
    }
  }

  // flush the last `numDocsToCollect` collected documents to the delegated Collector
  public void flush() throws IOException {
    for (FlushData flushData : flushList) {
      for (int i = 0; i < flushData.buffer.length; i++) {
        int doc = flushData.buffer[(flushData.index + i) % flushData.buffer.length];
        if (doc != -1) {
          flushData.leafCollector.collect(doc);
        }
      }
    }
    flushList.clear();
  }
}
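To illustrate the core trick in isolation, here is a minimal, self-contained sketch of the ring-buffer logic used above: remember only the last `numDocsToCollect` doc IDs seen, then replay the survivors in the order they were collected. The class name `LastNBuffer` is hypothetical and not part of Lucene; it only mirrors the `collect`/`flush` pair from the sample.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical standalone sketch: keep only the last N collected doc IDs. */
public class LastNBuffer {
    private final int[] buffer; // ring buffer of doc IDs, -1 = empty slot
    private int index;          // total number of docs collected so far

    public LastNBuffer(int capacity) {
        buffer = new int[capacity];
        Arrays.fill(buffer, -1);
    }

    /** Mirrors LeafCollector.collect(doc): overwrite the oldest slot. */
    public void collect(int doc) {
        buffer[index % buffer.length] = doc;
        index++;
    }

    /** Mirrors flush(): emit the surviving docs, oldest first, skipping empty slots. */
    public List<Integer> flush() {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < buffer.length; i++) {
            int doc = buffer[(index + i) % buffer.length];
            if (doc != -1) {
                out.add(doc);
            }
        }
        return out;
    }
}
```

Starting the replay at `index % buffer.length` is what keeps the emitted docs in collection order: once the buffer has wrapped, that slot holds the oldest surviving doc ID.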
This is especially efficient when used together with TopFieldCollector, as a lot of docValues lookups are avoided.
In my experiments it reduced search time by up to 90%.
Note 1: This does not support paging.
Note 2: The current implementation is probably not thread-safe.