[LUCENE-6322] IndexSearcher.doc(int docID, Set<String> fieldsToLoad) is slower in Lucene 4.9 when compared to Lucene 2.9 - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 4.9
Fix Version/s: 4.10.5
Component/s: core/codecs
Labels: None
Environment: Windows, JDK 7/8
Lucene Fields: New

Description

We use the IndexSearcher.doc(int docID, Set<String> fieldsToLoad) method to retrieve documents with only selected stored fields. When the fields we leave out of fieldsToLoad include a few stored fields with more than 500KB of data, this call is slower in Lucene 4.9 than in Lucene 2.9.
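To make the access pattern concrete, here is a minimal, JDK-only model of what a call with a field set asks the stored-fields reader to do. It assumes the field set is turned into a per-field YES/NO decision, roughly as Lucene's DocumentStoredFieldVisitor does when built from a set of names; FieldSelectionSketch, StoredField, and Result are hypothetical names for illustration, not Lucene classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FieldSelectionSketch {
    // Mirrors the YES / NO / STOP answers of Lucene's StoredFieldVisitor.Status.
    enum Status { YES, NO, STOP }

    // Hypothetical stand-in for one stored field: its name and raw stored bytes.
    static final class StoredField {
        final String name;
        final byte[] value;

        StoredField(String name, byte[] value) {
            this.name = name;
            this.value = value;
        }
    }

    // Outcome of visiting one document: loaded fields plus bytes that had to be stepped over.
    static final class Result {
        final Map<String, byte[]> loaded;
        final long skippedBytes;

        Result(Map<String, byte[]> loaded, long skippedBytes) {
            this.loaded = loaded;
            this.skippedBytes = skippedBytes;
        }
    }

    // Selection rule: load a field only if its name is in fieldsToLoad, otherwise skip it.
    static Status needsField(Set<String> fieldsToLoad, String name) {
        return fieldsToLoad.contains(name) ? Status.YES : Status.NO;
    }

    // Walks the fields in stored order, as a stored-fields reader does.
    static Result visit(List<StoredField> doc, Set<String> fieldsToLoad) {
        Map<String, byte[]> loaded = new LinkedHashMap<>();
        long skipped = 0;
        for (StoredField field : doc) {
            Status status = needsField(fieldsToLoad, field.name);
            if (status == Status.STOP) {
                break;
            } else if (status == Status.YES) {
                loaded.put(field.name, field.value);
            } else {
                // A skipped field still has to be stepped over to reach the next one.
                skipped += field.value.length;
            }
        }
        return new Result(loaded, skipped);
    }
}
```

With a 600KB "body" field left out of fieldsToLoad, visit returns only the requested fields but still steps over 600KB. How expensive that step is depends on the on-disk format, which is exactly what differs between 2.9 and 4.9.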

I debugged this method with Lucene 4.9 and found that CompressingStoredFieldsReader#visitDocument(int docID, StoredFieldVisitor visitor) spends significant time loading file content and decompressing it in 16KB chunks, even for fields it only needs to skip. The slowdown is noticeable when a document's fields are larger than 1MB and we call this method in a loop over more than 1000 such documents.
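The cost described above can be modeled with the JDK's java.util.zip classes. This is a sketch under stated assumptions: Deflate stands in for the LZ4-based compression that Lucene 4.x stored fields actually use, the whole document is treated as one compressed stream, and the 16KB chunk size is taken from the report. What it shows is that a compressed stream offers no shortcut past a field: skipping N bytes means decompressing N bytes into a scratch buffer and discarding them. CompressedSkipSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class CompressedSkipSketch {
    // Chunk size from the report, used for the scratch buffer.
    static final int CHUNK = 16 * 1024;

    // Compresses a whole document's stored bytes as one Deflate stream (stand-in for LZ4).
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[CHUNK];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflates exactly len bytes into dest starting at off.
    private static void inflateExactly(Inflater inflater, byte[] dest, int off, int len) {
        try {
            int done = 0;
            while (done < len) {
                int n = inflater.inflate(dest, off + done, len - done);
                if (n == 0 && (inflater.finished() || inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("compressed stream ended early");
                }
                done += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        }
    }

    // Skipping a field: there is no offset to seek to inside compressed data, so the
    // bytes are decompressed up to CHUNK at a time into a scratch buffer and dropped.
    // Returns how many chunks were decompressed only to be thrown away.
    static long skipByDecompressing(Inflater inflater, long toSkip) {
        byte[] scratch = new byte[CHUNK];
        long done = 0;
        long chunks = 0;
        while (done < toSkip) {
            int step = (int) Math.min(CHUNK, toSkip - done);
            inflateExactly(inflater, scratch, 0, step);
            done += step;
            chunks++;
        }
        return chunks;
    }

    // Reading a field the caller actually asked for.
    static byte[] readBytes(Inflater inflater, int len) {
        byte[] value = new byte[len];
        inflateExactly(inflater, value, 0, len);
        return value;
    }
}
```

For a document laid out as an 8-byte "id", a 1MB "body", and a small "title", reading only "title" in this model means decompressing 65 chunks (1MB + 8 bytes) first, and that work is repeated for every document in the loop.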

In Lucene 2.9 there was no compression, so skipping a field was just a file seek that moves the pointer to the next stored field. See, for example, Lucene3xStoredFieldsReader#skipField() for how a field is skipped in the 2.9 format; this is much faster than what Lucene 4.9 does.
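For contrast, a minimal sketch of seek-based skipping in an uncompressed, length-prefixed layout. The 4-byte length prefix is a simplifying assumption and not the exact Lucene 2.9/3.x file format; changing a ByteBuffer's position stands in for IndexInput.seek. SeekSkipSketch and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;

class SeekSkipSketch {
    // Hypothetical layout: each field is a 4-byte length followed by its raw bytes.
    static byte[] encode(byte[]... fields) {
        int total = 0;
        for (byte[] field : fields) {
            total += 4 + field.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] field : fields) {
            buf.putInt(field.length);
            buf.put(field);
        }
        return buf.array();
    }

    // Loading: read the length, then copy that many bytes.
    static byte[] readField(ByteBuffer in) {
        byte[] value = new byte[in.getInt()];
        in.get(value);
        return value;
    }

    // Skipping: read the length, then move the position past the payload.
    // No payload byte is copied or decoded, so the cost does not grow with field size.
    static void skipField(ByteBuffer in) {
        int length = in.getInt();
        in.position(in.position() + length);
    }
}
```

Skipping a 1MB field here costs one 4-byte read and one position update, regardless of how large the field is.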

CompressingStoredFieldsReader should be able to find each field's compressed length in the file and skip the field with a single file seek, instead of loading it from the file and decompressing it in 16KB chunks only to throw it away.
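The proposal can be sketched the same way. Assumption: each field is compressed on its own and its compressed length is written in front of it, so a reader can skip a field with one seek and decompress only the fields it actually loads. Deflate again stands in for the codec's compressor, and the layout, class, and method names are hypothetical; they do not correspond to any existing Lucene codec.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class LengthPrefixedCompressedSketch {
    // Compresses one field independently of the others.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[16 * 1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Hypothetical layout per field: [int rawLength][int compressedLength][compressed bytes].
    static byte[] encode(byte[]... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] field : fields) {
            byte[] compressed = deflate(field);
            byte[] header = ByteBuffer.allocate(8).putInt(field.length).putInt(compressed.length).array();
            out.write(header, 0, header.length);
            out.write(compressed, 0, compressed.length);
        }
        return out.toByteArray();
    }

    // Skipping: read the stored compressed length and seek past it; nothing is decompressed.
    static void skipField(ByteBuffer in) {
        in.getInt(); // rawLength is not needed to skip
        int compressedLength = in.getInt();
        in.position(in.position() + compressedLength);
    }

    // Loading: decompress only this field's bytes. Assumes a heap buffer (ByteBuffer.wrap).
    static byte[] readField(ByteBuffer in) {
        int rawLength = in.getInt();
        int compressedLength = in.getInt();
        Inflater inflater = new Inflater();
        inflater.setInput(in.array(), in.arrayOffset() + in.position(), compressedLength);
        byte[] value = new byte[rawLength];
        try {
            int done = 0;
            while (done < rawLength) {
                int n = inflater.inflate(value, done, rawLength - done);
                if (n == 0 && (inflater.finished() || inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated field");
                }
                done += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        in.position(in.position() + compressedLength);
        return value;
    }
}
```

One trade-off to keep in mind: compressing each field independently gives up the shared compression context that chunk-level compression gets across many small fields, so a practical variant might apply per-field lengths only to fields above some size threshold.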

People

Assignee: Unassigned

Reporter: Sekhar

Votes: 0

Watchers: 5

Dates

Created: 02/Mar/15 11:26

Updated: 28/Aug/22 14:27