Details
- Improvement
- Status: Open
- Major
- Resolution: Unresolved
- 8.1.1
- None
- None
- New
Description
The `FieldComparator` has many responsibilities, and users get all of them at once. At a high level, the main functions of `FieldComparator` are:
- Provide a `LeafFieldComparator`
- Allocate storage for requested number of hits
- Read the values from DocValues/Custom source etc.
- Compare two values
There are three major areas for improvement:
- The logic for reading values and the logic for storing them are coupled.
- Users need to specify the size in order to create a `FieldComparator`, but sometimes the size is unknown upfront.
- From `FieldComparator`'s API, one can't reason about thread-safety, so it is not suitable for concurrent search.
For example, can two concurrent threads use the same `FieldComparator` to call `getLeafComparator` for the two different segments they are working on? In fact, almost all existing implementations of `FieldComparator` are not thread-safe.
The proposal is to enhance `SortField` with two APIs:
- `int compare(Object v1, Object v2)`: compares two values from different docs for this field.
- `ValueAccessor newValueAccessor(LeafReaderContext leaf)`: encapsulates the logic for obtaining the right implementation to read the field values.
`ValueAccessor` should be used in a similar way to `DocValues`, providing the sort value for a document in an advance & read fashion.
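To make the shape of the proposal concrete, here is a rough, self-contained sketch. Only `compare` and `newValueAccessor` come from the proposal above. The `ValueAccessor` method names (`advanceExact`, `value`), the `LeafSketch` stand-in for `LeafReaderContext`, and the `LongSortFieldSketch` class are assumptions made up for illustration, not existing Lucene APIs.

```java
// Hypothetical stand-in for LeafReaderContext: one segment's values,
// indexed by doc id. A null entry means the doc has no value.
final class LeafSketch {
    final Long[] values;

    LeafSketch(Long... values) {
        this.values = values;
    }
}

// Hypothetical accessor in the "advance & read" style. The method names
// are assumptions; the issue does not specify them.
interface ValueAccessor {
    // Positions the accessor on doc; returns true if the doc has a value.
    boolean advanceExact(int doc);

    // Returns the sort value of the doc the accessor is positioned on.
    Object value();
}

// Hypothetical stand-in for a SortField enhanced with the two proposed methods.
final class LongSortFieldSketch {
    int compare(Object v1, Object v2) {
        return Long.compare((Long) v1, (Long) v2);
    }

    ValueAccessor newValueAccessor(LeafSketch leaf) {
        return new ValueAccessor() {
            private int current = -1;

            @Override
            public boolean advanceExact(int doc) {
                current = doc;
                return doc >= 0 && doc < leaf.values.length && leaf.values[doc] != null;
            }

            @Override
            public Object value() {
                return leaf.values[current];
            }
        };
    }
}

final class ValueAccessorSketch {
    // Returns the doc id holding the smallest value in the leaf, or -1 if no
    // doc has a value. The caller owns the only stored copy of the best value.
    static int docWithSmallestValue(LongSortFieldSketch field, LeafSketch leaf) {
        ValueAccessor accessor = field.newValueAccessor(leaf);
        int bestDoc = -1;
        Object best = null;
        for (int doc = 0; doc < leaf.values.length; doc++) {
            if (!accessor.advanceExact(doc)) {
                continue; // doc has no value for this field
            }
            Object v = accessor.value();
            if (best == null || field.compare(v, best) < 0) {
                best = v;
                bestDoc = doc;
            }
        }
        return bestDoc;
    }
}
```

Note that nothing here asks for a hit count upfront: reading goes through the accessor, comparing goes through the field, and storage is left entirely to the caller.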
With this API, hopefully we can reduce memory usage compared with `FieldComparator`, because today users store either the sort values or at least the slot number in addition to the storage allocated by `FieldComparator` itself. Ideally, only one copy of the values should be stored.
The proposed API is also friendlier to concurrent search since it provides a `ValueAccessor` per leaf. Although the same `ValueAccessor` can't be shared when more than one thread is working on the same leaf, each thread can at least initialize its own `ValueAccessor`.
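A minimal sketch of how per-leaf accessors might be used from a concurrent search, assuming one task per leaf. Everything in it is hypothetical: `LeafValueAccessor` and its methods are invented stand-ins for the proposed `ValueAccessor`, and a plain `long[]` stands in for a segment. Each task creates its own accessor and keeps its own small heap, so no comparator state is shared between threads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ConcurrentSortSketch {
    // Hypothetical accessor; method names are assumptions for illustration.
    interface LeafValueAccessor {
        boolean advanceExact(int doc);

        long longValue();
    }

    // Hypothetical stand-in for SortField.newValueAccessor(leaf).
    // Here a leaf is just a long[] in which every doc has a value.
    static LeafValueAccessor newValueAccessor(long[] leafValues) {
        return new LeafValueAccessor() {
            private int current = -1;

            @Override
            public boolean advanceExact(int doc) {
                current = doc;
                return doc >= 0 && doc < leafValues.length;
            }

            @Override
            public long longValue() {
                return leafValues[current];
            }
        };
    }

    // Returns the k smallest values across all leaves, in ascending order.
    static List<Long> topK(List<long[]> leaves, int k, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (long[] leaf : leaves) {
                Callable<List<Long>> task = () -> {
                    // The accessor is private to this task, never shared.
                    LeafValueAccessor accessor = newValueAccessor(leaf);
                    // Max-heap: the largest kept value is evicted first.
                    PriorityQueue<Long> heap = new PriorityQueue<>(Comparator.reverseOrder());
                    for (int doc = 0; doc < leaf.length; doc++) {
                        if (accessor.advanceExact(doc)) {
                            heap.add(accessor.longValue());
                            if (heap.size() > k) {
                                heap.poll();
                            }
                        }
                    }
                    return new ArrayList<>(heap);
                };
                futures.add(pool.submit(task));
            }
            // Merge the per-leaf candidates on the calling thread.
            List<Long> merged = new ArrayList<>();
            for (Future<List<Long>> future : futures) {
                merged.addAll(future.get());
            }
            Collections.sort(merged);
            return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("leaf task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The sort values live only in the caller-owned heaps, which is the "one copy" goal described above. The sketch does not cover several threads splitting a single leaf; in that case each thread would still need its own accessor.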