  1. Lucene - Core
  2. LUCENE-8878

Provide alternative sorting utility from SortField other than FieldComparator

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 8.1.1
    • Fix Version/s: None
    • Component/s: core/search
    • Labels: None
    • Lucene Fields: New

    Description

      `FieldComparator` has many responsibilities, and users get all of them at once. At a high level, its main functions are:

      • Provide a `LeafFieldComparator`
      • Allocate storage for the requested number of hits
      • Read values from DocValues, a custom source, etc.
      • Compare two values

      There are three major areas for improvement:

      1. The logic for reading values and the logic for storing them are coupled.
      2. Users need to specify the size in order to create a `FieldComparator`, but sometimes the size is not known up front.
      3. One can't reason about thread-safety from `FieldComparator`'s API, so it is not suitable for concurrent search.
        E.g., can two concurrent threads use the same `FieldComparator` to call `getLeafComparator` for the two different segments they are working on? In fact, almost all existing implementations of `FieldComparator` are not thread-safe.

      The proposal is to enhance `SortField` with two APIs:

      1. `int compare(Object v1, Object v2)`: compares two values of this field that come from different docs.
      2. `ValueAccessor newValueAccessor(LeafReaderContext leaf)`: encapsulates the logic for obtaining the right implementation for reading the field values.
        `ValueAccessor` should be used in a similar way to `DocValues`, providing the sort value for a document in an advance-and-read fashion.
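
      To make the shape of the proposal concrete, here is a minimal sketch of the two methods for a long-valued field. It does not use Lucene classes: `LeafValues` is a hypothetical in-memory stand-in for a leaf (where the proposal takes a `LeafReaderContext`), and the method names `advanceExact` and `value` on `ValueAccessor` are assumptions, since the issue does not spell out that interface.

```java
// Hypothetical accessor; the issue names the type but not its methods.
interface ValueAccessor {
    /** Positions the accessor on docId; returns false if the doc has no value. */
    boolean advanceExact(int docId);

    /** Returns the sort value of the document the accessor is positioned on. */
    Object value();
}

// Hypothetical stand-in for one leaf's per-document values (null = missing).
final class LeafValues {
    final Long[] values;

    LeafValues(Long... values) {
        this.values = values;
    }
}

// Sketch of the two methods the issue proposes adding to SortField.
final class LongSortFieldSketch {
    private final boolean reverse;

    LongSortFieldSketch(boolean reverse) {
        this.reverse = reverse;
    }

    /** Compares two values of this field that come from different docs. */
    int compare(Object v1, Object v2) {
        int cmp = Long.compare((Long) v1, (Long) v2);
        return reverse ? -cmp : cmp;
    }

    /** Creates a fresh, independent accessor for the given leaf. */
    ValueAccessor newValueAccessor(LeafValues leaf) {
        return new ValueAccessor() {
            private int current = -1; // per-accessor cursor, not shared

            @Override
            public boolean advanceExact(int docId) {
                current = docId;
                return docId >= 0
                        && docId < leaf.values.length
                        && leaf.values[docId] != null;
            }

            @Override
            public Object value() {
                return leaf.values[current];
            }
        };
    }
}
```

      In this sketch the accessor only reads and the field only compares; neither allocates storage for hits, so the caller is free to keep values wherever it likes.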

      With this API, we can hopefully reduce memory usage compared with `FieldComparator`: today, users store either the sort values or at least the slot numbers in addition to the storage that `FieldComparator` allocates itself. Ideally, only one copy of the values should be stored.
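
      As an illustration of the single-copy idea, the sketch below keeps each sort value exactly once, inside the caller's own hit record, and only needs a compare function to order them. `Hit` and `TopHitsSketch` are hypothetical names, and the `Comparator<Object>` stands in for the proposed `SortField.compare`; none of this is existing Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical hit record: the caller owns the only copy of the sort value.
final class Hit {
    final int doc;
    final Object sortValue;

    Hit(int doc, Object sortValue) {
        this.doc = doc;
        this.sortValue = sortValue;
    }
}

// Keeps the best `limit` hits; the compare function needs no size up front.
final class TopHitsSketch {
    private final Comparator<Object> valueOrder;
    private final PriorityQueue<Hit> queue;
    private final int limit;

    TopHitsSketch(Comparator<Object> valueOrder, int limit) {
        this.valueOrder = valueOrder;
        this.limit = limit;
        // Reversed order, so the head of the queue is the weakest kept hit.
        this.queue = new PriorityQueue<>(
                (a, b) -> valueOrder.compare(b.sortValue, a.sortValue));
    }

    void collect(int doc, Object sortValue) {
        if (queue.size() < limit) {
            queue.add(new Hit(doc, sortValue));
        } else if (valueOrder.compare(sortValue, queue.peek().sortValue) < 0) {
            queue.poll(); // evict the weakest hit
            queue.add(new Hit(doc, sortValue));
        }
    }

    /** Returns the kept hits, best first. */
    List<Hit> sortedHits() {
        List<Hit> hits = new ArrayList<>(queue);
        hits.sort((a, b) -> valueOrder.compare(a.sortValue, b.sortValue));
        return hits;
    }
}
```

      Ties between equal values are not broken by doc id here; a real collector would need to decide that.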

      The proposed API is also friendlier to concurrent search, since it provides a `ValueAccessor` per leaf. Although a single `ValueAccessor` can't be shared when more than one thread works on the same leaf, each thread can at least initialize its own `ValueAccessor`.
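
      The per-leaf pattern might look roughly like the sketch below, where every task creates its own stateful accessor and nothing mutable is shared between threads. `LeafCursor` and `ConcurrentLeafSketch` are hypothetical, and each leaf is modelled as a plain `long[]` rather than a Lucene segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stateful accessor: its cursor makes one instance unsafe to share.
final class LeafCursor {
    private final long[] leafValues;
    private int doc = -1;

    LeafCursor(long[] leafValues) {
        this.leafValues = leafValues;
    }

    boolean next() {
        return ++doc < leafValues.length;
    }

    long value() {
        return leafValues[doc];
    }
}

final class ConcurrentLeafSketch {
    /** Finds the smallest value across all leaves, using one task per leaf. */
    static long minAcrossLeaves(List<long[]> leaves, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> perLeaf = new ArrayList<>();
            for (long[] leaf : leaves) {
                perLeaf.add(pool.submit(() -> {
                    // Each task builds its own cursor; no instance is shared.
                    LeafCursor cursor = new LeafCursor(leaf);
                    long min = Long.MAX_VALUE;
                    while (cursor.next()) {
                        min = Math.min(min, cursor.value());
                    }
                    return min;
                }));
            }
            // Merge the per-leaf results on the calling thread.
            long min = Long.MAX_VALUE;
            for (Future<Long> result : perLeaf) {
                min = Math.min(min, result.get());
            }
            return min;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

      Two threads working on the same leaf would simply each construct their own `LeafCursor` over it, which is the property the proposal relies on.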

          People

            Assignee: Unassigned
            Reporter: Tony Xu (hypothesisx86)
            Votes: 0
            Watchers: 5