Lucene - Core / LUCENE-8878

Provide alternative sorting utility from SortField other than FieldComparator

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 8.1.1
    • Fix Version/s: None
    • Component/s: core/search
    • Labels: None
    • Lucene Fields: New

      Description

      `FieldComparator` has many responsibilities, and users get all of them at once. At a high level, its main responsibilities are:

      • Provide LeafFieldComparator
      • Allocate storage for requested number of hits
      • Read the values from DocValues/Custom source etc.
      • Compare two values

      There are three major areas for improvement:

      1. The logic for reading values and the logic for storing them are coupled.
      2. Users need to specify the size in order to create a `FieldComparator`, but sometimes the size is not known up front.
      3. From `FieldComparator`'s API, one can't reason about thread safety, so it is not suitable for concurrent search.
        E.g. can two concurrent threads use the same `FieldComparator` to call `getLeafComparator` for the two different segments they are working on? In fact, almost all existing implementations of `FieldComparator` are not thread-safe.

      The proposal is to enhance `SortField` with two APIs

      1. `int compare(Object v1, Object v2)`: compares two values of this field that come from different docs.
      2. `ValueAccessor newValueAccessor(LeafReaderContext leaf)`: encapsulates the logic for obtaining the right implementation to read the field values.
        `ValueAccessor` should be used in the same way as `DocValues`: advance to a document, then read its sort value.
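
      The two proposed methods could be sketched roughly as follows. This is only an illustration under stated assumptions: the issue does not define the methods of `ValueAccessor`, so `advanceExact` and `value` are hypothetical names modeled on the advance-then-read style, `SortValueSource` stands in for the enhanced `SortField`, and a plain `long[]` stands in for `LeafReaderContext` so that the example runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-leaf reader; not an existing Lucene class.
interface ValueAccessor {
    // Positions the accessor on doc; returns false if the doc has no value.
    boolean advanceExact(int doc);
    // Sort value of the doc the accessor is currently positioned on.
    Object value();
}

// Hypothetical stand-in for the two methods the issue proposes for SortField.
interface SortValueSource {
    int compare(Object v1, Object v2);
    // Assumption: a long[] of per-doc values replaces LeafReaderContext here.
    ValueAccessor newValueAccessor(long[] leafValues);
}

final class LongSortValueSource implements SortValueSource {
    @Override
    public int compare(Object v1, Object v2) {
        return Long.compare((Long) v1, (Long) v2);
    }

    @Override
    public ValueAccessor newValueAccessor(long[] leafValues) {
        return new ValueAccessor() {
            private int current = -1;

            @Override
            public boolean advanceExact(int doc) {
                current = doc;
                return doc >= 0 && doc < leafValues.length;
            }

            @Override
            public Object value() {
                return leafValues[current];
            }
        };
    }
}

final class SortSketch {
    // Reads every value through an accessor into caller-owned storage, then sorts.
    // No hit count is needed up front, and only one copy of each value is kept.
    static List<Object> sortedValues(SortValueSource source, long[][] leaves) {
        List<Object> values = new ArrayList<>();
        for (long[] leaf : leaves) {
            ValueAccessor accessor = source.newValueAccessor(leaf);
            for (int doc = 0; doc < leaf.length; doc++) {
                if (accessor.advanceExact(doc)) {
                    values.add(accessor.value());
                }
            }
        }
        values.sort(source::compare);
        return values;
    }
}
```

      In this sketch the caller decides how and where values are stored, which is the decoupling of reading from storage that the description asks for.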

      With this API, we can hopefully reduce memory usage compared with `FieldComparator`: today, callers store either the sort values or at least the slot numbers in addition to the storage allocated by `FieldComparator` itself. Ideally, only one copy of the values should be stored.

      The proposed API is also friendlier to concurrent search, since it provides a `ValueAccessor` per leaf. Although the same `ValueAccessor` can't be shared when more than one thread works on the same leaf, each thread can at least initialize its own `ValueAccessor`.
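
      A minimal sketch of that per-thread usage, under the same caveats: `ValueAccessor`, `advanceExact`, `longValue` and `newValueAccessor` below are hypothetical stand-ins (the issue does not fix their shape), and a `long[]` again plays the role of a leaf. Each task creates its own accessor, so the mutable cursor inside an accessor is never shared between threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ConcurrentLeafSketch {
    // Hypothetical accessor shape; not an existing Lucene interface.
    interface ValueAccessor {
        boolean advanceExact(int doc);
        long longValue();
    }

    // Stand-in for the proposed SortField.newValueAccessor(leaf):
    // every call returns a fresh accessor with its own cursor.
    static ValueAccessor newValueAccessor(long[] leaf) {
        return new ValueAccessor() {
            private int current = -1; // mutable state, hence not shareable

            @Override
            public boolean advanceExact(int doc) {
                current = doc;
                return doc >= 0 && doc < leaf.length;
            }

            @Override
            public long longValue() {
                return leaf[current];
            }
        };
    }

    // One task per leaf; returns the smallest value seen, or Long.MAX_VALUE if there are no docs.
    static long smallestAcrossLeaves(long[][] leaves, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> perLeaf = new ArrayList<>();
            for (long[] leaf : leaves) {
                perLeaf.add(pool.submit(() -> {
                    ValueAccessor accessor = newValueAccessor(leaf); // private to this task
                    long best = Long.MAX_VALUE;
                    for (int doc = 0; doc < leaf.length; doc++) {
                        if (accessor.advanceExact(doc)) {
                            best = Math.min(best, accessor.longValue());
                        }
                    }
                    return best;
                }));
            }
            long overall = Long.MAX_VALUE;
            for (Future<Long> result : perLeaf) {
                overall = Math.min(overall, result.get());
            }
            return overall;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

      The per-leaf results are merged on the calling thread, so the only shared state is the read-only leaf data.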

            People

            • Assignee: Unassigned
            • Reporter: Tony Xu (hypothesisx86)
            • Votes: 0
            • Watchers: 5