  1. Lucene - Core
  2. LUCENE-9236

Having a modular Doc Values format

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/index
    • Lucene Fields: New

    Description

      Today, a DocValues Consumer/Producer implementation must override 5 different methods, even if you only want to use one of them, and even though a given field can only support one doc values type at a time.

       

      In the attached PR, I’ve implemented a new modular version of those classes (consumer/producer), each with a single responsibility and all writing to the same single file.

      This is mainly a refactoring of the existing format that makes it possible to override or implement only the sub-format you need.
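
      To illustrate the idea, here is a minimal, self-contained sketch of a composite consumer that routes each field to a single-purpose sub-consumer while sharing one output. All names in it (CompositeDocValuesSketch, SubConsumer, CompositeConsumer, PLAIN_NUMERIC) are hypothetical and the byte layout is invented for the example; it does not reproduce the classes or the file format from the PR.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model; these are NOT the Lucene or PR classes.
final class CompositeDocValuesSketch {

  /** The five doc values types; a given field has exactly one of them. */
  enum Type { NUMERIC, BINARY, SORTED, SORTED_NUMERIC, SORTED_SET }

  /** A sub-format writer with a single responsibility: one doc values type. */
  interface SubConsumer {
    void addField(String field, long[] values, DataOutputStream out) throws IOException;
  }

  /** Routes each field to the sub-consumer registered for its type; all share one output. */
  static final class CompositeConsumer {
    private final Map<Type, SubConsumer> subConsumers = new EnumMap<>(Type.class);
    private final DataOutputStream out;

    CompositeConsumer(DataOutputStream out) {
      this.out = out;
    }

    CompositeConsumer register(Type type, SubConsumer consumer) {
      subConsumers.put(type, consumer);
      return this;
    }

    void addField(String field, Type type, long[] values) throws IOException {
      SubConsumer consumer = subConsumers.get(type);
      if (consumer == null) {
        throw new UnsupportedOperationException("No sub-format registered for " + type);
      }
      out.writeByte(type.ordinal()); // type tag, so a reader could dispatch the same way
      consumer.addField(field, values, out);
    }
  }

  /** Example sub-consumer: writes the field name, a count, then the raw long values. */
  static final SubConsumer PLAIN_NUMERIC = (field, values, out) -> {
    out.writeUTF(field);
    out.writeInt(values.length);
    for (long v : values) {
      out.writeLong(v);
    }
  };

  /** Writes one numeric field through the composite and returns the resulting bytes. */
  static byte[] writeExample() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      new CompositeConsumer(out)
          .register(Type.NUMERIC, PLAIN_NUMERIC)
          .addField("price", Type.NUMERIC, new long[] {10L, 20L, 30L});
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(writeExample().length + " bytes written");
  }
}
```

      In this sketch, only the sub-consumers that are actually needed get registered; asking for an unregistered type fails fast instead of forcing an empty override.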

       

      I’ll do this in 3 steps:

      1. Create a CompositeDocValuesFormat and move the code of Lucene80DocValuesFormat into separate classes, without modifying the inner code. At the same time, I created a Lucene85CompositeDocValuesFormat based on these changes.
      2. Introduce some basic components for writing doc values in general, such as:
        1. DocumentIdSetIterator Serializer: used by every field type, based on an IndexedDISI.
        2. Document Ordinals Serializer: used in Sorted and SortedSet to deduplicate values using a dictionary.
        3. Document Boundaries Serializer: optional, used only for multivalued fields (SortedNumeric and SortedSet).
        4. TermsEnum Serializer: used to write and read the terms dictionary for Sorted and SortedSet doc values.
      3. Create the new Sub-DocValues format using the previous components.
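
      As a rough illustration of what the ordinals and boundaries components deal with, the sketch below encodes multivalued string fields into a sorted dictionary of unique terms, a flat array of per-document ordinals, and a boundaries array marking where each document's ordinals start and end. The class and method names (OrdinalsSketch, Encoded, encode, example) are hypothetical, and the in-memory arrays stand in for whatever on-disk encoding the PR actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration; not the serializers from the PR.
final class OrdinalsSketch {

  /** Encoded form of a multivalued field: dictionary, flat ordinals, per-doc boundaries. */
  static final class Encoded {
    final String[] dictionary; // sorted, unique terms
    final int[] ordinals;      // ordinals of all documents, concatenated
    final int[] boundaries;    // doc i owns ordinals[boundaries[i] .. boundaries[i + 1])

    Encoded(String[] dictionary, int[] ordinals, int[] boundaries) {
      this.dictionary = dictionary;
      this.ordinals = ordinals;
      this.boundaries = boundaries;
    }
  }

  static Encoded encode(List<List<String>> docs) {
    // Deduplicate values across all documents into a sorted dictionary.
    TreeSet<String> unique = new TreeSet<>();
    for (List<String> doc : docs) {
      unique.addAll(doc);
    }
    String[] dictionary = unique.toArray(new String[0]);

    int[] boundaries = new int[docs.size() + 1];
    List<Integer> flat = new ArrayList<>();
    for (int doc = 0; doc < docs.size(); doc++) {
      // Per document, keep each ordinal once and in increasing order.
      TreeSet<Integer> ords = new TreeSet<>();
      for (String term : docs.get(doc)) {
        ords.add(Arrays.binarySearch(dictionary, term));
      }
      flat.addAll(ords);
      boundaries[doc + 1] = flat.size();
    }
    int[] ordinals = flat.stream().mapToInt(Integer::intValue).toArray();
    return new Encoded(dictionary, ordinals, boundaries);
  }

  /** Three documents: two values, no value, and three values with a duplicate. */
  static Encoded example() {
    return encode(Arrays.asList(
        Arrays.asList("b", "a"),
        Collections.<String>emptyList(),
        Arrays.asList("c", "a", "a")));
  }

  public static void main(String[] args) {
    Encoded e = example();
    System.out.println(Arrays.toString(e.dictionary));
    System.out.println(Arrays.toString(e.ordinals));
    System.out.println(Arrays.toString(e.boundaries));
  }
}
```

      A single-valued Sorted field would need only the dictionary and one ordinal per document, which is why the boundaries component can stay optional.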

       

      PR: https://github.com/apache/lucene-solr/pull/1282

          People

            Assignee: Unassigned
            Reporter: juan camilo rodriguez duran (juan.duran)
