  1. Lucene - Core
  2. LUCENE-9300

Index corruption with doc values updates and addIndexes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: master (9.0), 7.7.3, 8.6, 8.5.1
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Today, a doc values update writes a new field infos file containing the segment's original field infos, updated for the new generation, plus any new fields introduced by the update.

      However, existing fields are cloned through the global fields (shared in the index writer) instead of the local ones (present in the segment). In practice this is usually harmless, since field numbers are shared between segments created by the same index writer. But this assumption doesn't hold for segments created by a different writer and added through IndexWriter#addIndexes(Directory). In that case, the same field can have a different number in each segment, so any doc values update can corrupt the index by assigning the wrong field number to an existing field in the next generation.

      When this happens, queries and merges can read the wrong fields without throwing any error, silently corrupting the index.
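
      The mismatch described above can be illustrated with a toy model in plain Java. This is only a sketch: the class name, the method rewriteViaGlobal, and the map-based representation of field infos are hypothetical and are not Lucene's actual classes or code path. It assumes a destination writer that numbered "id" before "price", and an imported segment whose writer numbered them the other way round.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the bug; the names here are illustrative, not Lucene APIs. */
public class FieldNumberMismatch {

    /**
     * Rewrites a segment's field infos by looking each field up in the
     * writer-wide name -> number map (the problematic path).
     */
    static Map<Integer, String> rewriteViaGlobal(Map<Integer, String> segmentFields,
                                                 Map<String, Integer> globalNumbers) {
        Map<Integer, String> next = new TreeMap<>();
        for (String name : segmentFields.values()) {
            // The segment's own number for this field is ignored here.
            next.put(globalNumbers.get(name), name);
        }
        return next;
    }

    public static void main(String[] args) {
        // Numbers assigned by the destination writer: "id" first, then "price".
        Map<String, Integer> global = new HashMap<>();
        global.put("id", 0);
        global.put("price", 1);

        // Segment imported from another writer, which saw "price" first.
        Map<Integer, String> imported = new TreeMap<>();
        imported.put(0, "price");
        imported.put(1, "id");

        Map<Integer, String> next = rewriteViaGlobal(imported, global);
        System.out.println("on disk:         " + imported); // {0=price, 1=id}
        System.out.println("next generation: " + next);     // {0=id, 1=price}
    }
}
```

      In this model the segment's existing data is still keyed by the old numbers, so after the rewrite number 0 resolves to "id" while the data stored under 0 belongs to "price". Nothing fails; readers simply see the wrong field.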

      Since a field is not guaranteed to have the same number in every segment, a doc values update should preserve the segment's own field numbers when rewriting field infos.
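
      One possible shape for such a rewrite, again as a toy model in plain Java rather than Lucene's actual code: existing fields keep the numbers the segment already uses, and a field that is new to the segment takes the next free local number. The class name, the method rewritePreservingLocal, and the "highest local number plus one" allocation are assumptions of this sketch.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a number-preserving rewrite; the names are illustrative, not Lucene APIs. */
public class PreserveSegmentNumbers {

    /**
     * Builds the next generation of a segment's field infos. Fields already in
     * the segment keep their local numbers; a field new to the segment gets the
     * next unused local number (an assumption of this sketch).
     */
    static Map<Integer, String> rewritePreservingLocal(Map<Integer, String> segmentFields,
                                                       String updatedField) {
        TreeMap<Integer, String> next = new TreeMap<>(segmentFields);
        if (!next.containsValue(updatedField)) {
            int nextFree = next.isEmpty() ? 0 : next.lastKey() + 1;
            next.put(nextFree, updatedField);
        }
        return next;
    }

    public static void main(String[] args) {
        // Segment imported from another writer, with its own numbering.
        Map<Integer, String> imported = new TreeMap<>();
        imported.put(0, "price");
        imported.put(1, "id");

        // Updating an existing field leaves the numbering untouched.
        System.out.println(rewritePreservingLocal(imported, "price")); // {0=price, 1=id}

        // Updating a field the segment has never seen appends a new number.
        System.out.println(rewritePreservingLocal(imported, "stock")); // {0=price, 1=id, 2=stock}
    }
}
```

      The writer-wide map is deliberately absent from this sketch: the point is that the segment's existing numbering, not the global one, decides what each number means in the next generation.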

              People

              • Assignee: Unassigned
              • Reporter: Jim Ferenczi (jim.ferenczi)
              • Votes: 0
              • Watchers: 3

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 4h 10m