  1. Nutch
  2. NUTCH-641

IndexSorter incorrectly copies stored fields


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.0
    • Component/s: indexer
    • Labels: None
    • Patch Info: Patch Available

    Description

      Recent versions of Lucene introduced the IndexReader.document(int, FieldSelector) method. IndexWriter.addIndexes(IndexReader[]) now calls that method instead of the older IndexReader.document(int).

      Unfortunately, IndexSorter does not override this new method, which leads to a subtle corruption of sorted indexes: the indexed fields are sorted properly, but the stored field values are not sorted and remain in their original order. As a result, the indexed fields and stored fields in a sorted index are completely out of sync, and incorrect documents are later retrieved from segments.
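
      The failure mode described above can be sketched in a few lines of plain Java. This is a simplified model, not the Nutch or Lucene code: DocReader, ArrayReader, DelegatingReader, BuggySortingReader, FixedSortingReader and SortDemo are all hypothetical names, and a String selector stands in for Lucene's FieldSelector. The sketch assumes the wrapper's base class forwards the newer overload straight to the wrapped reader, so a subclass that remaps only document(int) is silently bypassed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader with an older and a newer accessor for stored values. */
interface DocReader {
    int maxDoc();
    String document(int n);                   // older accessor
    String document(int n, String selector);  // newer overload (selector is unused in this sketch)
}

/** Plain in-memory reader holding one stored value per document. */
class ArrayReader implements DocReader {
    private final String[] stored;
    ArrayReader(String... stored) { this.stored = stored; }
    public int maxDoc() { return stored.length; }
    public String document(int n) { return stored[n]; }
    public String document(int n, String selector) { return stored[n]; }
}

/** Delegating wrapper: every call is forwarded unchanged to the wrapped reader. */
class DelegatingReader implements DocReader {
    protected final DocReader in;
    DelegatingReader(DocReader in) { this.in = in; }
    public int maxDoc() { return in.maxDoc(); }
    public String document(int n) { return in.document(n); }
    public String document(int n, String selector) { return in.document(n, selector); }
}

/** Remaps document numbers in the older accessor only; the newer overload skips the remapping. */
class BuggySortingReader extends DelegatingReader {
    protected final int[] newToOld;
    BuggySortingReader(DocReader in, int[] newToOld) {
        super(in);
        this.newToOld = newToOld;
    }
    @Override public String document(int n) { return in.document(newToOld[n]); }
}

/** Remaps document numbers in the newer overload as well. */
class FixedSortingReader extends BuggySortingReader {
    FixedSortingReader(DocReader in, int[] newToOld) { super(in, newToOld); }
    @Override public String document(int n, String selector) {
        return in.document(newToOld[n], selector);
    }
}

class SortDemo {
    /** Stand-in for a merge step that copies stored values through the newer overload. */
    static List<String> copyStored(DocReader reader) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < reader.maxDoc(); i++) {
            out.add(reader.document(i, null));
        }
        return out;
    }
}
```

      With stored values "a", "b", "c" and a mapping of {2, 0, 1}, copying through BuggySortingReader returns the values in their original order, while FixedSortingReader returns them in the sorted order.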

      Attachments

        1. indexSorter.patch
          6 kB
          Andrzej Bialecki


          People

            Assignee: Andrzej Bialecki (ab)
            Reporter: Andrzej Bialecki (ab)
            Votes: 0
            Watchers: 0
