  1. Nutch
  2. NUTCH-641

IndexSorter incorrectly copies stored fields

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.0
    • Component/s: indexer
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      Recent versions of Lucene introduced the IndexReader.document(int, FieldSelector) method. IndexWriter.addIndexes(IndexReader[]) now calls that method instead of the older IndexReader.document(int).

      Unfortunately, IndexSorter does not override this new method, which leads to a subtle corruption of sorted indexes: the indexed fields are sorted properly, but the stored field values are not reordered and remain in the sorted index in their original order. As a result, the indexed fields and the stored fields of a sorted index are completely out of sync, which later causes incorrect documents to be retrieved from segments.
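
      The failure mode described above can be illustrated with a minimal, self-contained sketch. It does not use the real Lucene or Nutch classes: ToyReader, ToyFilterReader, BuggySortingReader, FixedSortingReader, and the newToOld mapping are hypothetical stand-ins, and the Object parameter only stands in for FieldSelector. The sketch assumes the sorting wrapper inherits a pass-through implementation of the two-argument overload, so a caller that uses that overload bypasses the document remapping.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a reader exposing both document() overloads. */
class ToyReader {
    private final String[] stored;
    ToyReader(String... stored) { this.stored = stored; }
    int maxDoc() { return stored.length; }
    String document(int n) { return document(n, null); }
    // 'selector' stands in for a field selector; it is ignored in this sketch.
    String document(int n, Object selector) { return stored[n]; }
}

/** Pass-through wrapper: every call is delegated unchanged to the wrapped reader. */
class ToyFilterReader extends ToyReader {
    protected final ToyReader in;
    ToyFilterReader(ToyReader in) { super(); this.in = in; }
    @Override int maxDoc() { return in.maxDoc(); }
    @Override String document(int n) { return in.document(n); }
    @Override String document(int n, Object selector) { return in.document(n, selector); }
}

/** Remaps only the one-argument overload; the two-argument one stays pass-through. */
class BuggySortingReader extends ToyFilterReader {
    protected final int[] newToOld; // sorted position -> original document number
    BuggySortingReader(ToyReader in, int[] newToOld) { super(in); this.newToOld = newToOld; }
    @Override String document(int n) { return in.document(newToOld[n]); }
}

/** Also remaps the two-argument overload, so both access paths agree. */
class FixedSortingReader extends BuggySortingReader {
    FixedSortingReader(ToyReader in, int[] newToOld) { super(in, newToOld); }
    @Override String document(int n, Object selector) {
        return in.document(newToOld[n], selector);
    }
}

final class StoredFieldSyncDemo {
    /** Copies stored values through the two-argument overload, as a newer merge path would. */
    static List<String> copyStored(ToyReader reader) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < reader.maxDoc(); i++) {
            out.add(reader.document(i, null));
        }
        return out;
    }

    public static void main(String[] args) {
        ToyReader base = new ToyReader("doc-a", "doc-b", "doc-c");
        int[] newToOld = {2, 0, 1};
        // Stored values come back in the original order: the remapping was bypassed.
        System.out.println(copyStored(new BuggySortingReader(base, newToOld))); // [doc-a, doc-b, doc-c]
        // Stored values follow the sorted order.
        System.out.println(copyStored(new FixedSortingReader(base, newToOld))); // [doc-c, doc-a, doc-b]
    }
}
```

      In this toy model the fix is simply to override both overloads with the same remapping. Whether the attached patch takes exactly this approach is not stated in the description.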

        Attachments

        1. indexSorter.patch
          6 kB
          Andrzej Bialecki

            People

            • Assignee:
              Andrzej Bialecki
            • Reporter:
              Andrzej Bialecki