  1. Lucene - Core
  2. LUCENE-4880

Difference in offset handling between IndexReader created by MemoryIndex and one created by RAMDirectory

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.2
    • Fix Version/s: 4.3, 6.0
    • Component/s: core/index
    • Labels: None
    • Environment: Windows 7 (probably irrelevant)
    • Lucene Fields: New

    Description

      MemoryIndex skips tokens of length 0 when building the index. As a result, for those tokens it neither increments the token offset nor stores the position offsets (if that option is set). A regular index (built via, say, RAMDirectory) does not appear to skip them.

      With the ICUFoldingFilter, a term can end up with zero length (for example, the \u0640 character surrounded by spaces). If that occurs in a document, the offsets returned at search time differ between the MemoryIndex and a regular index.
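
      The mismatch can be pictured with a toy model. This is a sketch in plain Java, not Lucene's actual code: the class ZeroLengthTokenDemo, the method index, and the "term@position" output format are all made up for illustration, and the token array stands in for what an analyzer might emit for "a \u0640 b" once the middle character is folded away. One indexer drops the empty token before counting it, the other counts it, so every later token ends up at a different position.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names exist in Lucene.
class ZeroLengthTokenDemo {

    /**
     * Assigns a running position to each token and returns "term@position" entries.
     * When skipEmpty is true, empty tokens are dropped before the position advances.
     */
    static List<String> index(String[] tokens, boolean skipEmpty) {
        List<String> postings = new ArrayList<>();
        int position = -1;
        for (String token : tokens) {
            if (skipEmpty && token.isEmpty()) {
                continue; // skipped before the position is advanced
            }
            position++;
            postings.add(token + "@" + position);
        }
        return postings;
    }

    public static void main(String[] args) {
        // Assumed analyzer output: the middle token has been folded to "".
        String[] tokens = {"a", "", "b"};
        System.out.println("skipping empty tokens: " + index(tokens, true));
        System.out.println("keeping empty tokens:  " + index(tokens, false));
    }
}
```

      In this model the skipping indexer places "b" at position 1 while the other places it at position 2, which is the kind of disagreement described above.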

      Attachments

        1. MemoryIndexVsRamDirZeroLengthTermTest.java
          7 kB
          Tim Allison
        2. LUCENE-4880.patch
          4 kB
          Robert Muir

          People

            Assignee: Unassigned
            Reporter: Tim Allison (tallison)
            Votes: 0
            Watchers: 2