Lucene - Core / LUCENE-4880

Difference in offset handling between IndexReader created by MemoryIndex and one created by RAMDirectory

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.2
    • Fix Version/s: 4.3, 6.0
    • Component/s: core/index
    • Labels:
      None
    • Environment:

      Windows 7 (probably irrelevant)

    • Lucene Fields:
      New

      Description

      MemoryIndex skips tokens of length 0 when building the index. As a result, it does not increment the token offset for zero-length tokens, nor does it store their position offsets if that option is set. A regular index (built via, say, RAMDirectory) does not appear to skip them.

      With the ICUFoldingFilter, a term can end up with zero length (for example, the \u0640 character, ARABIC TATWEEL, surrounded by spaces). If such a term occurs in a document, the offsets returned at search time differ between the MemoryIndex and a regular index.
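
      The mismatch can be illustrated with a toy model in plain Java. This is only a sketch and does not use Lucene: the Token and Posting classes, the index method, and the sample token stream for "a \u0640 b" are all hypothetical, and the model assumes the difference shows up as a counter that is not advanced when an empty term is skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ZeroLengthTokenDemo {

    /** Hypothetical token: term text plus character offsets in the source text. */
    static final class Token {
        final String term;
        final int startOffset;
        final int endOffset;

        Token(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Hypothetical posting: what the toy indexer records for one token. */
    static final class Posting {
        final String term;
        final int position;
        final int startOffset;
        final int endOffset;

        Posting(String term, int position, int startOffset, int endOffset) {
            this.term = term;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Toy indexer. With skipEmptyTerms == true it mimics the behavior the report
     * attributes to MemoryIndex: an empty term is dropped and the counter is not
     * advanced. With false it keeps every token, as a regular index appears to.
     */
    static List<Posting> index(List<Token> tokens, boolean skipEmptyTerms) {
        List<Posting> postings = new ArrayList<>();
        int position = 0;
        for (Token t : tokens) {
            if (skipEmptyTerms && t.term.isEmpty()) {
                continue; // no posting stored, position not incremented
            }
            postings.add(new Posting(t.term, position, t.startOffset, t.endOffset));
            position++;
        }
        return postings;
    }

    /** Assumed token stream for "a \u0640 b" after folding empties the middle term. */
    static List<Token> sampleTokens() {
        return Arrays.asList(
                new Token("a", 0, 1),
                new Token("", 2, 3),
                new Token("b", 4, 5));
    }

    public static void main(String[] args) {
        for (boolean skip : new boolean[] {true, false}) {
            System.out.println("skipEmptyTerms=" + skip);
            for (Posting p : index(sampleTokens(), skip)) {
                System.out.println("  term='" + p.term + "' position=" + p.position
                        + " offsets=[" + p.startOffset + "," + p.endOffset + "]");
            }
        }
    }
}
```

      In this model the term "b" is recorded at position 1 when empty terms are skipped and at position 2 when they are kept, so anything computed from those counters at search time would disagree between the two indexes.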

            People

            • Assignee:
              Unassigned
            • Reporter:
              Tim Allison (tallison@apache.org)
            • Votes:
              0
            • Watchers:
              3
