[LUCENE-4880] Difference in offset handling between IndexReader created by MemoryIndex and one created by RAMDirectory

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 4.2
Fix Version/s: 4.3, 6.0
Component/s: core/index
Labels: None
Environment: Windows 7 (probably irrelevant)
Lucene Fields: New

Description

MemoryIndex skips tokens of length 0 when building the index. As a result, for a zero-length token it neither increments the token offset nor stores the position offsets (if that option is set). A regular index (built via, say, RAMDirectory) does not appear to skip such tokens.

With ICUFoldingFilter it is possible to get a term of zero length (for example, the \u0640 character separated by spaces). If such a term occurs in a document, the offsets returned at search time differ between the MemoryIndex and a regular index.
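The mismatch can be illustrated with a small toy model. This is only a sketch and not Lucene code: the class `ZeroLengthTermSketch` and its methods `fold` and `index` are hypothetical, `fold` merely stands in for a folding filter by stripping \u0640, and a simple per-token position counter stands in for the offset bookkeeping described above. It assumes whitespace tokenization.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of this is Lucene code, and all names are hypothetical.
class ZeroLengthTermSketch {

    // Stand-in for a folding filter: strips U+0640, which can leave an empty term.
    static String fold(String token) {
        return token.replace("\u0640", "");
    }

    // Returns "term@position" entries for the space-separated tokens of text.
    // With skipEmpty == true, zero-length terms are dropped and do not advance
    // the position (the behaviour the report attributes to MemoryIndex).
    // With skipEmpty == false, they are kept and counted like any other term.
    static List<String> index(String text, boolean skipEmpty) {
        List<String> postings = new ArrayList<>();
        int position = 0;
        for (String raw : text.split(" ")) {
            String term = fold(raw);
            if (term.isEmpty() && skipEmpty) {
                continue; // skipped token: position is not incremented
            }
            postings.add(term + "@" + position);
            position++;
        }
        return postings;
    }

    public static void main(String[] args) {
        String text = "a \u0640 b";
        System.out.println(index(text, true));  // skipping variant
        System.out.println(index(text, false)); // non-skipping variant
    }
}
```

In this model the term "b" lands at position 1 in the skipping variant and at position 2 in the non-skipping one, which is the kind of disagreement the attached test is meant to expose between the two real index types.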

Attachments

- LUCENE-4880.patch (4 kB), Robert Muir, 09/Apr/13 13:05
- MemoryIndexVsRamDirZeroLengthTermTest.java (7 kB), Tim Allison, 25/Mar/13 12:17

People

Assignee: Unassigned

Reporter: Tim Allison

Votes: 0

Watchers: 2

Dates

Created: 25/Mar/13 12:15

Updated: 28/Aug/22 13:42

Resolved: 09/Apr/13 13:29