[LUCENE-2014] position increment bug: smartcn - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0
Component/s: modules/analysis
Labels:
None

Lucene Fields:

New, Patch Available

Description

If i use LUCENE_VERSION >= 2.9 with smart chinese analyzer, it will crash indexwriter with any reasonable amount of chinese text.

its especially annoying because it happens in 2.9.1 RC as well.

this is because the position increments for tokens after stopwords are bogus:

Here's an example (from test case), where the position increment should be 2, but is instead 91975314!

  public void testChineseStopWords2() throws Exception {
    Analyzer ca = new SmartChineseAnalyzer(Version.LUCENE_CURRENT); /* will load stopwords */
    String sentence = "Title:San"; // : is a stopword
    String result[] = { "titl", "san"};
    int startOffsets[] = { 0, 6 };
    int endOffsets[] = { 5, 9 };
    int posIncr[] = { 1, 2 };
    assertAnalyzesTo(ca, sentence, result, startOffsets, endOffsets, posIncr);
  }

junit.framework.AssertionFailedError: posIncrement 1 expected:<2> but was:<91975314>
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:280)
at junit.framework.Assert.assertEquals(Assert.java:64)
at junit.framework.Assert.assertEquals(Assert.java:198)
at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:83)
...

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

LUCENE-2014.patch
29/Oct/09 08:04
1 kB
Robert Muir
LUCENE-2014.patch
29/Oct/09 08:33
2 kB
Robert Muir
LUCENE-2014_branch.patch
29/Oct/09 09:36
3 kB
Robert Muir

Activity

People

Assignee:: Robert Muir

Reporter:: Robert Muir

Votes:: 0 Vote for this issue

Watchers:: 0 Start watching this issue

Dates

Created:: 29/Oct/09 08:03

Updated:: 28/Aug/22 12:12

Resolved:: 29/Oct/09 09:58