  1. Lucene - Core
  2. LUCENE-3834

The TokenStream created by SmartChineseAnalyzer can't be reset

Details

    • Bug
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • 3.5
    • None
    • modules/analysis
    • None
    • New

    Description

      This happens because the input field in class SentenceTokenizer isn't reset when the method reset() is called.

      There are two input fields: one comes from Tokenizer and the other from TokenFilter. To reset a TokenStream created by SmartChineseAnalyzer, both of them need to be reset. The bug arises because the author forgot to reset the input field in class SentenceTokenizer.

      class path : org.apache.lucene.analysis.cn.smart.SentenceTokenizer

      original code

      public final class SentenceTokenizer extends Tokenizer {
      ....
      @Override
      public void reset() throws IOException {
        super.reset();
        tokenStart = tokenEnd = 0;
      }
      ...
      }

      this method should be changed as follows

      public void reset() throws IOException {
        super.reset();
        /* should reset input */
        if (input.markSupported()) input.reset();
        tokenStart = tokenEnd = 0;
      }
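
The guard in the proposed change relies on the mark/reset behaviour of java.io.Reader. The following is only a minimal sketch of that behaviour outside Lucene: the class ReaderRewindSketch and its helpers rewindIfSupported and drain are hypothetical names made up for illustration, and a StringReader stands in for the tokenizer's input. A StringReader that was never marked rewinds to the start of its string on reset(). Other readers may behave differently; for example, a BufferedReader that was never marked throws an IOException on reset(), so the guard alone may not be enough for every Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReaderRewindSketch {

    // Hypothetical helper mirroring the guard in the proposed fix:
    // rewind the reader only when it reports mark/reset support.
    static boolean rewindIfSupported(Reader input) throws IOException {
        if (input.markSupported()) {
            input.reset();
            return true;
        }
        return false;
    }

    // Hypothetical helper: reads everything that is left in the reader.
    static String drain(Reader input) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[64];
        int n;
        while ((n = input.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader input = new StringReader("first sentence. second sentence.");

        String firstPass = drain(input);   // consumes the whole input
        String exhausted = drain(input);   // nothing left without a rewind

        rewindIfSupported(input);          // StringReader rewinds to its start
        String secondPass = drain(input);  // same text is readable again

        System.out.println("exhausted is empty: " + exhausted.isEmpty());
        System.out.println("passes match: " + firstPass.equals(secondPass));
    }
}
```

Without the rewind, the second read returns nothing, which matches the symptom described above: the stream's position state is reset, but the underlying input stays at its end.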

          People

            Unassigned Unassigned
            martin3000 dingjin