[LUCENE-3366] StandardFilter only works with ClassicTokenizer and only when version < 3.1 - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Not A Problem
Affects Version/s: 3.3
Fix Version/s: None
Component/s: modules/analysis
Labels: None
Lucene Fields: New

Description

The StandardFilter used to remove periods from acronyms and trailing apostrophe-s ("'s") suffixes wherever they occurred, and it used to work in conjunction with the StandardTokenizer. Now it does this only with the ClassicTokenizer, and only when the Lucene match version is before 3.1. Here is an excerpt from the code:

  public final boolean incrementToken() throws IOException {
    if (matchVersion.onOrAfter(Version.LUCENE_31))
      return input.incrementToken(); // TODO: add some niceties for the new grammar
    else
      return incrementTokenClassic();
  }
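For readers unfamiliar with the pre-3.1 behavior described above, here is a small, Lucene-free sketch of the two classic rules: strip a trailing "'s" or "'S" from apostrophe tokens, and remove every period from acronym tokens. It is not Lucene code. The class and method names are hypothetical, and the type strings "<APOSTROPHE>" and "<ACRONYM>" are assumed to match the token types the ClassicTokenizer emits.

```java
/**
 * Hypothetical, Lucene-free sketch of the classic (pre-3.1) StandardFilter rules.
 * The token type strings are assumed to mirror ClassicTokenizer's types.
 */
final class ClassicFilterSketch {
    static final String APOSTROPHE_TYPE = "<APOSTROPHE>";
    static final String ACRONYM_TYPE = "<ACRONYM>";

    /** Returns the normalized term text for a token of the given type. */
    static String normalize(String term, String type) {
        if (APOSTROPHE_TYPE.equals(type) && term.length() >= 2) {
            // Strip a trailing possessive "'s" or "'S", e.g. "O'Reilly's" -> "O'Reilly".
            char apos = term.charAt(term.length() - 2);
            char s = term.charAt(term.length() - 1);
            if (apos == '\'' && (s == 's' || s == 'S')) {
                return term.substring(0, term.length() - 2);
            }
            return term;
        }
        if (ACRONYM_TYPE.equals(type)) {
            // Remove every period, e.g. "I.B.M." -> "IBM".
            return term.replace(".", "");
        }
        // Any other token type passes through unchanged.
        return term;
    }

    public static void main(String[] args) {
        System.out.println(normalize("O'Reilly's", APOSTROPHE_TYPE)); // O'Reilly
        System.out.println(normalize("I.B.M.", ACRONYM_TYPE));         // IBM
        System.out.println(normalize("plain", "<ALPHANUM>"));         // plain
    }
}
```

The rules key on the token type rather than on the text alone, so a plain word such as "is" or a number such as "3.1" is never altered unless the tokenizer labeled it as an apostrophe or acronym token.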

It seems to me that this filter was overlooked in the great refactor of the standard tokenizer (LUCENE-2167). I think that if someone uses the ClassicTokenizer, then no matter what the version is, this filter should do what it used to do. The TODO also suggests that someone intended to make this filter do something useful for the StandardTokenizer but never did. Alternatively, perhaps that idea should be discarded and this class should be renamed ClassicTokenFilter.
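The difference between the current dispatch and the behavior proposed here can be summarized as a truth table. The sketch below is a hypothetical illustration, not Lucene API: `onOrAfter31` stands in for `matchVersion.onOrAfter(Version.LUCENE_31)`, and `usesClassicTokenizer` is an assumed flag, since the excerpt does not show how a filter would learn which tokenizer produced its input.

```java
/**
 * Hypothetical sketch contrasting when classic processing runs today (per the
 * excerpt) and under the proposal in this issue. Not Lucene API.
 */
final class DispatchSketch {
    // Current behavior: depends only on the match version; the tokenizer is ignored.
    static boolean currentRunsClassic(boolean onOrAfter31, boolean usesClassicTokenizer) {
        return !onOrAfter31;
    }

    // Proposed behavior: ClassicTokenizer input always gets classic processing,
    // and pre-3.1 versions keep their old behavior.
    static boolean proposedRunsClassic(boolean onOrAfter31, boolean usesClassicTokenizer) {
        return usesClassicTokenizer || !onOrAfter31;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean onOrAfter31 : values) {
            for (boolean classic : values) {
                System.out.printf("onOrAfter31=%b classicTokenizer=%b current=%b proposed=%b%n",
                        onOrAfter31, classic,
                        currentRunsClassic(onOrAfter31, classic),
                        proposedRunsClassic(onOrAfter31, classic));
            }
        }
    }
}
```

The only row where the two rules disagree is a 3.1+ match version combined with the ClassicTokenizer, which is exactly the case this issue is about.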

In any event, the javadocs for this class appear out of date, since they make no mention of the ClassicTokenizer, and the wiki is out of date too.

People

Assignee: Unassigned
Reporter: David Smiley
Votes: 0
Watchers: 0

Dates

Created: 08/Aug/11 22:17
Updated: 28/Aug/22 12:55
Resolved: 09/Aug/11 03:02