Lucene - Core / LUCENE-7760

StandardAnalyzer/Tokenizer.setMaxTokenLength's javadocs are lying

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.6, 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      The javadocs claim that too-long tokens are discarded, but in fact they are simply chopped up into max-length pieces. The following test case unexpectedly passes:

        public void testMaxTokenLengthNonDefault() throws Exception {
          StandardAnalyzer a = new StandardAnalyzer();
          a.setMaxTokenLength(5);
          assertAnalyzesTo(a, "ab cd toolong xy z", new String[]{"ab", "cd", "toolo", "ng", "xy", "z"});
          a.close();
        }
      

      We should at least fix the javadocs ...

      (I hit this because I was trying to also add setMaxTokenLength to EnglishAnalyzer).
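
      To make the difference concrete, here is a small self-contained sketch (no Lucene dependency; the class and method names are made up for illustration) contrasting the two behaviors: chopping over-long tokens into max-length chunks, which is what the tokenizer actually does, versus discarding them outright, which is what the javadocs claimed:

```java
import java.util.ArrayList;
import java.util.List;

public class MaxTokenLengthDemo {
  // What StandardTokenizer actually does: a token longer than max
  // is emitted as successive chunks of at most max characters.
  static List<String> chop(String text, int max) {
    List<String> out = new ArrayList<>();
    for (String tok : text.split("\\s+")) {
      for (int i = 0; i < tok.length(); i += max) {
        out.add(tok.substring(i, Math.min(i + max, tok.length())));
      }
    }
    return out;
  }

  // What the javadocs claimed: a token longer than max is dropped entirely.
  static List<String> discard(String text, int max) {
    List<String> out = new ArrayList<>();
    for (String tok : text.split("\\s+")) {
      if (tok.length() <= max) {
        out.add(tok);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(chop("ab cd toolong xy z", 5));    // [ab, cd, toolo, ng, xy, z]
    System.out.println(discard("ab cd toolong xy z", 5)); // [ab, cd, xy, z]
  }
}
```

      With maxTokenLength=5, "toolong" becomes the chunks "toolo" and "ng" under the actual behavior, matching the test case above, whereas the documented behavior would drop it entirely.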

      Attachments

        1. LUCENE-7760.patch
          4 kB
          Michael McCandless
        2. LUCENE-7760.patch
          10 kB
          Michael McCandless

        Activity

          People

            Assignee: mikemccand Michael McCandless
            Reporter: mikemccand Michael McCandless
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: