Lucene - Core / LUCENE-7760

StandardAnalyzer/Tokenizer.setMaxTokenLength's javadocs are lying

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.6, 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      The javadocs claim that too-long tokens are discarded, but in fact they are simply chopped up into max-length pieces. The following test case unexpectedly passes:

        public void testMaxTokenLengthNonDefault() throws Exception {
          StandardAnalyzer a = new StandardAnalyzer();
          a.setMaxTokenLength(5);
          assertAnalyzesTo(a, "ab cd toolong xy z", new String[]{"ab", "cd", "toolo", "ng", "xy", "z"});
          a.close();
        }
      

      We should at least fix the javadocs ...

      (I hit this because I was trying to also add setMaxTokenLength to EnglishAnalyzer).
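
      To make the difference concrete, here is a small self-contained sketch (no Lucene dependency; the class and method names are made up for illustration) contrasting the two behaviors: chopping over-long tokens into max-length chunks, which is what the tokenizer actually does, versus discarding them outright, which is what the javadocs claimed:

```java
import java.util.ArrayList;
import java.util.List;

public class MaxTokenLengthDemo {
  // What StandardTokenizer actually does: a token longer than max
  // is emitted as successive chunks of at most max characters.
  static List<String> chop(String text, int max) {
    List<String> out = new ArrayList<>();
    for (String tok : text.split("\\s+")) {
      for (int i = 0; i < tok.length(); i += max) {
        out.add(tok.substring(i, Math.min(i + max, tok.length())));
      }
    }
    return out;
  }

  // What the javadocs claimed: a token longer than max is dropped entirely.
  static List<String> discard(String text, int max) {
    List<String> out = new ArrayList<>();
    for (String tok : text.split("\\s+")) {
      if (tok.length() <= max) {
        out.add(tok);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(chop("ab cd toolong xy z", 5));    // [ab, cd, toolo, ng, xy, z]
    System.out.println(discard("ab cd toolong xy z", 5)); // [ab, cd, xy, z]
  }
}
```

      With maxTokenLength=5, "toolong" becomes the chunks "toolo" and "ng" under the actual behavior, matching the test case above, whereas the documented behavior would drop it entirely.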

      Attachments

        1. LUCENE-7760.patch
          4 kB
          Michael McCandless
        2. LUCENE-7760.patch
          10 kB
          Michael McCandless

        Activity

          People

            Assignee: mikemccand Michael McCandless
            Reporter: mikemccand Michael McCandless
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: