Lucene - Core / LUCENE-7705

Allow CharTokenizer-derived tokenizers and KeywordTokenizer to configure the max token length

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.7, 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      SOLR-10186

      erickerickson: Is there a good reason that we hard-code a 256-character limit for the CharTokenizer? Changing this limit requires people to copy/paste incrementToken into a new class, since incrementToken is final.
      KeywordTokenizer can easily change the default (which is also 256 characters), but doing so requires code rather than configuration in the schema.
      For KeywordTokenizer, this is a Solr-only change. For the CharTokenizer-derived classes (WhitespaceTokenizer, UnicodeWhitespaceTokenizer and LetterTokenizer) and their factories, it would take adding a constructor to the base class in Lucene and using it in the factory.
      Any objections?
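
      A minimal sketch of the idea in plain Java, for illustration only. The class MaxLenWhitespaceTokenizer and its tokenize method are hypothetical stand-ins and are not Lucene's CharTokenizer API. The sketch assumes that a token reaching the limit is emitted and the remainder starts a new token. The point is that the limit becomes a constructor argument with a 256 default, so the final tokenizing method never needs to be copied.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified whitespace tokenizer; NOT Lucene's CharTokenizer. */
public class MaxLenWhitespaceTokenizer {
    /** Mirrors the hard-coded 256 limit discussed in the issue. */
    public static final int DEFAULT_MAX_TOKEN_LEN = 256;

    private final int maxTokenLen;

    /** Keeps the old behavior for callers that do not care about the limit. */
    public MaxLenWhitespaceTokenizer() {
        this(DEFAULT_MAX_TOKEN_LEN);
    }

    /** New constructor: the limit is configurable instead of hard-coded. */
    public MaxLenWhitespaceTokenizer(int maxTokenLen) {
        if (maxTokenLen <= 0) {
            throw new IllegalArgumentException("maxTokenLen must be > 0, got " + maxTokenLen);
        }
        this.maxTokenLen = maxTokenLen;
    }

    /** Final, like incrementToken in the issue: subclasses cannot override it. */
    public final List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                flush(current, tokens);          // whitespace ends the current token
            } else {
                current.append(c);
                if (current.length() >= maxTokenLen) {
                    flush(current, tokens);      // assumed behavior: split at the limit
                }
            }
        }
        flush(current, tokens);                  // emit any trailing token
        return tokens;
    }

    /** Emits the buffered token, if any, and resets the buffer. */
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }
}
```

      With a limit of 3, the input "abcdefg hi" yields the tokens abc, def, g and hi. A factory could read the limit from its configuration arguments and pass it to this constructor, which is the schema-level configurability the description asks for.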

      Attachments

        1. LUCENE-7705.patch
          26 kB
          Erick Erickson
        2. LUCENE-7705.patch
          26 kB
          Erick Erickson
        3. LUCENE-7705.patch
          27 kB
          Amrit Sarkar
        4. LUCENE-7705.patch
          38 kB
          Amrit Sarkar
        5. LUCENE-7705.patch
          30 kB
          Erick Erickson
        6. LUCENE-7705.patch
          42 kB
          Amrit Sarkar
        7. LUCENE-7705.patch
          35 kB
          Erick Erickson
        8. LUCENE-7705.patch
          49 kB
          Erick Erickson
        9. LUCENE-7705.patch
          49 kB
          Erick Erickson
        10. LUCENE-7705.patch
          4 kB
          Amrit Sarkar
        11. LUCENE-7705
          72 kB
          Amrit Sarkar

            People

              erickerickson Erick Erickson
              sarkaramrit2@gmail.com Amrit Sarkar