Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-10186

Allow CharTokenizer-derived tokenizers and KeywordTokenizer to configure the max token length


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate

    Description

      Is there a good reason that we hard-code a 256-character limit for CharTokenizer? Changing this limit currently requires copy/pasting incrementToken into a new class, since incrementToken is final.

      KeywordTokenizer can easily change the default (which is also 256), but doing so requires code rather than being configurable in the schema.

      For KeywordTokenizer, this is Solr-only. For the CharTokenizer subclasses (WhitespaceTokenizer, UnicodeWhitespaceTokenizer, and LetterTokenizer) and their factories, it would require adding a constructor to the base class in Lucene and using it in the factories.

      Any objections?
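
      To make the proposal concrete, here is a sketch of the kind of schema configuration this improvement would enable. The `maxTokenLen` attribute is an assumption for illustration, not an option that exists at the time of filing:

      ```xml
      <!-- Hypothetical schema.xml snippet: maxTokenLen is the proposed,
           not-yet-existing factory attribute for raising the 256 limit. -->
      <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
          <tokenizer class="solr.WhitespaceTokenizerFactory" maxTokenLen="1024"/>
        </analyzer>
      </fieldType>
      ```

      The same attribute would apply to the other CharTokenizer-derived factories and to KeywordTokenizerFactory.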

      Attachments

        1. SOLR-10186.patch
          24 kB
          Amrit Sarkar
        2. SOLR-10186.patch
          14 kB
          Amrit Sarkar
        3. SOLR-10186.patch
          9 kB
          Amrit Sarkar

        Issue Links

        Activity


          People

            Assignee: Erick Erickson
            Reporter: Erick Erickson
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved:
