[LUCENE-889] Standard tokenizer with punctuation output - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Trivial
Resolution: Won't Fix
Affects Version/s: 2.1
Fix Version/s: None
Component/s: None
Labels: None
Lucene Fields: New, Patch Available

Description

This patch adds punctuation (comma, period, question mark and exclamation point) tokens as output from the StandardTokenizer, and filters them out in the StandardFilter.

(I needed them for text classification reasons.)
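
The two-stage idea described above can be sketched without any Lucene dependency. This is not the attached patch and does not use the Lucene API: the class `PunctuationTokenSketch`, the `Token` and `Type` types, and the methods `tokenize`, `filterPunctuation`, and `texts` are all hypothetical names chosen for illustration, and splitting words on non-alphanumeric characters is a simplifying assumption rather than the StandardTokenizer grammar. The sketch only shows a tokenizer stage that emits the four punctuation marks as typed tokens and a filter stage that drops them again.

```java
import java.util.ArrayList;
import java.util.List;

public class PunctuationTokenSketch {

    // Hypothetical token types; the type names used by the real patch are not shown in the issue.
    public enum Type { WORD, PUNCTUATION }

    public static final class Token {
        public final String text;
        public final Type type;

        Token(String text, Type type) {
            this.text = text;
            this.type = type;
        }
    }

    // The four punctuation marks named in the issue description.
    private static final String PUNCTUATION = ",.?!";

    // Tokenizer stage: emits words and the four punctuation marks as separate typed tokens.
    // Assumption: a word is a run of letters or digits; every other character ends the word.
    public static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                word.append(c);
                continue;
            }
            if (word.length() > 0) {
                tokens.add(new Token(word.toString(), Type.WORD));
                word.setLength(0);
            }
            if (PUNCTUATION.indexOf(c) >= 0) {
                tokens.add(new Token(String.valueOf(c), Type.PUNCTUATION));
            }
        }
        if (word.length() > 0) {
            tokens.add(new Token(word.toString(), Type.WORD));
        }
        return tokens;
    }

    // Filter stage: removes punctuation tokens so downstream consumers see words only.
    public static List<Token> filterPunctuation(List<Token> tokens) {
        List<Token> kept = new ArrayList<>();
        for (Token t : tokens) {
            if (t.type != Type.PUNCTUATION) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Helper that projects tokens to their text, for printing or comparison.
    public static List<String> texts(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            out.add(t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> raw = tokenize("Is it open? Yes, it is!");
        System.out.println(texts(raw));
        System.out.println(texts(filterPunctuation(raw)));
    }
}
```

A consumer that needs the punctuation, such as a text classifier, would read the tokenizer output directly and skip the filter stage; one that only needs words would apply the filter.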

Attachments

- standard.patch (46 kB), attached 25/May/07 10:39 by Karl Wettin
- test.patch (1 kB), attached 25/May/07 10:39 by Karl Wettin

People

Assignee: Unassigned

Reporter: Karl Wettin

Votes: 0

Watchers: 0

Dates

Created: 25/May/07 10:37

Updated: 28/Aug/22 11:37

Resolved: 12/Apr/08 18:17