Lucene - Core / LUCENE-889

Standard tokenizer with punctuation output

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Won't Fix
    • Affects Version/s: 2.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Lucene Fields: New, Patch Available

      Description

      This patch adds punctuation (comma, period, question mark and exclamation point) tokens as output from the StandardTokenizer, and filters them out in the StandardFilter.

      (I needed them for text classification reasons.)
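
      The two-stage idea described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from standard.patch: the class PunctuationTokenSketch, the Token type, and the "WORD" / "PUNCTUATION" type labels are all hypothetical names, and the real StandardTokenizer is grammar-based rather than a character loop. The sketch shows a tokenizer stage that emits the four punctuation marks as typed tokens and a filter stage that drops them again, so only callers that read the unfiltered stream (for example a text classifier) see the punctuation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from Lucene or from the attached patch.
final class PunctuationTokenSketch {

    /** A token with its text and a type label (labels are assumed for illustration). */
    static final class Token {
        final String text;
        final String type; // "WORD" or "PUNCTUATION"

        Token(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    /** The four punctuation marks named in the description. */
    static boolean isPunctuation(char c) {
        return c == ',' || c == '.' || c == '?' || c == '!';
    }

    /** Tokenizer stage: emits word tokens and, in addition, punctuation tokens. */
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                word.append(c);
                continue;
            }
            // Any non-alphanumeric character ends the current word.
            if (word.length() > 0) {
                tokens.add(new Token(word.toString(), "WORD"));
                word.setLength(0);
            }
            if (isPunctuation(c)) {
                tokens.add(new Token(String.valueOf(c), "PUNCTUATION"));
            }
        }
        if (word.length() > 0) {
            tokens.add(new Token(word.toString(), "WORD"));
        }
        return tokens;
    }

    /** Filter stage: drops punctuation tokens so downstream consumers see words only. */
    static List<Token> filterPunctuation(List<Token> tokens) {
        List<Token> kept = new ArrayList<>();
        for (Token t : tokens) {
            if (!"PUNCTUATION".equals(t.type)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Token> raw = tokenize("Hello, world! Is this it?");
        for (Token t : raw) {
            System.out.println(t.type + "\t" + t.text);
        }
        System.out.println("after filter: " + filterPunctuation(raw).size() + " tokens");
    }
}
```

      A classifier would consume the output of tokenize directly, while an ordinary indexing chain would apply filterPunctuation and see the same word tokens as before.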

      1. standard.patch (46 kB, Karl Wettin)
      2. test.patch (1 kB, Karl Wettin)

          People

          • Assignee: Unassigned
          • Reporter: Karl Wettin