Lucene - Core / LUCENE-889

Standard tokenizer with punctuation output

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Won't Fix
    • Affects Version/s: 2.1
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

    Description

    This patch makes StandardTokenizer output punctuation tokens (comma, period, question mark and exclamation point) and makes StandardFilter filter them out again.

    (I needed them for text classification.)
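    As a rough illustration of the behaviour described above, here is a minimal, self-contained sketch. It is not the attached patch and does not use the Lucene API. The class `PunctuationTokenSketch`, its methods and the `<PUNCTUATION>` type label are all hypothetical, and the `<ALPHANUM>` label only loosely borrows StandardTokenizer's naming style. The tokenizer emits the four punctuation marks as typed tokens. A separate filter step then drops them, so an analyzer that keeps the filter sees only words, while a classifier can read the unfiltered stream.

```java
import java.util.ArrayList;
import java.util.List;

public class PunctuationTokenSketch {

    // Hypothetical type labels; the "<PUNCTUATION>" name is an assumption.
    public static final String WORD = "<ALPHANUM>";
    public static final String PUNCTUATION = "<PUNCTUATION>";

    // A token's text plus a type label, similar in spirit to Lucene's typed tokens.
    public static final class Token {
        public final String text;
        public final String type;

        public Token(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    // Splits text into letter/digit runs and single-character tokens for
    // the four marks named in the issue: , . ? !
    // Every other character (whitespace, quotes, ...) only separates words.
    public static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                word.append(c);
                continue;
            }
            flushWord(word, tokens);
            if (c == ',' || c == '.' || c == '?' || c == '!') {
                tokens.add(new Token(String.valueOf(c), PUNCTUATION));
            }
        }
        flushWord(word, tokens);
        return tokens;
    }

    // Emits the pending word (if any) and resets the buffer.
    private static void flushWord(StringBuilder word, List<Token> tokens) {
        if (word.length() > 0) {
            tokens.add(new Token(word.toString(), WORD));
            word.setLength(0);
        }
    }

    // Drops punctuation tokens, analogous to the filtering step the patch
    // adds to StandardFilter.
    public static List<Token> filterPunctuation(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        for (Token t : tokens) {
            if (!PUNCTUATION.equals(t.type)) {
                out.add(t);
            }
        }
        return out;
    }

    // Collects token texts, for printing and inspection.
    public static List<String> texts(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            out.add(t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> raw = tokenize("Hello, world! Is it done?");
        System.out.println(String.join(" ", texts(raw)));                    // Hello , world ! Is it done ?
        System.out.println(String.join(" ", texts(filterPunctuation(raw)))); // Hello world Is it done
    }
}
```

    Unlike the real StandardTokenizer, this sketch has no special handling for numbers, acronyms or hosts. For example, "2.1" would be split into "2", "." and "1".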

    Attachments

    1. standard.patch
       46 kB
       Karl Wettin
    2. test.patch
       1 kB
       Karl Wettin


    People

    • Assignee: Unassigned
    • Reporter: Karl Wettin (karl.wettin)
    • Votes: 0
    • Watchers: 0

    Dates

    • Created:
    • Updated:
    • Resolved: