Lucene - Core / LUCENE-889

Standard tokenizer with punctuation output

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Won't Fix
    • 2.1
    • None
    • None
    • None
    • New, Patch Available

    Description

      This patch adds punctuation (comma, period, question mark and exclamation point) tokens as output from the StandardTokenizer, and filters them out in the StandardFilter.

      (I needed them for text classification reasons.)
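
      The two-stage idea described above can be sketched roughly as follows. This is a simplified illustration, not the attached patch and not the Lucene API: the class PunctuationTokenSketch and its methods tokenize, dropPunctuation and isPunctuation are hypothetical names, tokens are plain strings, and the real StandardTokenizer grammar is considerably richer than the letter-or-digit rule assumed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the attached patch, not the Lucene API.
class PunctuationTokenSketch {

    // The four punctuation marks named in the description.
    private static final String PUNCTUATION = ",.?!";

    static boolean isPunctuation(String token) {
        return token.length() == 1 && PUNCTUATION.indexOf(token.charAt(0)) >= 0;
    }

    // Tokenizer stage (assumed behavior): runs of letters or digits become
    // word tokens, each punctuation mark becomes its own token, and any
    // other character only separates tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                word.append(c);
                continue;
            }
            // A non-word character ends the current word, if there is one.
            if (word.length() > 0) {
                tokens.add(word.toString());
                word.setLength(0);
            }
            if (PUNCTUATION.indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    // Filter stage: drop the punctuation tokens again, so consumers that
    // only want words see the same stream as before.
    static List<String> dropPunctuation(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!isPunctuation(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> raw = tokenize("Is this spam? No, it is not!");
        System.out.println(raw);                  // words plus punctuation
        System.out.println(dropPunctuation(raw)); // words only
    }
}
```

      In this sketch a classifier would read the output of tokenize directly, while an indexing chain would pass it through dropPunctuation first, which mirrors the split between tokenizer and filter that the description proposes.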

      Attachments

        1. test.patch
          1 kB
          Karl Wettin
        2. standard.patch
          46 kB
          Karl Wettin

        Activity

          People

            Assignee: Unassigned
            Reporter: Karl Wettin (karl.wettin)
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved: