Lucene - Core / LUCENE-889

Standard tokenizer with punctuation output

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Won't Fix
    • Affects Version/s: 2.1
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      This patch adds punctuation (comma, period, question mark and exclamation point) tokens as output from the StandardTokenizer, and filters them out in the StandardFilter.

      (I needed them for text classification reasons.)
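The behaviour described above can be sketched in plain Java. This is not the attached patch and does not use Lucene's API: the class `PunctuationTokenSketch`, its methods, and the `WORD`/`PUNCT` type labels are hypothetical names chosen for illustration. It only assumes what the description states, namely that the tokenizer emits comma, period, question mark and exclamation point as tokens of their own, and that a later filter removes them for callers that do not want them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the idea in this issue; NOT Lucene's
// StandardTokenizer/StandardFilter and NOT the code in standard.patch.
class PunctuationTokenSketch {

    // A token is its text plus a type label (the labels are assumptions).
    static final class Token {
        final String text;
        final String type; // "WORD" or "PUNCT"

        Token(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    // The four punctuation marks named in the issue description.
    static final String PUNCTUATION = ",.?!";

    // Tokenizer stage: emits words and, additionally, punctuation tokens.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                word.append(c);
                continue;
            }
            // Any non-alphanumeric character ends the current word.
            if (word.length() > 0) {
                tokens.add(new Token(word.toString(), "WORD"));
                word.setLength(0);
            }
            // Punctuation becomes a token; other characters are skipped.
            if (PUNCTUATION.indexOf(c) >= 0) {
                tokens.add(new Token(String.valueOf(c), "PUNCT"));
            }
        }
        if (word.length() > 0) {
            tokens.add(new Token(word.toString(), "WORD"));
        }
        return tokens;
    }

    // Filter stage: drops punctuation tokens, restoring word-only output.
    static List<Token> dropPunctuation(List<Token> tokens) {
        List<Token> kept = new ArrayList<>();
        for (Token t : tokens) {
            if (!"PUNCT".equals(t.type)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Helper that extracts the token texts for printing or comparison.
    static List<String> texts(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            out.add(t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> raw = tokenize("Hello, world! Ready?");
        System.out.println(texts(raw));                  // tokenizer output, punctuation included
        System.out.println(texts(dropPunctuation(raw))); // after the filter stage
    }
}
```

A consumer such as a text classifier would read the tokenizer output directly, while an indexing chain would apply the filter stage and see the same tokens as before. Unlike this sketch, the real StandardTokenizer has additional token types (numbers, acronyms, and so on) that would not be split on an embedded period.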

      1. standard.patch
        46 kB
        Karl Wettin
      2. test.patch
        1 kB
        Karl Wettin

        Activity

        Karl Wettin created issue
        Karl Wettin made changes
        Field          Original Value                                         New Value
        Attachment                                                            standard.patch [ 12358216 ]
        Attachment                                                            test.patch [ 12358217 ]
        Karl Wettin made changes
        Status         Open [ 1 ]                                             Closed [ 6 ]
        Lucene Fields  [Patch Available, New]                                 [New, Patch Available]
        Resolution                                                            Won't Fix [ 2 ]
        Mark Thomas made changes
        Workflow       jira [ 12404881 ]                                      Default workflow, editable Closed status [ 12562876 ]
        Mark Thomas made changes
        Workflow       Default workflow, editable Closed status [ 12562876 ]  jira [ 12583772 ]

          People

          • Assignee:
            Unassigned
            Reporter:
            Karl Wettin
          • Votes:
            0
            Watchers:
            0
