Solr: SOLR-3524

Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.6
    • Fix Version/s: 4.0-BETA, 6.0
    • Component/s: Schema and Analysis
    • Labels: None

    Description

      JapaneseTokenizer (Kuromoji) has a parameter that controls whether punctuation in Japanese text is preserved, but there is no configuration option that exposes it. JapaneseTokenizerFactory always sets this third constructor parameter to true, so punctuation is always removed.
      I would like an option to configure this behavior in the fieldType definition in schema.xml.
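      A minimal sketch of the requested behavior, in Java since Solr is written in Java. It assumes the option arrives as a fieldType attribute named `discardPunctuation` that defaults to `true`, which keeps the current behavior. For illustration it models punctuation removal as a post-filter over already-segmented tokens, using a simplified Unicode-category check. The class and method names are hypothetical and are not Kuromoji's or Solr's API. In the real tokenizer the flag is passed as the constructor's third parameter rather than applied afterward.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: reads an optional "discardPunctuation" attribute and
 * applies it to a list of tokens. Not Solr's or Kuromoji's actual implementation.
 */
final class DiscardPunctuationSketch {

    // Assumed attribute name, as it might appear on the tokenizer element in schema.xml.
    static final String DISCARD_PUNCTUATION = "discardPunctuation";

    private DiscardPunctuationSketch() {}

    /** Returns the configured flag; defaults to true (the current behavior) when absent. */
    static boolean discardPunctuation(Map<String, String> args) {
        String value = args.get(DISCARD_PUNCTUATION);
        return value == null || Boolean.parseBoolean(value.trim());
    }

    /** Simplified check: a token is punctuation if every code point is in a Unicode P* category. */
    static boolean isPunctuation(String token) {
        if (token.isEmpty()) {
            return false;
        }
        return token.codePoints().allMatch(cp -> {
            switch (Character.getType(cp)) {
                case Character.CONNECTOR_PUNCTUATION:
                case Character.DASH_PUNCTUATION:
                case Character.START_PUNCTUATION:
                case Character.END_PUNCTUATION:
                case Character.INITIAL_QUOTE_PUNCTUATION:
                case Character.FINAL_QUOTE_PUNCTUATION:
                case Character.OTHER_PUNCTUATION:
                    return true;
                default:
                    return false;
            }
        });
    }

    /** Drops punctuation tokens only when discard is true; otherwise returns an unchanged copy. */
    static List<String> applyOption(List<String> tokens, boolean discard) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!discard || !isPunctuation(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

      With an option like this, segmenting 今日は晴れ。 into 今日 / は / 晴れ / 。 would keep the final 。 only when the attribute is set to false, while fieldTypes that omit the attribute would behave exactly as they do today.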

      Attachments

        1. SOLR-3524.patch
          5 kB
          Christian Moen
        2. SOLR-3524.patch
          5 kB
          Christian Moen
        3. kuromoji_discard_punctuation.patch.txt
          1 kB
          Jun Ohtani


          People

            Assignee: Christian Moen
            Reporter: Kazuaki Hiraga
            Votes: 0
            Watchers: 3

