SOLR-3524

Make discard-punctuation feature in Kuromoji configurable from JapaneseTokenizerFactory

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.6
    • Fix Version/s: 4.0-BETA, 6.0
    • Component/s: Schema and Analysis
    • Labels:
      None

      Description

      JapaneseTokenizer (Kuromoji) has a parameter that controls whether punctuation in Japanese text is discarded, but JapaneseTokenizerFactory does not expose it as a configuration option. The factory always sets this third parameter to true, so punctuation is always removed.
      I would like an option to configure this behavior in the fieldType definition in schema.xml.
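
      A minimal sketch of what such a fieldType definition could look like. The attribute name discardPunctuation is an assumption here (it matches the name used by JapaneseTokenizerFactory in later Lucene/Solr releases, but this page does not state it), and the field type name text_ja_punct is hypothetical.

```xml
<!-- Sketch only: "text_ja_punct" is a hypothetical field type name, and the
     discardPunctuation attribute is assumed to be the option the fix exposes. -->
<fieldType name="text_ja_punct" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- discardPunctuation="false" asks the tokenizer to keep punctuation tokens.
         Omitting the attribute is assumed to keep the previous behavior
         (punctuation removed). -->
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="search" discardPunctuation="false"/>
  </analyzer>
</fieldType>
```

      Making the option default to true would keep existing schemas unchanged, so only field types that need punctuation tokens have to set it explicitly.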

        Attachments

        1. SOLR-3524.patch
          5 kB
          Christian Moen
        2. SOLR-3524.patch
          5 kB
          Christian Moen
        3. kuromoji_discard_punctuation.patch.txt
          1 kB
          Jun Ohtani

            People

            • Assignee:
              cm Christian Moen
              Reporter:
              h.kazuaki Kazuaki Hiraga
            • Votes: 0
              Watchers: 3