  1. Lucene - Core
  2. LUCENE-8784

Nori(Korean) tokenizer removes the decimal point.


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.0, 8.2
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      This is the same issue that I raised in https://github.com/elastic/elasticsearch/issues/41401#event-2293189367

      Unlike the standard analyzer, the Nori analyzer removes the decimal point.

      The Nori tokenizer removes the "." character by default.
      This makes it difficult to index keywords that include a decimal point.

      It would be nice to have an option that controls whether the decimal point is kept.

      Like the Japanese tokenizer, Nori needs an option to preserve the decimal point.
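
      To make the request concrete, here is a toy Java sketch of a tokenizer with a discard-punctuation switch. It is not Lucene code: the class name PunctuationSketch, the tokenize method, and the discardPunctuation parameter are hypothetical and only illustrate how such an option might change the output for a keyword like "10.1".

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only; this is NOT how Nori or any Lucene tokenizer is implemented. */
public class PunctuationSketch {

    /**
     * Splits text into runs of letters and digits. Whitespace is always dropped.
     * Other characters (such as ".") are emitted as one-character tokens
     * only when discardPunctuation is false.
     */
    public static List<String> tokenize(String text, boolean discardPunctuation) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
                continue;
            }
            // Any non-alphanumeric character ends the token being built.
            if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            // Keep punctuation as its own token only when asked to.
            if (!Character.isWhitespace(c) && !discardPunctuation) {
                tokens.add(String.valueOf(c));
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Punctuation discarded: the decimal point is lost.
        System.out.println(tokenize("nori 10.1", true));   // [nori, 10, 1]
        // Punctuation kept: "." survives as a separate token.
        System.out.println(tokenize("nori 10.1", false));  // [nori, 10, ., 1]
    }
}
```

      With the flag set to false, the "." remains available as a token, so a later step could rejoin "10", "." and "1" into a single keyword. Whether a real analyzer does that rejoining is outside the scope of this sketch.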

       

      Attachments

        1. LUCENE-8784.patch
          14 kB
          Namgyu Kim
        2. LUCENE-8784.patch
          18 kB
          Jim Ferenczi
        3. LUCENE-8784.patch
          59 kB
          Namgyu Kim
        4. LUCENE-8784.patch
          18 kB
          Namgyu Kim


            People

              Assignee: Unassigned
              Reporter: Munkyu Im
              Votes: 0
              Watchers: 5