Lucene - Core / LUCENE-8784

Nori(Korean) tokenizer removes the decimal point.


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: master (9.0), 8.2
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This is the same issue I reported at https://github.com/elastic/elasticsearch/issues/41401#event-2293189367

      Unlike the standard analyzer, the Nori analyzer removes the decimal point: the Nori tokenizer discards the "." character by default. As a result, it is difficult to index keywords that contain a decimal point.

      It would be nice to have an option that controls whether the decimal point is kept. Like the Japanese tokenizer, Nori needs an option to preserve the decimal point.
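      The behavior difference can be sketched in plain Java. This is not Lucene/Nori code, just a minimal illustration of the requested option: when punctuation is discarded, "3.14" is split into "3" and "14"; when a "." between two digits is preserved, "3.14" survives as one token. The class and method names are hypothetical.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      // Illustration only, not actual Lucene code: shows how a
      // discardPunctuation-style flag changes tokenization of decimals.
      public class DecimalTokenDemo {
          static List<String> tokenize(String text, boolean discardPunctuation) {
              List<String> tokens = new ArrayList<>();
              StringBuilder current = new StringBuilder();
              for (int i = 0; i < text.length(); i++) {
                  char c = text.charAt(i);
                  boolean keep = Character.isLetterOrDigit(c)
                      // Keep '.' only when it sits between two digits
                      // and punctuation is not being discarded.
                      || (!discardPunctuation && c == '.'
                          && i > 0 && i + 1 < text.length()
                          && Character.isDigit(text.charAt(i - 1))
                          && Character.isDigit(text.charAt(i + 1)));
                  if (keep) {
                      current.append(c);
                  } else if (current.length() > 0) {
                      tokens.add(current.toString());
                      current.setLength(0);
                  }
              }
              if (current.length() > 0) tokens.add(current.toString());
              return tokens;
          }

          public static void main(String[] args) {
              System.out.println(tokenize("version 3.14 release", true));
              System.out.println(tokenize("version 3.14 release", false));
          }
      }
      ```

      With the flag set to true (the reported default behavior) the decimal point is lost; with it set to false the number indexes as a single searchable token.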

       

        Attachments

        1. LUCENE-8784.patch
          14 kB
          Namgyu Kim
        2. LUCENE-8784.patch
          18 kB
          Jim Ferenczi
        3. LUCENE-8784.patch
          59 kB
          Namgyu Kim
        4. LUCENE-8784.patch
          18 kB
          Namgyu Kim


              People

              • Assignee:
                Unassigned
              • Reporter:
                Munkyu Im
              • Votes:
                0
              • Watchers:
                5

                Dates

                • Created:
                • Updated:
                • Resolved: