  1. Lucene - Core
  2. LUCENE-8584

Japanese UserDictionary should remove duplicate entries

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix

    Description

      The Japanese UserDictionary in the kuromoji module fails to load a dictionary file that contains duplicate entries. Loading aborts with an UnsupportedOperationException:

      java.lang.UnsupportedOperationException
      	at __randomizedtesting.SeedInfo.seed([C340BE6DB5DF33E8:A804576E05DF86DF]:0)
      	at org.apache.lucene.util.fst.Outputs.merge(Outputs.java:97)
      	at org.apache.lucene.util.fst.Builder.add(Builder.java:445)
      	at org.apache.lucene.analysis.ja.dict.UserDictionary.<init>(UserDictionary.java:135)
      	at org.apache.lucene.analysis.ja.dict.UserDictionary.open(UserDictionary.java:81)
      	at org.apache.lucene.analysis.ja.TestJapaneseTokenizer.readDict(TestJapaneseTokenizer.java:55)
      	at org.apache.lucene.analysis.ja.dict.UserDictionaryTest.testLookup(UserDictionaryTest.java:30)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      

      Duplicate entries should be ignored or a more descriptive error should be thrown.
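
The two options named above (ignore duplicates, or fail with a clearer message) can be sketched as a pre-processing step on the dictionary lines. This is only an illustration, not the attached patch and not part of Lucene: the class `UserDictionaryDeduplicator` and its methods are hypothetical, and the sketch assumes each entry is a comma-separated line whose first field is the surface form, that comment lines start with `#`, and that the surface form contains no quoted commas.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Lucene class): cleans user dictionary lines before loading. */
public class UserDictionaryDeduplicator {

    /**
     * Option 1: ignore duplicates. Returns the entry lines with blank lines,
     * '#' comment lines, and repeated surface forms dropped. The first
     * occurrence of each surface form is kept, in original order.
     */
    public static List<String> removeDuplicates(List<String> lines) {
        Map<String, String> bySurface = new LinkedHashMap<>();
        for (String line : lines) {
            String entry = line.trim();
            if (entry.isEmpty() || entry.startsWith("#")) {
                continue; // not a dictionary entry
            }
            bySurface.putIfAbsent(surfaceOf(entry), entry);
        }
        return new ArrayList<>(bySurface.values());
    }

    /**
     * Option 2: fail with a descriptive error that names the duplicated
     * surface form and both (1-based) line numbers.
     */
    public static void checkNoDuplicates(List<String> lines) {
        Map<String, Integer> firstSeen = new HashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            String entry = lines.get(i).trim();
            if (entry.isEmpty() || entry.startsWith("#")) {
                continue;
            }
            String surface = surfaceOf(entry);
            Integer previous = firstSeen.putIfAbsent(surface, i + 1);
            if (previous != null) {
                throw new IllegalArgumentException(
                    "Duplicate user dictionary entry '" + surface + "' on line " + (i + 1)
                        + " (first defined on line " + previous + ")");
            }
        }
    }

    /** Assumption: the surface form is everything before the first comma. */
    private static String surfaceOf(String entry) {
        int comma = entry.indexOf(',');
        return comma < 0 ? entry : entry.substring(0, comma);
    }
}
```

Under these assumptions, the output of `removeDuplicates` could be written back out and passed to the dictionary loader, while `checkNoDuplicates` suits setups where a silent drop would hide a mistake in the dictionary file.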

      Attachments

        1. LUCENE-8584.patch
          3 kB
          Jim Ferenczi

          People

            Assignee: Unassigned
            Reporter: Jim Ferenczi