Hivemall / HIVEMALL-208

tokenize_ja failed to analyze certain Japanese strings

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.5.2
    • Labels:
      None

      Description

      tokenize_ja failed to analyze certain Japanese strings and threw the following error:

      java.lang.ArrayIndexOutOfBoundsException: -1
      at org.apache.lucene.analysis.ja.JapaneseTokenizer.backtrace(JapaneseTokenizer.java:1024)
      at org.apache.lucene.analysis.ja.JapaneseTokenizer.parse(JapaneseTokenizer.java:873)
      at org.apache.lucene.analysis.ja.JapaneseTokenizer.incrementToken(JapaneseTokenizer.java:474)
      at org.apache.lucene.analysis.ja.JapaneseBaseFormFilter.incrementToken(JapaneseBaseFormFilter.java:50)
      at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51)
      at org.apache.lucene.analysis.cjk.CJKWidthFilter.incrementToken(CJKWidthFilter.java:63)
      at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51)
      at org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter.incrementToken(JapaneseKatakanaStemFilter.java:63)
      at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:45)
      at hivemall.nlp.tokenizer.KuromojiUDF.analyzeTokens(KuromojiUDF.java:292)
      at hivemall.nlp.tokenizer.KuromojiUDF.evaluate(KuromojiUDF.java:117)

      The cause is LUCENE-7279, which has already been fixed upstream, so Lucene needs to be upgraded.
      The affected versions include not only v0.5.0 but also v0.4.2.
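
Until the Lucene upgrade is available in a deployment, one possible stopgap is to guard the tokenizer call so that a single bad string does not fail the whole job. The following is only a minimal sketch under stated assumptions: `SafeTokenize` and `tokenizeOrEmpty` are hypothetical names that are not part of Hivemall, and the `tokenizer` argument stands in for whatever code invokes the Kuromoji analysis chain. It hides the symptom and does not fix the underlying bug.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper; not part of Hivemall or Lucene. */
public final class SafeTokenize {
    private SafeTokenize() {}

    /**
     * Applies the given tokenizer to the text. If the tokenizer throws
     * ArrayIndexOutOfBoundsException (the symptom reported in this issue),
     * an empty token list is returned instead of propagating the error.
     */
    public static List<String> tokenizeOrEmpty(
            Function<String, List<String>> tokenizer, String text) {
        if (text == null) {
            return Collections.emptyList();
        }
        try {
            return tokenizer.apply(text);
        } catch (ArrayIndexOutOfBoundsException e) {
            // Stopgap only: the affected input loses its tokens,
            // but the surrounding job keeps running.
            return Collections.emptyList();
        }
    }
}
```

Returning an empty list silently drops tokens for the affected inputs, so logging or counting such cases would probably be worthwhile; upgrading Lucene remains the actual fix.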

       


            People

            • Assignee:
              myui Makoto Yui
              Reporter:
              iijima_satoshi Satoshi Iijima
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved: