Lucene - Core / LUCENE-8933

JapaneseTokenizer creates Token objects with corrupt offsets


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: main (9.0), 8.3
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      An Elasticsearch user reported the following stack trace when parsing synonyms. The only plausible cause appears to be an org.apache.lucene.analysis.ja.Token whose offset falls outside the expected range, so that copying the token's characters into the term attribute reads past the end of the source buffer.

       

      Caused by: java.lang.ArrayIndexOutOfBoundsException
          at org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl.copyBuffer(CharTermAttributeImpl.java:44) ~[lucene-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:20]
          at org.apache.lucene.analysis.ja.JapaneseTokenizer.incrementToken(JapaneseTokenizer.java:486) ~[?:?]
          at org.apache.lucene.analysis.synonym.SynonymMap$Parser.analyze(SynonymMap.java:318) ~[lucene-analyzers-common-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:48]
          at org.elasticsearch.index.analysis.ESSolrSynonymParser.analyze(ESSolrSynonymParser.java:57) ~[elasticsearch-6.6.1.jar:6.6.1]
          at org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:114) ~[lucene-analyzers-common-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:48]
          at org.apache.lucene.analysis.synonym.SolrSynonymParser.parse(SolrSynonymParser.java:70) ~[lucene-analyzers-common-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:48]
          at org.elasticsearch.index.analysis.SynonymTokenFilterFactory.buildSynonyms(SynonymTokenFilterFactory.java:154) ~[elasticsearch-6.6.1.jar:6.6.1]
          ... 24 more
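
      The failure can be reproduced in miniature without Lucene. The following is a minimal sketch, assuming (as the trace suggests) that JapaneseTokenizer copies each token's characters out of a source char[] using the token's offset and length. `CopyBufferSketch`, `copyBuffer`, and `copyBufferChecked` are hypothetical names; the code does not reproduce Lucene's actual CharTermAttributeImpl. It shows that a slice whose offset + length exceeds the source array surfaces as an ArrayIndexOutOfBoundsException, and how an explicit bounds check would turn that into a descriptive error instead.

```java
/**
 * Hypothetical, simplified stand-in for copying a token's characters into a
 * term buffer. NOT Lucene's CharTermAttributeImpl; it only mirrors the
 * copyBuffer(char[] buffer, int offset, int length) pattern from the trace.
 */
public class CopyBufferSketch {

    // Copies `length` chars starting at `offset` in `src` into a new term buffer.
    // System.arraycopy rejects an out-of-range slice with an
    // IndexOutOfBoundsException (on HotSpot, ArrayIndexOutOfBoundsException).
    public static char[] copyBuffer(char[] src, int offset, int length) {
        char[] termBuffer = new char[Math.max(length, 0)];
        System.arraycopy(src, offset, termBuffer, 0, length);
        return termBuffer;
    }

    // Hypothetical defensive variant: validate the slice up front so a corrupt
    // offset yields a descriptive message. The check is written as
    // offset > src.length - length to avoid int overflow in offset + length.
    public static char[] copyBufferChecked(char[] src, int offset, int length) {
        if (offset < 0 || length < 0 || offset > src.length - length) {
            throw new IllegalArgumentException("token slice [offset=" + offset
                + ", length=" + length + "] is outside a buffer of length " + src.length);
        }
        return copyBuffer(src, offset, length);
    }

    public static void main(String[] args) {
        char[] surface = "abcdef".toCharArray();

        // In-range slice: offset 2, length 3 -> "cde"
        System.out.println(new String(copyBuffer(surface, 2, 3)));

        // Corrupt offset: 5 + 3 > 6, so the copy reads past the end of the array
        try {
            copyBuffer(surface, 5, 3);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("copy failed: " + e.getClass().getSimpleName());
        }

        // The checked variant reports the bad slice instead
        try {
            copyBufferChecked(surface, 5, 3);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

      Running main first prints "cde" for the valid slice, then reports the out-of-range copy. The checked variant is only one possible guard for illustration, not the change that resolved this issue.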
      


            People

            • Assignee:
              Tomoko Uchida (tomoko)
              Reporter:
              Adrien Grand (jpountz)
            • Votes:
              0
              Watchers:
              5


                Time Tracking

                Estimated:
                Not Specified
                Remaining:
                0h
                Logged:
                2h 50m