LUCENE-8933 (Lucene - Core)

JapaneseTokenizer creates Token objects with corrupt offsets

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.0, 8.3
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      An Elasticsearch user reported the following stack trace when parsing synonyms. The only apparent way for this to occur is for the offset of an org.apache.lucene.analysis.ja.Token to fall outside the expected range.

       

      Caused by: java.lang.ArrayIndexOutOfBoundsException
          at org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl.copyBuffer(CharTermAttributeImpl.java:44) ~[lucene-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:20]
          at org.apache.lucene.analysis.ja.JapaneseTokenizer.incrementToken(JapaneseTokenizer.java:486) ~[?:?]
          at org.apache.lucene.analysis.synonym.SynonymMap$Parser.analyze(SynonymMap.java:318) ~[lucene-analyzers-common-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:48]
          at org.elasticsearch.index.analysis.ESSolrSynonymParser.analyze(ESSolrSynonymParser.java:57) ~[elasticsearch-6.6.1.jar:6.6.1]
          at org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:114) ~[lucene-analyzers-common-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:48]
          at org.apache.lucene.analysis.synonym.SolrSynonymParser.parse(SolrSynonymParser.java:70) ~[lucene-analyzers-common-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:48]
          at org.elasticsearch.index.analysis.SynonymTokenFilterFactory.buildSynonyms(SynonymTokenFilterFactory.java:154) ~[elasticsearch-6.6.1.jar:6.6.1]
          ... 24 more
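
      The failure mode suggested by the trace can be reproduced without Lucene. The sketch below is not Lucene code: CorruptOffsetDemo and its copyBuffer helper are hypothetical stand-ins, assuming the real copyBuffer ends in a System.arraycopy call over the token's surface buffer. Such a copy throws as soon as offset + length runs past the end of the source array, which is consistent with a Token carrying a corrupt offset.

```java
// Hypothetical illustration only; not the Lucene implementation.
class CorruptOffsetDemo {

    // Stand-in for a term attribute's copyBuffer: copies `length` chars
    // from `buffer`, starting at `offset`, into a new array.
    static char[] copyBuffer(char[] buffer, int offset, int length) {
        char[] term = new char[Math.max(length, 0)];
        // Throws an IndexOutOfBoundsException when offset/length do not
        // describe a range that lies inside `buffer`.
        System.arraycopy(buffer, offset, term, 0, length);
        return term;
    }

    public static void main(String[] args) {
        char[] surface = "abcdefg".toCharArray(); // 7 chars

        // Valid token: offset 0, length 2.
        System.out.println(new String(copyBuffer(surface, 0, 2)));

        // Corrupt token: offset 6 + length 3 exceeds the 7-char buffer.
        try {
            copyBuffer(surface, 6, 3);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("corrupt offset rejected: " + e.getClass().getName());
        }
    }
}
```

      A bounds check on offset and length before the copy would turn this into an error message that names the offending token, rather than a bare array exception deep in the analysis chain.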
      

          People

            Assignee: Tomoko Uchida (tomoko)
            Reporter: Adrien Grand (jpountz)
            Votes: 0
            Watchers: 5

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2h 50m