Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
Error seen:
2> TEST FAIL: useCharFilter=true text='Uf?F ?wlu{0 <!--'a'
2> Exception from random analyzer:
2> charfilters=
2> tokenizer=
2>   org.apache.lucene.analysis.ja.JapaneseTokenizer(org.apache.lucene.util.AttributeFactory$1@4c00d592, null, false, true, NORMAL)
2> filters=
2>   Conditional:org.apache.lucene.analysis.pt.PortugueseLightStemFilter(OneTimeWrapper@3fad923e term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,flags=0,payload=null,baseForm=null,partOfSpeech=null,partOfSpeech (en)=null,reading=null,reading (en)=null,pronunciation=null,pronunciation (en)=null,inflectionType=null,inflectionType (en)=null,inflectionForm=null,inflectionForm (en)=null,keyword=false)
2>   org.apache.lucene.analysis.phonetic.BeiderMorseFilter(ValidatingTokenFilter@43fbbeb0 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,flags=0,payload=null,baseForm=null,partOfSpeech=null,partOfSpeech (en)=null,reading=null,reading (en)=null,pronunciation=null,pronunciation (en)=null,inflectionType=null,inflectionType (en)=null,inflectionForm=null,inflectionForm (en)=null,keyword=false, org.apache.commons.codec.language.bm.PhoneticEngine@631e916d)
2>   Conditional:org.apache.lucene.analysis.synonym.SynonymGraphFilter(OneTimeWrapper@77051976 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,flags=0,payload=null,baseForm=null,partOfSpeech=null,partOfSpeech (en)=null,reading=null,reading (en)=null,pronunciation=null,pronunciation (en)=null,inflectionType=null,inflectionType (en)=null,inflectionForm=null,inflectionForm (en)=null,keyword=false, org.apache.lucene.analysis.synonym.SynonymMap@69152718, true)
 > java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
 >   at __randomizedtesting.SeedInfo.seed([1E22B4EE8663AE48:23C39D8FC171B388]:0)
 >   at org.apache.commons.codec@1.13/org.apache.commons.codec.language.bm.PhoneticEngine.encode(PhoneticEngine.java:433)
 >   at org.apache.commons.codec@1.13/org.apache.commons.codec.language.bm.PhoneticEngine.encode(PhoneticEngine.java:384)
 >   at org.apache.lucene.analysis.phonetic@10.0.0-SNAPSHOT/org.apache.lucene.analysis.phonetic.BeiderMorseFilter.incrementToken(BeiderMorseFilter.java:96)
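The failure is not specific to this random analyzer chain. It occurs when:
- PhoneticEngine is configured with NameType=SEPHARDIC
- The term is empty, or is empty after the cleanup done by encode() (whitespace and dashes removed)
The cause is that the encoder calls String.split() and then indexes the last element, assuming the resulting array always has size >= 1. When the array is empty, this produces the "Index -1 out of bounds for length 0" seen above.
A reproducing test is easy to write, but the bug is in commons-codec, so it has to be reported upstream.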
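The String.split() assumption described above can be illustrated with the JDK alone. This is a minimal sketch, not the commons-codec code: the class and method names are hypothetical, and the apostrophe delimiter is an assumption about what the encoder splits on. It shows that split() returns an empty array when the input consists only of the delimiter (trailing empty strings are dropped), whereas an empty input still yields a one-element array, and that a length check avoids the -1 index.

```java
public class SplitEmptyArrayDemo {

    // Hypothetical stand-in for the unguarded pattern: take the last piece
    // of the split result without checking whether any piece exists.
    static String lastPartUnsafe(String word) {
        String[] parts = word.split("'"); // delimiter is an assumption
        return parts[parts.length - 1];   // index -1 when parts is empty
    }

    // Guarded variant: fall back to an empty string when split() returns
    // no elements at all.
    static String lastPartSafe(String word) {
        String[] parts = word.split("'");
        return parts.length == 0 ? "" : parts[parts.length - 1];
    }

    public static void main(String[] args) {
        // An empty input still produces one (empty) element.
        System.out.println("\"\".split -> length " + "".split("'").length);
        // An input made only of the delimiter produces no elements.
        System.out.println("\"'\".split -> length " + "'".split("'").length);

        System.out.println("safe(\"d'ortley\") = " + lastPartSafe("d'ortley"));
        System.out.println("safe(\"'\") = \"" + lastPartSafe("'") + "\"");

        try {
            lastPartUnsafe("'");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unsafe(\"'\") threw " + e.getClass().getSimpleName());
        }
    }
}
```

A regression test on the Lucene side would presumably feed such a term through BeiderMorseFilter with a SEPHARDIC engine, but the fix itself belongs in the upstream encoder.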