Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
It is a TokenFilter that tries to change offsets, so of course TestRandomChains finds bugs in it:
NOTE: reproduce with: gradlew test --tests TestRandomChains.testRandomChainsWithLargeStrings -Dtests.seed=E233A5FAC016E02 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=en-TV -Dtests.timezone=Asia/Saigon -Dtests.asserts=true -Dtests.file.encoding=UTF-8
org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved to /home/rmuir/workspace/lucene/lucene/analysis/integration.tests/build/test-results/test_54/outputs/OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, copied below:
2> stage 0: lk<[1-3] +1> p<[6-7] +1> ngtoixtmldzsjz<[10-24] +1> uoq<[25-28] +1> HANGUL<[28-28] +1> o<[29-30] +1> HANGUL<[31-31] +1> VulliPHsZzn<[32-43] +1>
2> stage 1: lk<[1-3] +1> 850000<[1-3] +0> p<[6-7] +1> 700000<[6-7] +0> ngtoixtmldzsjz<[10-24] +1> 653543<[10-24] +0> uoq<[25-28] +1> 050000<[25-28] +0> HANGUL<[28-28] +1> 565800<[28-28] +0> o<[29-30] +1> 000000<[29-30] +0> HANGUL<[31-31] +1> 565800<[31-31] +0> VulliPHsZzn<[32-43] +1> 787460<[32-43] +0>
2> stage 2: ngtoixtmldzsjz 653543<[10-24] +0> 653543<[10-24] +1> 653543 uoq<[10-28] +0> uoq<[25-28] +1> uoq 050000<[25-28] +0> 050000<[25-28] +1> 050000 HANGUL<[25-28] +0> HANGUL<[28-28] +1> HANGUL 565800<[28-28] +0> 565800<[28-28] +1> 565800 o<[28-30] +0> o<[29-30] +1> o 000000<[29-30] +0> 000000<[29-30] +1> 000000 HANGUL<[29-31] +0> HANGUL<[31-31] +1> HANGUL 565800<[31-31] +0> 565800<[31-31] +1> 565800 VulliPHsZzn<[31-43] +0> VulliPHsZzn<[32-43] +1>
2> last stage: ngtoixtmldzsjz<[10-24] +1> ngtoixtmldzsjz 653543<[10-24] +0> 653543<[10-24] +1> 653543 uoq<[10-28] +0> uoq<[25-28] +1> uoq 050000<[25-28] +1> 050000<[25-28] +1> 050000 HANGUL<[25-28] +1> HANGUL<[28-28] +1> HANGUL 565800<[28-28] +0> 565800<[28-28] +1> 565800 o<[28-30] +0> o<[29-30] +1> o 000000<[29-30] +0> 000000<[29-30] +1> 000000 HANGUL<[29-31] +0> HANGUL<[31-31] +1> HANGUL 565800<[31-31] +1> 565800<[31-31] +1> 565800 VulliPHsZzn<[31-43] +0>
2> TEST FAIL: useCharFilter=true text='[lk[-.p|) ngtoixtmldzsjz uoqao aVulliPHsZzn wxsk'
2> Exception from random analyzer:
2> charfilters=
2> org.apache.lucene.analysis.pattern.PatternReplaceCharFilter(a, <HANGUL>, java.io.StringReader@5b3b54eb)
2> tokenizer=
2> org.apache.lucene.analysis.classic.ClassicTokenizer(org.apache.lucene.util.AttributeFactory$1@e29311e9)
2> filters=
2> org.apache.lucene.analysis.phonetic.DaitchMokotoffSoundexFilter(ValidatingTokenFilter@32a6de77 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1, true)
2> org.apache.lucene.analysis.shingle.ShingleFilter(ValidatingTokenFilter@3d044414 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1, q)
2> Conditional:org.apache.lucene.analysis.ja.JapaneseCompletionFilter(OneTimeWrapper@435207ec term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,reading=null,reading (en)=null,pronunciation=null,pronunciation (en)=null, INDEX)
> java.lang.IllegalStateException: last stage: inconsistent endOffset at pos=19: 31 vs 43; token=565800 VulliPHsZzn
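For anyone unfamiliar with the invariant behind the "inconsistent endOffset" message, here is a minimal sketch of that kind of check in plain Java. It is not the Lucene ValidatingTokenFilter code: the class name OffsetConsistencySketch, the Token holder, and the two-token sample stream are all hypothetical and only illustrate the idea that every token ending at the same position should report the same endOffset (and likewise for start positions and startOffset). The sample data is made up and does not reproduce the seed above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a simplified stand-in for the offset consistency check,
// with no dependency on Lucene. All names here are hypothetical.
class OffsetConsistencySketch {

  /** Holds just the token attributes the check needs. */
  static final class Token {
    final String term;
    final int startOffset;
    final int endOffset;
    final int posInc;
    final int posLength;

    Token(String term, int startOffset, int endOffset, int posInc, int posLength) {
      this.term = term;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
      this.posInc = posInc;
      this.posLength = posLength;
    }
  }

  /** Returns a description of the first inconsistency, or null if there is none. */
  static String firstInconsistency(List<Token> tokens) {
    Map<Integer, Integer> posToStartOffset = new HashMap<>();
    Map<Integer, Integer> posToEndOffset = new HashMap<>();
    int pos = -1;
    for (Token t : tokens) {
      pos += t.posInc;
      int startPos = pos;
      int endPos = pos + t.posLength;

      // Every token starting at startPos must agree on startOffset.
      Integer oldStart = posToStartOffset.putIfAbsent(startPos, t.startOffset);
      if (oldStart != null && oldStart.intValue() != t.startOffset) {
        return "inconsistent startOffset at pos=" + startPos + ": "
            + oldStart + " vs " + t.startOffset + "; token=" + t.term;
      }

      // Every token ending at endPos must agree on endOffset.
      Integer oldEnd = posToEndOffset.putIfAbsent(endPos, t.endOffset);
      if (oldEnd != null && oldEnd.intValue() != t.endOffset) {
        return "inconsistent endOffset at pos=" + endPos + ": "
            + oldEnd + " vs " + t.endOffset + "; token=" + t.term;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Unigram "a", shingle "a b" (posLength 2), unigram "b": offsets agree.
    List<Token> consistent = List.of(
        new Token("a", 0, 1, 1, 1),
        new Token("a b", 0, 3, 0, 2),
        new Token("b", 2, 3, 1, 1));
    System.out.println(firstInconsistency(consistent));

    // Same stream after a hypothetical filter rewrote the shingle's endOffset.
    List<Token> broken = List.of(
        new Token("a", 0, 1, 1, 1),
        new Token("a b", 0, 1, 0, 2),
        new Token("b", 2, 3, 1, 1));
    System.out.println(firstInconsistency(broken));
  }
}
```

In this sketch the first stream passes, while the second is flagged because the shingle and the unigram "b" both end at position 2 but disagree on endOffset. A filter that rewrites offsets after a graph-producing filter such as ShingleFilter can trip this kind of check in the same way.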