Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
It is a TokenFilter that tries to change offsets, so of course TestRandomChains finds bugs in it:
NOTE: reproduce with: gradlew test --tests TestRandomChains.testRandomChains -Dtests.seed=12BC606B774693E4 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=om-Latn-ET -Dtests.timezone=Australia/Yancowinna -Dtests.asserts=true -Dtests.file.encoding=UTF-8
org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved to /home/rmuir/workspace/lucene/lucene/analysis/integration.tests/build/test-results/test_16/outputs/OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, copied below:
2> stage 0: 뱅<[0-1] +1> Ƒ<[1-2] +1> ė<[3-4] +1> 履<[6-7] +1> jEqyzUT<[8-15] +1>
2> stage 1: 000000<[0-1] +1> Ƒ<[1-2] +1> ė<[3-4] +1> 000000<[6-7] +1> 154300<[8-15] +1> 454300<[8-15] +0>
2> last stage: 0<[0-1] +1> Ƒ<[1-2] +1> ė<[3-4] +1> 000000<[6-7] +1> 454300<[8-15] +0>
2> TEST FAIL: useCharFilter=false text='\ubc45\u0191(\u0117\ud8ad\udf0a\uf9df jEqyzUT '
2> Exception from random analyzer:
2> charfilters=
2> org.apache.lucene.analysis.cjk.CJKWidthCharFilter(java.io.StringReader@17af5384)
2> org.apache.lucene.analysis.charfilter.MappingCharFilter(org.apache.lucene.analysis.charfilter.NormalizeCharMap@33e5bdbb, org.apache.lucene.analysis.cjk.CJKWidthCharFilter@1aafd271)
2> tokenizer=
2> org.apache.lucene.analysis.icu.segmentation.ICUTokenizer(org.apache.lucene.analysis.icu.segmentation.DefaultICUTokenizerConfig@4e6f4690)
2> filters=
2> Conditional:org.apache.lucene.analysis.phonetic.DaitchMokotoffSoundexFilter(OneTimeWrapper@34215eb7 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,script=Common, false)
2> org.apache.lucene.analysis.ko.KoreanNumberFilter(ValidatingTokenFilter@7b4a2a5b term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,script=Common,keyword=false)
> java.lang.IllegalStateException: last stage: inconsistent startOffset at pos=3: 6 vs 8; token=454300
> at __randomizedtesting.SeedInfo.seed([12BC606B774693E4:2F5D490A30548E24]:0)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:138)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:1028)
> at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:922)
> at org.apache.lucene.analysis.tests@10.0.0-SNAPSHOT/org.apache.lucene.analysis.tests.TestRandomChains.testRandomChains(TestRandomChains.java:915)
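To see why the "last stage" trips the check, here is a minimal sketch in plain Java (no Lucene dependency) of the invariant the error message describes: all tokens that land on the same position must share a startOffset. The `OffsetConsistencyCheck` class, `Tok` record and `checkStartOffsets` helper are hypothetical illustrations, not ValidatingTokenFilter's actual code. Positions are accumulated from position increments starting at -1. In stage 1, `454300` (+0) is stacked on `154300`, and both start at 8. In the last stage `154300` is gone but `454300` apparently still has +0, so it shares position 3 with `000000`, which starts at 6.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OffsetConsistencyCheck {
    // Hypothetical token record mirroring the "term<[start-end] +posInc>"
    // notation used in the test output above.
    record Tok(String term, int start, int end, int posInc) {}

    // Walks the tokens, deriving each token's position from its position
    // increment, and returns a message for the first position whose tokens
    // disagree on startOffset, or null if every position is consistent.
    static String checkStartOffsets(List<Tok> tokens) {
        Map<Integer, Integer> posToStartOffset = new HashMap<>();
        int pos = -1;
        for (Tok t : tokens) {
            pos += t.posInc();
            // Remember the first startOffset seen at this position.
            Integer seen = posToStartOffset.putIfAbsent(pos, t.start());
            if (seen != null && seen.intValue() != t.start()) {
                return "inconsistent startOffset at pos=" + pos + ": "
                        + seen + " vs " + t.start() + "; token=" + t.term();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The "last stage" tokens from the failing run (non-ASCII terms escaped).
        List<Tok> lastStage = List.of(
                new Tok("0", 0, 1, 1),
                new Tok("\u0191", 1, 2, 1),
                new Tok("\u0117", 3, 4, 1),
                new Tok("000000", 6, 7, 1),
                new Tok("454300", 8, 15, 0));
        // Prints: inconsistent startOffset at pos=3: 6 vs 8; token=454300
        System.out.println(checkStartOffsets(lastStage));
    }
}
```

Running the same helper on the stage 1 tokens returns null, because `154300` and `454300` both start at offset 8.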