Lucene - Core / LUCENE-8676

TestKoreanTokenizer#testRandomHugeStrings failure

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.7, 8.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      TestKoreanTokenizer#testRandomHugeStrings failed in CI with the following exception:

        [junit4]    > Throwable #1: java.lang.AssertionError
         [junit4]    >        at __randomizedtesting.SeedInfo.seed([8C5E2BE10F581CB:90E6857D4E833D83]:0)
         [junit4]    >        at org.apache.lucene.analysis.ko.KoreanTokenizer.add(KoreanTokenizer.java:334)
         [junit4]    >        at org.apache.lucene.analysis.ko.KoreanTokenizer.parse(KoreanTokenizer.java:707)
         [junit4]    >        at org.apache.lucene.analysis.ko.KoreanTokenizer.incrementToken(KoreanTokenizer.java:377)
         [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:748)
         [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:659)
         [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:561)
         [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:474)
         [junit4]    >        at org.apache.lucene.analysis.ko.TestKoreanTokenizer.testRandomHugeStrings(TestKoreanTokenizer.java:313)
         [junit4]    >        at java.lang.Thread.run(Thread.java:748)
         [junit4]   2> NOTE: leaving temporary files
      

      I am able to reproduce locally with:

      ant test  -Dtestcase=TestKoreanTokenizer -Dtests.method=testRandomHugeStrings -Dtests.seed=8C5E2BE10F581CB -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-7.7/test-data/enwiki.random.lines.txt -Dtests.locale=uk-UA -Dtests.timezone=Europe/Istanbul -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
      

      After some investigation, I found that the position of the buffer is not updated when the maximum backtrace size (1024) is reached.
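
To illustrate this kind of failure, here is a minimal toy model. It is not the KoreanTokenizer code and not the attached patch. It assumes that a forced backtrace can resume at a position beyond the current one, and that later steps expect the current position to have caught up. The names `BacktraceSketch`, `scan`, `overshoot` and `syncPosition` are hypothetical; only the 1024 limit comes from this report.

```java
// Toy model only: a hypothetical sketch, NOT the actual KoreanTokenizer logic.
class BacktraceSketch {
    // Limit quoted in the report; the constant name is an assumption.
    static final int MAX_BACKTRACE_GAP = 1024;

    /**
     * Walks over `length` positions and forces a backtrace whenever the gap
     * since the last backtrace reaches MAX_BACKTRACE_GAP. The forced backtrace
     * is assumed to resume `overshoot` positions ahead of the current one.
     *
     * @param syncPosition whether the current position is moved to the resume
     *                     point after a forced backtrace
     * @return true if the position never fell behind the last backtrace point
     */
    static boolean scan(int length, int overshoot, boolean syncPosition) {
        int pos = 0;
        int lastBackTracePos = 0;
        while (pos < length) {
            if (pos - lastBackTracePos >= MAX_BACKTRACE_GAP) {
                // Forced backtrace: output is flushed up to resumeAt.
                int resumeAt = pos + overshoot;
                lastBackTracePos = resumeAt;
                if (syncPosition) {
                    // Move the current position forward to the resume point.
                    pos = resumeAt;
                }
            }
            // Invariant that later steps are assumed to rely on.
            if (pos < lastBackTracePos) {
                return false;
            }
            pos++;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(scan(5000, 3, true));
        System.out.println(scan(5000, 3, false));
    }
}
```

In this model the invariant only breaks once the input is long enough to hit the limit and the backtrace overshoots the current position, which would be consistent with the failure showing up in a huge-strings test.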

        Attachments

        1. LUCENE-8676.patch
          1 kB
          Jim Ferenczi

            People

            • Assignee: Unassigned
            • Reporter: Jim Ferenczi