Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7442

MinHashFilter's ctor should validate its args

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.2.1
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      My Jenkins found this reproducing branch_6x seed:

         [junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains
         [junit4]   2> Exception from random analyzer: 
         [junit4]   2> charfilters=
         [junit4]   2> tokenizer=
         [junit4]   2>   org.apache.lucene.analysis.standard.StandardTokenizer()
         [junit4]   2> filters=
         [junit4]   2>   org.apache.lucene.analysis.minhash.MinHashFilter(ValidatingTokenFilter@6ae99167 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word, 5, 5, -3, true)
         [junit4]   2>   org.apache.lucene.analysis.bg.BulgarianStemFilter(ValidatingTokenFilter@40844352 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,keyword=false)
         [junit4]   2> offsetsAreCorrect=true
         [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestRandomChains -Dtests.method=testRandomChainsWithLargeStrings -Dtests.seed=4733E677EBDC28FC -Dtests.slow=true -Dtests.locale=ar-OM -Dtests.timezone=Atlantic/South_Georgia -Dtests.asserts=true -Dtests.file.encoding=UTF-8
         [junit4] ERROR   3.18s J4 | TestRandomChains.testRandomChainsWithLargeStrings <<<
         [junit4]    > Throwable #1: java.util.NoSuchElementException
         [junit4]    > 	at __randomizedtesting.SeedInfo.seed([4733E677EBDC28FC:2D685966B292080F]:0)
         [junit4]    > 	at java.util.TreeMap.key(TreeMap.java:1323)
         [junit4]    > 	at java.util.TreeMap.lastKey(TreeMap.java:297)
         [junit4]    > 	at java.util.TreeSet.last(TreeSet.java:401)
         [junit4]    > 	at org.apache.lucene.analysis.minhash.MinHashFilter$FixedSizeTreeSet.add(MinHashFilter.java:325)
         [junit4]    > 	at org.apache.lucene.analysis.minhash.MinHashFilter.incrementToken(MinHashFilter.java:159)
         [junit4]    > 	at org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:67)
         [junit4]    > 	at org.apache.lucene.analysis.bg.BulgarianStemFilter.incrementToken(BulgarianStemFilter.java:48)
         [junit4]    > 	at org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:67)
         [junit4]    > 	at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkResetException(BaseTokenStreamTestCase.java:405)
         [junit4]    > 	at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:510)
         [junit4]    > 	at org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:959)
         [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
         [junit4]   2> NOTE: test params are: codec=Asserting(Lucene62): {dummy=Lucene50(blocksize=128)}, docValues:{}, maxPointsInLeafNode=252, maxMBSortInHeap=5.297834377897023, sim=ClassicSimilarity, locale=ar-OM, timezone=Atlantic/South_Georgia
         [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_77 (64-bit)/cpus=16,threads=1,free=395080152,total=465567744
         [junit4]   2> NOTE: All tests run in this JVM: [TestDecimalDigitFilterFactory, TestMultiWordSynonyms, TestReversePathHierarchyTokenizer, TestDoubleEscape, TestHunspellStemFilterFactory, TestArabicNormalizationFilter, TestUAX29URLEmailAnalyzer, TestSwedishLightStemFilterFactory, TestBulgarianStemmer, TestASCIIFoldingFilter, TestDelimitedPayloadTokenFilterFactory, TestIndonesianStemmer, TestCircumfix, EdgeNGramTokenFilterTest, TestPatternTokenizer, TestScandinavianFoldingFilter, TestIgnore, TestRandomChains]
         [junit4] Completed [130/272 (1!)] on J4 in 9.85s, 2 tests, 1 error <<< FAILURES!
      

        Attachments

        1. LUCENE-7442.patch
          3 kB
          Cao Manh Dat
        2. LUCENE-7442.patch
          1 kB
          Cao Manh Dat

          Issue Links

            Activity

              People

              • Assignee:
                steve_rowe Steve Rowe
                Reporter:
                steve_rowe Steve Rowe
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: