Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-6788

Mishandling of Integer.MIN_VALUE in FuzzySet leads to AssertionError

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 4.10.4, 6.0
    • None
    • core/index
    • None
    • New

    Description

      Reindexing some data in the DataStax Enterprise Search product (which uses Solr) led to these stack traces:

      ERROR Lucene Merge Thread #13430 2015-09-08 11:14:36,582 CassandraDaemon.java (line 258) Exception in thread ThreadLucene Merge Thread #13430,6,main
      org.apache.lucene.index.MergePolicy$MergeException: java.lang.AssertionError
      at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
      at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
      Caused by: java.lang.AssertionError
      at org.apache.lucene.codecs.bloom.FuzzySet.mayContainValue(FuzzySet.java:216)
      at org.apache.lucene.codecs.bloom.FuzzySet.contains(FuzzySet.java:165)
      at org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat$BloomFilteredFieldsProducer$BloomFilteredTermsEnum.seekExact(BloomFilteringPostingsFormat.java:351)
      at org.apache.lucene.index.BufferedUpdatesStream.applyTermDeletes(BufferedUpdatesStream.java:414)
      at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:283)
      at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:3838)
      at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:3799)
      at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3651)
      at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
      at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

      In tracking down the cause of the stack trace, I noticed this:
      https://github.com/apache/lucene-solr/blob/trunk/lucene/codecs/src/java/org/apache/lucene/codecs/bloom/FuzzySet.java#L164

      It is possible for the Murmur2 hash to return Integer.MIN_VALUE (e.g. when hashing "WeH44wlbCK"). Multiplying Integer.MIN_VALUE by -1 returns Integer.MIN_VALUE again, so the "positiveHash >= 0" assertion at line 217 fails.

      We could special-case Integer.MIN_VALUE, map it to 42 or some other magic number... since the same "* -1" logic appears on line 236 perhaps it should be part of the hash function?

      Attachments

        Activity

          People

            Unassigned Unassigned
            tarrall Robert Tarrall
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: