Lucene.Net / LUCENENET-607

InvalidCastException PendingTerm cannot be cast to PendingBlock

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Lucene.Net 4.8.0
    • Fix Version/s: None
    • Component/s: Lucene.Net Core
    • Labels:
      None

      Description

      Here is the exception call stack:

      at Lucene.Net.Codecs.BlockTreeTermsWriter.TermsWriter.Finish(Int64 sumTotalTermFreq, Int64 sumDocFreq, Int32 docCount, TermsHashPerField termsHashPerField)
      at Lucene.Net.Index.FreqProxTermsWriterPerField.Flush(String fieldName, FieldsConsumer consumer, SegmentWriteState state)
      at Lucene.Net.Index.FreqProxTermsWriter.Flush(IDictionary`2 fieldsToFlush, SegmentWriteState state)
      at Lucene.Net.Index.TermsHash.Flush(IDictionary`2 fieldsToFlush, SegmentWriteState state)
      at Lucene.Net.Index.DocInverter.Flush(IDictionary`2 fieldsToFlush, SegmentWriteState state)
      at Lucene.Net.Index.DocFieldProcessor.Flush(SegmentWriteState state)
      at Lucene.Net.Index.DocumentsWriterPerThread.Flush()
      at Lucene.Net.Index.DocumentsWriter.DoFlush(DocumentsWriterPerThread flushingDWPT)
      at Lucene.Net.Index.DocumentsWriter.FlushAllThreads(IndexWriter indexWriter)
      at Lucene.Net.Index.IndexWriter.GetReader(Boolean applyAllDeletes)
      at Lucene.Net.Index.StandardDirectoryReader.DoOpenFromWriter(IndexCommit commit)
      at Lucene.Net.Search.SearcherManager.RefreshIfNeeded(IndexSearcher referenceToRefresh)
      at Lucene.Net.Search.ReferenceManager`1.DoMaybeRefresh()
      at Lucene.Net.Search.ReferenceManager`1.MaybeRefreshBlocking()
      at Lucene.Net.Search.ControlledRealTimeReopenThread`1.Run()
      

      The issue is hard to reproduce: it appears only when documents containing the same terms are added concurrently, and I have not managed to write a test that reliably reproduces it.

      After some research, I found that the cause of the issue is duplicate terms in the BytesRefHash structure. BytesRefHash uses the Murmurhash3_x86_32 hashing algorithm with a random seed (see the StringHelper.GOOD_FAST_HASH_SEED property). The StringHelper.GOOD_FAST_HASH_SEED property is not thread-safe and can return different values when it is called from several threads at the same moment. This can produce duplicate values in BytesRefHash: the same value yields different hashes because the hashes were calculated with different seeds.
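
      The Java sketch below illustrates that failure mode under simplifying assumptions; it is not Lucene.Net code, and SeededHashDemo, murmur3x86_32 and ToyTermHash are hypothetical names. murmur3x86_32 is a plain MurmurHash3 x86_32; for a fixed input every step is invertible in the seed, so two different seeds are guaranteed to give the same term two different hashes. ToyTermHash is a deliberately simplified table that buckets terms on the full 32-bit hash, which makes the duplicate deterministic. An open-addressing table such as BytesRefHash presumably reduces the hash to a slot index, so there a seed mismatch would usually, but not always, send the lookup to the wrong slot.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Lucene.Net code: the same term hashed
// with two different seeds.
final class SeededHashDemo {

    // MurmurHash3 x86_32: 4-byte little-endian blocks, tail bytes, fmix32.
    static int murmur3x86_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = data.length & ~3; // largest multiple of 4 <= length

        for (int i = 0; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        int k1 = 0;
        switch (data.length & 3) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
                // fall through
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
                // fall through
            case 1:
                k1 |= data[roundedEnd] & 0xff;
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }

        // Finalization mix (fmix32).
        h1 ^= data.length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Deliberately simplified term table (hypothetical): buckets are keyed by
    // the full 32-bit hash, so a lookup only inspects the bucket its hash selects.
    static final class ToyTermHash {
        private final Map<Integer, List<byte[]>> buckets = new HashMap<>();
        private int size;

        // Adds the term unless an equal term is found; returns true if added.
        boolean add(byte[] term, int seed) {
            int code = murmur3x86_32(term, seed);
            List<byte[]> bucket = buckets.computeIfAbsent(code, k -> new ArrayList<>());
            for (byte[] existing : bucket) {
                if (Arrays.equals(existing, term)) {
                    return false; // already present
                }
            }
            bucket.add(term);
            size++;
            return true;
        }

        int size() {
            return size;
        }
    }

    public static void main(String[] args) {
        byte[] term = "lucene".getBytes(StandardCharsets.UTF_8);

        ToyTermHash consistent = new ToyTermHash();
        consistent.add(term, 42);
        consistent.add(term, 42); // same seed: found, not added again

        ToyTermHash racy = new ToyTermHash();
        racy.add(term, 42);   // e.g. one thread observed seed 42
        racy.add(term, 1337); // another observed seed 1337: stored twice

        System.out.println(consistent.size() + " " + racy.size()); // prints "1 2"
    }
}
```

      With one seed the second insert finds the existing entry; with two seeds the same term is stored twice, which is the duplicate-term condition described above.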

      There is another issue with GOOD_FAST_HASH_SEED. DateTime.Now.Millisecond is used to randomize the seed, but DateTime.Now.Millisecond can return 0. Because 0 is treated as "uninitialized", the next GOOD_FAST_HASH_SEED call computes the seed again and returns a different value.
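
      A minimal Java model of that lazy-initialization pattern, assuming the property checks for 0 and recomputes the seed on any call that still sees 0. LazySeed is a hypothetical name, not Lucene.Net code, and its IntSupplier stands in for DateTime.Now.Millisecond (which ranges from 0 to 999). The same unsynchronized check-then-act step is what lets concurrent callers end up with different seeds.

```java
import java.util.function.IntSupplier;

// Hypothetical model of a lazily initialized seed that uses 0 as the
// "not initialized yet" marker. Not Lucene.Net code.
final class LazySeed {
    private final IntSupplier source; // stands in for DateTime.Now.Millisecond
    private int seed;                 // 0 doubles as "uninitialized"

    LazySeed(IntSupplier source) {
        this.source = source;
    }

    int get() {
        if (seed == 0) {
            // Unsynchronized check-then-act: concurrent callers can each store a
            // value and end up returning different seeds. If the source yields 0,
            // nothing sticks and the next call draws a fresh value.
            seed = source.getAsInt();
        }
        return seed;
    }

    public static void main(String[] args) {
        int[] millis = {0, 7}; // the first "clock" reading happens to be 0
        int[] next = {0};
        LazySeed lazy = new LazySeed(() -> millis[next[0]++]);
        System.out.println(lazy.get() + " " + lazy.get()); // prints "0 7"
    }
}
```

      The first call returns 0 and the second returns 7, so terms hashed around those two calls would use different seeds even without any concurrency.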

      The issue could easily be fixed by moving the GOOD_FAST_HASH_SEED initialization into the static constructor of StringHelper. That would make the initialization thread-safe and would also fix the 0-value issue.
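
      A Java analogue of the proposed fix, offered as a sketch rather than the actual patch: SeedHolder and SeedHolderDemo are hypothetical names, and System.nanoTime() is an assumed seed source. Java static initializers, like C# static constructors, run exactly once, and other threads wait until initialization completes, so every reader sees the same seed. With no sentinel, a seed of 0 is just another valid value.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical Java analogue of initializing the seed in a static constructor.
final class SeedHolder {
    static final int GOOD_FAST_HASH_SEED;

    static {
        // Runs exactly once, under the JVM's class-initialization lock.
        // Assumed seed source; 0 is a valid result here (no sentinel).
        GOOD_FAST_HASH_SEED = (int) System.nanoTime();
    }

    private SeedHolder() {
    }
}

// Separate class so the worker threads, not this class, trigger SeedHolder's
// initialization (if nothing has touched SeedHolder yet).
final class SeedHolderDemo {
    // Reads the seed from a thread pool and returns every distinct value seen.
    static Set<Integer> seedsSeenConcurrently(int threads, int reads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < reads; i++) {
                Future<Integer> f = pool.submit(() -> SeedHolder.GOOD_FAST_HASH_SEED);
                results.add(f);
            }
            Set<Integer> seen = new HashSet<>();
            for (Future<Integer> f : results) {
                seen.add(f.get());
            }
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(seedsSeenConcurrently(8, 100).size()); // prints "1"
    }
}
```

      Whichever thread triggers initialization, all readers observe a single seed, so every term is hashed consistently.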

    People

    • Assignee: Unassigned
    • Reporter: Khindikaynen Aleksey
    • Votes: 3
    • Watchers: 4

    Time Tracking

    • Estimated: Not Specified
    • Remaining: 0h
    • Logged: 50m