  1. Solr
  2. SOLR-16992

Non-reproducible StreamingTest failures -- suggests CloudSolrStream concurrency race condition

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.4
    • Component/s: None
    • Labels: None

    Description

      Roughly 3% of all Jenkins jobs that run StreamingTest end up with suite-level failures.

      These failures have historically taken the form of com.carrotsearch.randomizedtesting.ThreadLeakError and the leaked threads all have names like
      "h2sc-718-thread-2" indicating that they come from the internal ExecutorService of an Http2SolrClient.

      In my experience, the seeds from these failures have never reproduced - suggesting that the problem is related to concurrency.

      SOLR-16983 restored the (correct) use of ObjectReleaseTracker, which in theory should help pinpoint where Http2SolrClient instances might not be getting closed (by causing ObjectReleaseTracker to fail with stack traces of when/where any unclosed instances were created, i.e., which test method).
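
To illustrate the idea behind this kind of tracking, here is a minimal sketch of a release tracker. It is not Solr's ObjectReleaseTracker; the class ReleaseTrackerSketch and its methods are hypothetical names chosen for this example. It only shows the general technique: capture a Throwable when an object is created, drop it when the object is released, and report whatever is left.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch, NOT Solr's ObjectReleaseTracker implementation.
public class ReleaseTrackerSketch {

  // Maps each live object to a Throwable captured at the point it was tracked.
  private static final Map<Object, Throwable> LIVE =
      Collections.synchronizedMap(new IdentityHashMap<>());

  /** Remember where the object was created. */
  static void track(Object o) {
    LIVE.put(o, new Throwable("created here: " + o.getClass().getSimpleName()));
  }

  /** Forget the object once it has been closed. */
  static void release(Object o) {
    LIVE.remove(o);
  }

  /** Prints the creation stack trace of every unreleased object and returns how many there are. */
  static int reportLeaks() {
    synchronized (LIVE) {
      for (Throwable t : LIVE.values()) {
        t.printStackTrace();
      }
      return LIVE.size();
    }
  }

  /** Tracks two objects, releases only one, and returns the number of leaks found. */
  static int demo() {
    LIVE.clear();
    Object closedProperly = new Object();
    Object leaked = new Object();
    track(closedProperly);
    track(leaked);
    release(closedProperly);
    return reportLeaks();
  }

  public static void main(String[] args) {
    System.out.println("leaked objects: " + demo());
  }
}
```

The printed stack trace points at the code that created the unreleased object, which is what lets a test suite attribute a leak to a specific test method.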

      In practice, I have managed to force one failure from StreamingTest since the SOLR-16983 changes (logs to be attached soon), but it still didn't indicate any leaked/unclosed Http2SolrClient instances. What it indicated instead was a single unclosed InputStream instance related to Http2SolrClient connections (SOLR-16983 also added better tracking of this) coming from StreamingTest.testExceptionStream, a test method that opens five very similar ExceptionStream instances, each wrapping a CloudSolrStream instance that is expected to trigger server-side errors.

      By its very design, ExceptionStream catches & records any exceptions from the stream it wraps, so even in the event of these "expected" server-side errors, ExceptionStream.close() should still be getting called correctly (and propagating down to the CloudSolrStream it wraps).
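
The wrapping behavior described above can be sketched roughly as follows. This is a simplified illustration, not the Solr classes: Source, FailingSource and RecordingWrapper are hypothetical stand-ins for a tuple stream, a stream that hits a server-side error, and ExceptionStream respectively.

```java
import java.io.IOException;

// Hypothetical sketch; these types are NOT the Solr streaming API.
public class ExceptionWrapperSketch {

  /** Stand-in for a stream that may fail when opened. */
  interface Source {
    void open() throws IOException;

    void close();
  }

  /** Inner source that always fails on open() and remembers whether close() was called. */
  static class FailingSource implements Source {
    boolean closed = false;

    @Override
    public void open() throws IOException {
      throw new IOException("simulated server-side error");
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Wrapper that records a failure instead of rethrowing it and always forwards close(). */
  static class RecordingWrapper implements Source {
    private final Source inner;
    Exception recorded;

    RecordingWrapper(Source inner) {
      this.inner = inner;
    }

    @Override
    public void open() {
      try {
        inner.open();
      } catch (Exception e) {
        recorded = e; // keep the error for later inspection
      }
    }

    @Override
    public void close() {
      inner.close(); // propagate close to the wrapped source
    }
  }

  /** Returns true if the error was recorded and close() reached the inner source. */
  static boolean demo() {
    FailingSource inner = new FailingSource();
    RecordingWrapper wrapper = new RecordingWrapper(inner);
    try {
      wrapper.open();
    } finally {
      wrapper.close();
    }
    return wrapper.recorded != null && inner.closed;
  }

  public static void main(String[] args) {
    System.out.println("recorded and closed: " + demo());
  }
}
```

In this single-threaded sketch the inner source is always closed, which is why a missed close in the real test points toward a timing problem rather than a missing close() call.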

      I believe the underlying problem is a concurrency race condition between the call to CloudSolrStream.close() and the ExecutorService used internally by CloudSolrStream.openStreams() (details to follow).
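
A minimal sketch of the kind of race suspected here, under stated assumptions: the classes below (Resource, Opener) are hypothetical and do not reproduce CloudSolrStream's code. A CountDownLatch forces the unlucky interleaving in which close() runs before a worker thread has finished opening its resource, so the result is deterministic. The guarded mode shows one possible mitigation (checking a closed flag when registering); it is an assumption for illustration, not necessarily the fix applied in Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a close-vs-executor race; NOT CloudSolrStream's code.
public class CloseRaceSketch {

  /** Counts resources that are currently open. */
  static final AtomicInteger OPEN = new AtomicInteger();

  static class Resource {
    Resource() {
      OPEN.incrementAndGet();
    }

    void close() {
      OPEN.decrementAndGet();
    }
  }

  static class Opener {
    private final List<Resource> opened = new ArrayList<>();
    private final boolean guarded;
    private boolean closed = false;

    Opener(boolean guarded) {
      this.guarded = guarded;
    }

    /** Called from a worker thread once its open has finished. */
    synchronized void register(Resource r) {
      if (guarded && closed) {
        r.close(); // owner already closed: release immediately
        return;
      }
      opened.add(r);
    }

    /** Closes only the resources registered so far. */
    synchronized void close() {
      closed = true;
      for (Resource r : opened) {
        r.close();
      }
      opened.clear();
    }
  }

  /** Forces close() to run before the worker registers; returns resources still open. */
  static int leakedAfterForcedRace(boolean guarded) {
    OPEN.set(0);
    Opener opener = new Opener(guarded);
    CountDownLatch closeDone = new CountDownLatch(1);
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Future<?> task = pool.submit(() -> {
        closeDone.await(); // simulate a slow open that finishes after close()
        opener.register(new Resource());
        return null;
      });
      opener.close();
      closeDone.countDown();
      task.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
    return OPEN.get();
  }

  public static void main(String[] args) {
    System.out.println("unguarded leaks: " + leakedAfterForcedRace(false));
    System.out.println("guarded leaks: " + leakedAfterForcedRace(true));
  }
}
```

Without the latch, the leak would only appear when thread scheduling happens to order the two calls this way, which is consistent with failures that do not reproduce from a seed.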

            People

              Assignee: Alex Deparvu (stillalex)
              Reporter: Chris M. Hostetter (hossman)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m