Solr / SOLR-12990

High test failure rate on Java11/12 when (randomized) ssl=true clientAuth=false


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate

    Description

      Ever since the Policeman Jenkins instance started running tests on Java 11, we've seen an abnormally high number of test failures that seem to be related to randomized SSL.

      I've been investigating these logs and trying to reproduce the failures, and have made the following observations:

      • In all the Policeman Jenkins logs I looked at, these SSL-related failures only occur when the RandomizeSSL annotation picks ssl=true clientAuth=false
        • NOTE: this doesn't mean that every test using ssl=true clientAuth=false failed. Since our build system only prints test output when tests fail, it's possible (and, based on how often that combination should be picked, probable) that many tests randomly use ssl=true clientAuth=false and pass
      • the failures usually show the exception "Caused by: javax.net.ssl.SSLException: Received fatal alert: internal_error" in the logs
      • when I attempted to reproduce some of these failing seeds on my own machine using Java 11, I could not reliably reproduce the failures with the same seeds
        • beasting could occasionally reproduce the failures, in roughly 1 out of 10 runs
        • this suggests that system load/timing contributes to these SSL-related failures
      • picking one particularly trivial test (DistributedDebugComponentTest):
        • with javax.net.debug=all enabled, I was able to see more details, notably:
          • Fatal (INTERNAL_ERROR): Session has no PSK
        • when I patched the test to force ssl=true clientAuth=true, I was unable to trigger any failures with the same seed
      • on the jira/http2 branch I was unable to reproduce these failures at all, without any patching
        • similar to SOLR-12988, this may be because of bug fixes in the upgraded Jetty
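      A rough way to see why rerunning a failing seed once rarely reproduces the problem, while beasting eventually does: assume (as an approximation, since load and timing make runs not truly independent) that each run fails independently with probability p ≈ 0.1, the rate observed while beasting. The chance of seeing at least one failure in n runs is then:

```latex
\begin{aligned}
P_n &= 1 - (1 - p)^n, \qquad p \approx 0.1 \\
P_1 &= 0.10, \qquad P_{10} = 1 - 0.9^{10} \approx 0.65, \qquad P_{29} = 1 - 0.9^{29} \approx 0.95
\end{aligned}
```

      Under that assumption, a single rerun reproduces the failure only about 10% of the time, and roughly 29 beasting iterations (n ≥ ln 0.05 / ln 0.9 ≈ 28.4) are needed before seeing a reproduction becomes likely at the 95% level.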

      Filing this issue largely for tracking purposes, although we may also want to use it for discussing other backports/fixes to 7x.
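      For anyone comparing the two randomized modes outside Solr's test framework, here is a minimal sketch of the server-side difference between ssl=true clientAuth=false and ssl=true clientAuth=true, written against the plain JSSE API. The class and method names (SslModeSketch, serverEngine) are hypothetical; this is not how RandomizeSSL or Solr's test harness actually configures Jetty, and it performs no handshake, so it will not by itself reproduce the "Session has no PSK" failure. Passing -Djavax.net.debug=all on the JVM command line enables the same JSSE debug logging used in the investigation above.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Hypothetical sketch: models the two randomized SSL modes as server-side SSLEngine settings.
// Run with -Djavax.net.debug=all to enable JSSE debug logging.
public class SslModeSketch {

    // Builds a server-mode SSLEngine; needClientAuth mirrors the clientAuth=true/false randomization.
    static SSLEngine serverEngine(boolean needClientAuth) {
        try {
            SSLContext ctx = SSLContext.getInstance("TLS");
            // Default key/trust managers; a real server would supply its keystore here.
            ctx.init(null, null, null);
            SSLEngine engine = ctx.createSSLEngine();
            engine.setUseClientMode(false);          // act as the server side
            engine.setNeedClientAuth(needClientAuth); // require (or not) a client certificate
            return engine;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not initialise TLS context", e);
        }
    }

    public static void main(String[] args) {
        for (boolean clientAuth : new boolean[] {false, true}) {
            SSLEngine engine = serverEngine(clientAuth);
            System.out.println("ssl=true clientAuth=" + clientAuth
                    + " -> needClientAuth=" + engine.getNeedClientAuth()
                    + ", serverMode=" + !engine.getUseClientMode());
        }
    }
}
```

      Keeping the flag as the only difference between the two engines mirrors the investigation's approach of forcing clientAuth=true while holding the seed constant.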

          People

            • Assignee: Unassigned
            • Reporter: Chris M. Hostetter (hossman)
            • Votes: 0
            • Watchers: 1
