Solr: SOLR-13352

possible deadlock/threadleak from OverseerTriggerThread/AutoScalingWatcher during close()

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.7.2, 8.1, master (9.0)
    • Component/s: None
    • Labels:
      None

      Description

      A recent Jenkins failure in TestSimTriggerIntegration led me to what appears to be a "lock leak" in how OverseerTriggerThread handles its "updateLock" object when the OverseerTriggerThread is closed.

      It's possible that this only affects tests using the SimCloudManager when calling "simRestartOverseer", but I believe it can also lead to an actual deadlock / thread leak in a thread running AutoScalingWatcher (which holds a reference to OverseerTriggerThread, and therefore to every object reachable from it) when the OverseerTriggerThread is closed as part of a real Solr shutdown. I think this would cause the JVM to stall until it is externally killed.
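      To illustrate why a leaked lock can strand another thread, here is a minimal, hypothetical sketch using java.util.concurrent.locks.ReentrantLock. The class, fields, and methods (LockLeakDemo, leakyRun, otherThreadCanAcquire) are invented for illustration and are not the OverseerTriggerThread or AutoScalingWatcher code; the sketch assumes "updateLock" behaves like a ReentrantLock. A "close" path acquires the lock and returns early without unlocking, and a second thread, standing in for a watcher, then tries to acquire it. The demo uses tryLock with a timeout so it terminates; a plain lock() call would block forever, and because the JVM does not exit while a non-daemon thread is still alive, a non-daemon thread stuck that way would keep the process running.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class LockLeakDemo {
    // Hypothetical stand-in for a shared "updateLock" (not actual Solr code)
    final ReentrantLock updateLock = new ReentrantLock();
    volatile boolean closed = false;

    // Buggy pattern: the early return on close skips unlock(), so the lock leaks
    void leakyRun() {
        updateLock.lock();
        if (closed) {
            return; // BUG: this thread still holds updateLock
        }
        // ... work guarded by updateLock ...
        updateLock.unlock();
    }

    // Starts a separate "watcher" thread that waits up to `millis` for the lock.
    // Returns true only if that thread managed to acquire it.
    boolean otherThreadCanAcquire(long millis) {
        AtomicBoolean acquired = new AtomicBoolean(false);
        Thread watcher = new Thread(() -> {
            try {
                if (updateLock.tryLock(millis, TimeUnit.MILLISECONDS)) {
                    acquired.set(true);
                    updateLock.unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        watcher.start();
        try {
            watcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquired.get();
    }

    public static void main(String[] args) {
        LockLeakDemo demo = new LockLeakDemo();
        demo.closed = true;
        demo.leakyRun(); // the main thread now holds updateLock and never releases it
        // prints "watcher acquired lock: false"
        System.out.println("watcher acquired lock: " + demo.otherThreadCanAcquire(200));
    }
}
```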


      If my analysis of the test failure (to follow in a comment) is correct, then even if this bug isn't likely to affect real-world Solr instances (and only surfaces because of how OverseerTriggerThread is used in SimCloudManager), the fix to OverseerTriggerThread is a trivial change to follow locking best practices (patch to follow).
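      The usual Java locking best practice is to acquire the lock immediately before a try block and release it in the matching finally, so every exit path (normal completion, early return, or exception) unlocks. The sketch below shows that idiom, assuming "updateLock" is a ReentrantLock; the class and method names are hypothetical, and this is not the contents of SOLR-13352.patch.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockBestPractice {
    // Hypothetical stand-in for a shared "updateLock" (not the actual Solr patch)
    final ReentrantLock updateLock = new ReentrantLock();
    volatile boolean closed = false;

    // Acquire just before try, release in finally: every exit path unlocks.
    void safeRun(Runnable work) {
        updateLock.lock();
        try {
            if (closed) {
                return; // finally still runs, so the lock is released
            }
            work.run();
        } finally {
            updateLock.unlock();
        }
    }

    public static void main(String[] args) {
        LockBestPractice p = new LockBestPractice();

        p.closed = true;
        p.safeRun(() -> { });
        // prints "held after early return: false"
        System.out.println("held after early return: " + p.updateLock.isLocked());

        p.closed = false;
        try {
            p.safeRun(() -> { throw new IllegalStateException("simulated failure"); });
        } catch (IllegalStateException expected) {
            // the exception reaches the caller, but finally already released the lock
        }
        // prints "held after exception: false"
        System.out.println("held after exception: " + p.updateLock.isLocked());
    }
}
```

      Calling lock() before the try (rather than inside it) matters: if acquisition itself failed inside the try, the finally block would call unlock() on a lock the thread does not hold, and ReentrantLock.unlock() throws IllegalMonitorStateException in that case.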

        Attachments

        1. sarowe_Lucene-Solr-tests-master_20462.log.txt (2.78 MB, Chris M. Hostetter)
        2. SOLR-13352.patch (3 kB, Chris M. Hostetter)

            People

            • Assignee: Chris M. Hostetter (hossman)
            • Reporter: Chris M. Hostetter (hossman)
            • Votes: 0
            • Watchers: 3