  1. Solr
  2. SOLR-13393

ZkClientClusterStateProvider can leak ZkStateReader (and associated watcher threads) if background threads attempt to use it after close().

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.1, 9.0
    • Component/s: None
    • Labels: None

    Description

      While digging into some test failures related to leaked ZkStateReader objects, I noticed a pattern which I believe can be explained by the fact that ZkClientClusterStateProvider does not complain/fail if some caller tries to connect()/use it after it has already been closed. In this situation it will just create a new ZkStateReader (which is later leaked).
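The pattern can be modelled in a few lines. This is only a minimal sketch with hypothetical stand-in classes (FakeStateReader, LeakyProvider, LeakDemo are invented for illustration and are not the real Solr classes); it assumes the provider lazily creates its reader in connect() and has no notion of being closed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for ZkStateReader: counts instances that are still open.
class FakeStateReader implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();

    FakeStateReader() {
        OPEN.incrementAndGet();
    }

    @Override
    public void close() {
        OPEN.decrementAndGet();
    }
}

// Hypothetical stand-in for the provider; note there is no "closed" check.
class LeakyProvider implements AutoCloseable {
    private FakeStateReader reader;

    synchronized void connect() {
        if (reader == null) {
            // Runs on first use, but also on any use after close().
            reader = new FakeStateReader();
        }
    }

    @Override
    public synchronized void close() {
        if (reader != null) {
            reader.close();
            reader = null;
        }
    }
}

class LeakDemo {
    // Returns how many readers remain open after close() followed by a late connect().
    static int run() {
        FakeStateReader.OPEN.set(0);
        LeakyProvider provider = new LeakyProvider();
        provider.connect();
        provider.close();
        provider.connect(); // late caller, e.g. a background task
        return FakeStateReader.OPEN.get();
    }

    public static void main(String[] args) {
        System.out.println("open readers after shutdown: " + run());
    }
}
```

In this model the second connect() silently creates a reader that nobody will ever close, which is the leak described above.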

      So in situations where background/timer threads use a SolrClientCloudManager/ZkClientClusterStateProvider, we might see...

      T1: start shutdown...
      T1:  ...SolrClientCloudManager.close()...
      T1:   ...ZkClientClusterStateProvider.close()...
      T1:    ...ZkStateReader.close()
      T1:    ...zkStateReader = null;
      T2: run background thread/task/trigger...
      T2:  ...get ZkClientClusterStateProvider
      T2:  ...call ZkClientClusterStateProvider.connect()
      T2:   ...zkStateReader = new ZkStateReader()                 /* LEAKED */
      T2:  ...do something with ZkClientClusterStateProvider
      T2:  ...finish background thread/task/trigger
      T1:  ...finish shutdown of ZkClientClusterStateProvider / SolrClientCloudManager
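
One way to make the T2 steps above fail loudly instead of leaking is to remember that close() was called and reject any later connect(). The following is a minimal sketch under that assumption, not necessarily what the attached patches do; TrackedReader, GuardedProvider and GuardDemo are hypothetical stand-ins, and IllegalStateException is used only as a placeholder exception type.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for ZkStateReader: counts instances that are still open.
class TrackedReader implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();

    TrackedReader() {
        OPEN.incrementAndGet();
    }

    @Override
    public void close() {
        OPEN.decrementAndGet();
    }
}

// Hypothetical provider that refuses to reconnect once it has been closed.
class GuardedProvider implements AutoCloseable {
    private TrackedReader reader;
    private boolean closed;

    synchronized void connect() {
        if (closed) {
            // Fail instead of quietly creating a reader that nobody will close.
            throw new IllegalStateException("provider already closed");
        }
        if (reader == null) {
            reader = new TrackedReader();
        }
    }

    @Override
    public synchronized void close() {
        closed = true;
        if (reader != null) {
            reader.close();
            reader = null;
        }
    }
}

class GuardDemo {
    // Replays the T1/T2 ordering: close() on this thread, then a late connect() on another.
    static boolean lateConnectRejected() {
        TrackedReader.OPEN.set(0);
        GuardedProvider provider = new GuardedProvider();
        provider.connect(); // T1: normal use
        provider.close();   // T1: shutdown

        boolean[] rejected = {false};
        Thread t2 = new Thread(() -> {
            try {
                provider.connect(); // T2: background task runs late
            } catch (IllegalStateException e) {
                rejected[0] = true;
            }
        });
        t2.start();
        try {
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return rejected[0] && TrackedReader.OPEN.get() == 0;
    }

    public static void main(String[] args) {
        System.out.println("late connect rejected, nothing leaked: " + lateConnectRejected());
    }
}
```

Both methods synchronize on the provider, so a connect() that races with close() either completes first (and its reader is closed normally) or sees the closed flag and throws; either way no reader outlives the provider in this model.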
      

      Attachments

        1. SOLR-13393.patch
          11 kB
          Chris M. Hostetter
        2. SOLR-13393.patch
          10 kB
          Chris M. Hostetter
        3. SOLR-13393.patch
          1 kB
          Chris M. Hostetter

          People

            hossman Chris M. Hostetter