Solr / SOLR-4960

race condition in CoreContainer.shutdown leads to double closes on cores

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.3
    • Fix Version/s: 4.4, Trunk
    • Component/s: None
    • Labels:
      None

      Description

      CoreContainer.shutdown has a race condition that can lead to a closed (or closing) core being handed out to an incoming request. This can further lead to SolrCore.close() logic being executed again when the request is finished.

      This bug was introduced in SOLR-4196 (r1451797).
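
      To make the double-close concrete, here is a minimal sketch that replays the bad interleaving sequentially. It assumes a simple reference-counting scheme in which the container holds one reference and the close logic runs whenever the count drops to zero. RefCountedCore and DoubleCloseDemo are hypothetical names for illustration only, not Solr's actual classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a reference-counted core; not Solr's SolrCore.
class RefCountedCore {
    // The container is assumed to hold the initial reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    // Counts how many times the real close logic has run.
    private int closeRuns = 0;

    void open() {
        refCount.incrementAndGet();
    }

    void close() {
        // Close logic runs each time the count reaches zero.
        if (refCount.decrementAndGet() == 0) {
            closeRuns++; // resources would be released here
        }
    }

    int closeRuns() {
        return closeRuns;
    }
}

class DoubleCloseDemo {
    // Replays, in a single thread, the order of events the race allows.
    static int replayRace() {
        RefCountedCore core = new RefCountedCore();
        core.close(); // shutdown drops the container's reference: 1 -> 0, close logic runs
        core.open();  // a late request is handed the closed core: 0 -> 1
        core.close(); // the request finishes: 1 -> 0, close logic runs a second time
        return core.closeRuns();
    }
}
```

      In this sketch the second run of the close logic happens because nothing stops a lookup from returning a core whose count has already reached zero.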

      1. SOLR-4960.patch
        2 kB
        Yonik Seeley
      2. SOLR-4960_getCore.patch
        3 kB
        Yonik Seeley

          Activity

          Veera added a comment -

          Hi there

          I am running SolrCloud 4.3.1 and seeing a few of my cores die with this exception (during reads, writes, recovery, etc.):

          org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: SolrCoreState already closed
          solr.log.117-3616225- at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:84)
          solr.log.117-3616318- at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:524)

          Could this bug be the root cause of this issue?

          Steve Rowe added a comment -

          Bulk close resolved 4.4 issues

          Erick Erickson added a comment -

          Several revisions, all told:

          trunk: 1496546, 1496620 and 1497999
          4x: 1498010

          Erick Erickson added a comment -

          OK, I'll run a few more tests, then merge these two patches back into the 4x code line and then apply SOLR-4794 to the whole lot.

          Erick Erickson added a comment -

          Yonik Seeley, what's the state of all this? These patches look like they're on trunk but not 4x. When I looked at them, I realized that the transient and pending lists could be handled the same way (actually in a single list), which simplifies things.

          I'll open up a new JIRA for my additions, but we need to merge these changes into 4x before I deal with the next patch....

          Yonik Seeley added a comment -

          Here's a patch to fix the race in CoreContainer.getCore()

          I changed the signature of SolrCores.getCoreFromAnyList to accept a boolean to increment the reference count and only fixed getCore to use it. There may still be other races due to the transient core feature.
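
          The idea can be sketched as follows: the lookup and the reference-count increment happen under one lock, and shutdown empties the registry under that same lock before closing anything. This is a simplified illustration under stated assumptions, not the attached patch. SketchCore, SketchCores, and the shape of getCore(name, incRefCount) are hypothetical, and transient cores are ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reference-counted core; not Solr's SolrCore.
class SketchCore {
    private int refCount = 1; // the registry is assumed to hold the initial reference
    private boolean closed = false;

    synchronized void incRef() {
        if (closed) {
            throw new IllegalStateException("core already closed");
        }
        refCount++;
    }

    synchronized void close() {
        if (--refCount == 0) {
            closed = true; // resources would be released here
        }
    }

    synchronized boolean isClosed() {
        return closed;
    }
}

// Hypothetical registry; the boolean mirrors the idea of passing an
// "increment the reference count" flag to the lookup.
class SketchCores {
    private final Object lock = new Object();
    private final Map<String, SketchCore> cores = new HashMap<>();

    void register(String name, SketchCore core) {
        synchronized (lock) {
            cores.put(name, core);
        }
    }

    SketchCore getCore(String name, boolean incRefCount) {
        synchronized (lock) {
            SketchCore core = cores.get(name);
            if (core != null && incRefCount) {
                core.incRef(); // same lock as the lookup, so shutdown cannot interleave
            }
            return core;
        }
    }

    void shutdown() {
        List<SketchCore> toClose;
        synchronized (lock) {
            // Remove cores from the registry first so no later lookup can find them.
            toClose = new ArrayList<>(cores.values());
            cores.clear();
        }
        for (SketchCore core : toClose) {
            core.close(); // drops only the registry's reference
        }
    }
}
```

          With this arrangement a request either obtains the core with its own reference before shutdown removes it, or finds nothing at all. A core that is still in use is closed only when the last request releases it.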

          Yonik Seeley added a comment -

          There's also a race condition in getCore itself: there's no synchronization between the core lookup and incrementing the ref count, so we could hand out a closing/closed core. It looks like it was introduced by SOLR-4196 as well.
          I'll just use this issue to fix this bug also.

          Yonik Seeley added a comment -

          Here's a patch that fixes things for normal cores - I didn't touch the transient handling.


            People

            • Assignee:
              Yonik Seeley
              Reporter:
              Yonik Seeley
            • Votes:
              0
              Watchers:
              7
