Solr / SOLR-14969

Prevent creating multiple cores with the same name, which leads to instabilities (race condition)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 8.6, 8.6.3
    • Fix Version/s: 8.8
    • Component/s: multicore
    • Labels: None

    Description

      CoreContainer#create does not correctly handle concurrent requests to create the same core. Because of a race condition (see also the existing TODO comment in the code), CoreContainer#createFromDescriptor may be called a second time for the same core name while the first creation is in progress or has already succeeded.

      The second call then fails to create an IndexWriter, and exception handling causes an inconsistent CoreContainer state.

      2020-10-27 00:29:25.350 ERROR (qtp2029754983-24) [   ] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error CREATEing SolrCore 'blueprint_acgqqafsogyc_comments': Unable to create core [blueprint_acgqqafsogyc_comments] Caused by: Lock held by this virtual machine: /var/solr/data/blueprint_acgqqafsogyc_comments/data/index/write.lock
      
               at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1312)
               at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:95)
               at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
      ...
      Caused by: org.apache.solr.common.SolrException: Unable to create core [blueprint_acgqqafsogyc_comments]
               at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1408)
               at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1273)
               ... 47 more
      Caused by: org.apache.solr.common.SolrException: Error opening new searcher
               at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1071)
               at org.apache.solr.core.SolrCore.<init>(SolrCore.java:906)
               at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1387)
               ... 48 more
      Caused by: org.apache.solr.common.SolrException: Error opening new searcher
               at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2184)
               at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2308)
               at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1130)
               at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1012)
               ... 50 more
      Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: /var/solr/data/blueprint_acgqqafsogyc_comments/data/index/write.lock
               at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:139)
               at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41)
               at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45)
               at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:105)
               at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:785)
               at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:126)
               at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:100)
               at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:261)
               at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:135)
               at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2145) 
      

      CoreContainer#createFromDescriptor removes the CoreDescriptor when handling this exception. The SolrCore created for the first successful call is still registered in SolrCores.cores, but now there's no corresponding CoreDescriptor for that name anymore.

      This inconsistency leads to subsequent NullPointerExceptions, for example when using CoreAdmin STATUS with the core name: CoreAdminOperation#getCoreStatus first gets the non-null SolrCore (cores.getCore(cname)) but core.getInstancePath() throws an NPE, because the CoreDescriptor is not registered anymore:

      2020-10-27 00:29:25.353 INFO  (qtp2029754983-19) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={core=blueprint_acgqqafsogyc_comments&action=STATUS&indexInfo=false&wt=javabin&version=2} status=500 QTime=0
      2020-10-27 00:29:25.353 ERROR (qtp2029754983-19) [   ] o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Error handling 'STATUS' action
               at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:372)
               at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:397)
               at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
      ...
      Caused by: java.lang.NullPointerException
               at org.apache.solr.core.SolrCore.getInstancePath(SolrCore.java:333)
               at org.apache.solr.handler.admin.CoreAdminOperation.getCoreStatus(CoreAdminOperation.java:329)
               at org.apache.solr.handler.admin.StatusOp.execute(StatusOp.java:54)
               at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
      

      STATUS keeps failing until Solr is restarted.
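
      The failure mode described above can be reproduced in miniature with two plain maps. This is only an illustrative sketch: the class InconsistentRegistrySketch, its fields, and its methods are hypothetical stand-ins and are not Solr's CoreContainer, SolrCores, or CoreAdminOperation. It shows how removing the descriptor during the failed second create leaves a registered core without a descriptor, so that a status lookup which assumes both exist throws a NullPointerException.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of the two registries; not Solr code. */
final class InconsistentRegistrySketch {
  // Assumption: one map holds loaded cores, another holds descriptors (name -> instance path).
  final Map<String, Object> cores = new ConcurrentHashMap<>();
  final Map<String, String> descriptors = new ConcurrentHashMap<>();

  /** First create call: registers descriptor and core. */
  void firstCreateSucceeds(String name, String instancePath) {
    descriptors.put(name, instancePath);
    cores.put(name, new Object());
  }

  /** Second create call fails; its cleanup removes the descriptor the first call still needs. */
  void secondCreateFails(String name) {
    descriptors.remove(name);
  }

  /** Assumes a registered core always has a descriptor; throws NPE when it does not. */
  String statusUnsafe(String name) {
    if (!cores.containsKey(name)) {
      return "not loaded";
    }
    String path = descriptors.get(name); // null once the descriptor has been removed
    return "instanceDir=" + path.trim();
  }

  /** Defensive variant that reports the inconsistency instead of failing. */
  String statusSafe(String name) {
    if (!cores.containsKey(name)) {
      return "not loaded";
    }
    String path = descriptors.get(name);
    if (path == null) {
      return "inconsistent: core registered without descriptor";
    }
    return "instanceDir=" + path.trim();
  }
}
```

      A defensive status check like statusSafe only hides the symptom. The underlying problem is that the second create call cleans up state it does not own.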

      The NPE for CoreAdmin STATUS is a regression in 8.6. It seems to be caused by https://github.com/apache/lucene-solr/commit/17ae79b0905b2bf8635c1b260b30807cae2f5463#diff-9652fe8353b7eff59cd6f128bb2699d88361e670b840ee5ca1018b1bc45584d1R324
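
      One possible way to close the race is to reserve the core name atomically before any creation work starts, and to let only the owner of the reservation clean up after a failure. The sketch below illustrates that idea under stated assumptions: CoreCreationGuard and its methods are hypothetical names, the creation step is reduced to a Runnable, and this is not presented as the change that was committed for this issue.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical guard around core creation; not Solr's CoreContainer. */
final class CoreCreationGuard {
  // Names of cores that already exist or are currently being created.
  private final Set<String> reserved = ConcurrentHashMap.newKeySet();

  /** Atomically claims the name; returns false if another request claimed it first. */
  boolean tryReserve(String coreName) {
    return reserved.add(coreName);
  }

  /** Releases a claim, for example after the creation that owns it has failed. */
  void release(String coreName) {
    reserved.remove(coreName);
  }

  /**
   * Runs createAction only if this caller won the reservation.
   * A losing caller is rejected before it can touch any shared state.
   */
  void create(String coreName, Runnable createAction) {
    if (!tryReserve(coreName)) {
      throw new IllegalStateException(
          "Core '" + coreName + "' already exists or is being created");
    }
    try {
      createAction.run();
    } catch (RuntimeException e) {
      release(coreName); // only the owner of the reservation cleans up
      throw e;
    }
  }
}
```

      With such a guard, a duplicate request fails fast with a clear error instead of reaching the index writer lock, and its failure handling cannot remove state that belongs to the first, successful request.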

      Attachments

        1. CmCoreAdminHandler.java
          2 kB
          Andreas Hubold

            People

              Assignee: Erick Erickson (erickerickson)
              Reporter: Andreas Hubold (ahubold)
              Votes: 0
              Watchers: 5

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 20m