SOLR-14431

Using "Segments Info" UI screen can cause future stalls in replication/recovery/core-reload (/admin/segments)

    Details

      Description

      A bug in the SegmentsInfoRequestHandler (aka /admin/segments, which is used under the covers when viewing the "Segments Info" panel of a core in the Admin UI) causes it to increment the internal "ref-count" of the IndexWriter by default, without ever decrementing that ref-count.

      This can cause delayed problems in any situation where the IndexWriter needs to be updated, replaced, or locked:

      • Core RELOAD operations
      • Master/Slave replication (via IndexFetcher)
      • PULL Replica updates (via IndexFetcher)
      • TLOG Replica updates (via IndexFetcher)
      • NRT Recovery from Leader (via IndexFetcher)

      These manifest as operations that "stall": the threads attempting to execute them block forever, waiting for a ReentrantReadWriteLock in DefaultSolrCoreState that will never be released.
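
      The locking behaviour described above can be reproduced outside Solr. The following is a minimal sketch, not Solr code: the class and method names are hypothetical, and it assumes only what the description states, namely that taking the IndexWriter reference holds the read side of a ReentrantReadWriteLock and that closing the writer needs the write side.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; it only mimics the lock usage described in this issue.
class LeakedReadLockDemo {

    /**
     * One thread takes the read lock and optionally never releases it.
     * Returns whether the calling thread can then take the write lock
     * within the given timeout.
     */
    static boolean writerCanProceed(boolean releaseReadLock, long timeoutMillis) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // Stands in for a request that takes the IndexWriter reference.
        Thread reader = new Thread(() -> {
            lock.readLock().lock();
            if (releaseReadLock) {
                lock.readLock().unlock();
            }
            // Otherwise the hold is leaked: it outlives the thread that took it.
        });

        try {
            reader.start();
            reader.join();

            // Stands in for a thread that needs exclusive access to the writer.
            boolean acquired = lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("balanced reader, writer acquired: " + writerCanProceed(true, 200));
        System.out.println("leaked reader, writer acquired: " + writerCanProceed(false, 200));
    }
}
```

      With a balanced reader the writer acquires the lock; with a leaked read hold the timed attempt fails every time, because the read hold is not released when the reader thread exits.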

      A config-only workaround exists for this problem: explicitly declare the /admin/segments handler in solrconfig.xml with an invariants param that requests additional info. This forces the handler down a code path where it uses the IndexWriter and then decrements the ref-count, releasing the lock.

      solrconfig.xml workaround
        <requestHandler name="/admin/segments" class="solr.SegmentsInfoRequestHandler">
          <!-- work around for https://issues.apache.org/jira/browse/SOLR-14431 -->
          <lst name="invariants">
            <bool name="coreInfo">true</bool>
          </lst>
        </requestHandler>
      

      Example stack traces of what this can look like

      IndexFetcher example stalled thread
            "thread",{
              "id":65,
              "name":"indexFetcher-19-thread-1",
              "state":"TIMED_WAITING",
              "lock":"java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@22a18ed",
              "cpuTime":"1454860.0285ms",
              "userTime":"622230.0000ms",
              "stackTrace":["java.base@11.0.7/jdk.internal.misc.Unsafe.park(Native Method)",
                "java.base@11.0.7/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)",
                "java.base@11.0.7/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:980)",
                "java.base@11.0.7/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1288)",
                "java.base@11.0.7/java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1131)",
                "org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)",
                "org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:240)",
                "org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:569)",
                "org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:351)",
                "org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:424)",
                "org.apache.solr.handler.ReplicationHandler.lambda$setupPolling$13(ReplicationHandler.java:1210)",
                "org.apache.solr.handler.ReplicationHandler$$Lambda$513/0x00000008006bf440.run(Unknown Source)",
                "java.base@11.0.7/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)",
                "java.base@11.0.7/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)",
                "java.base@11.0.7/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)",
                "java.base@11.0.7/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)",
                "java.base@11.0.7/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)",
                "java.base@11.0.7/java.lang.Thread.run(Thread.java:834)"]},
      
      Core RELOAD example stalled thread
            "thread",{
              "id":16,
              "name":"qtp1558079303-16",
              "state":"WAITING",
              "lock":"java.lang.Object@70c81fe1",
              "cpuTime":"73.4453ms",
              "userTime":"60.0000ms",
              "stackTrace":["java.base@11.0.4/java.lang.Object.wait(Native Method)",
                "java.base@11.0.4/java.lang.Object.wait(Object.java:328)",
                "org.apache.solr.core.SolrCores.waitAddPendingCoreOps(SolrCores.java:394)",
                "org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1545)",
                "org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:132)",
                "org.apache.solr.handler.admin.CoreAdminOperation$$Lambda$266/0x0000000100431040.execute(Unknown Source)",
                "org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:362)",
                "org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:397)",
                "org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)",
                "org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)",
                "org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:839)",
                "org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:805)",
                "org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:558)",
                "org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)",
                "org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)",
                "org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)",
      ...
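
      As a rough aid for spotting the two patterns above, the frames of interest can be matched programmatically. This is only a sketch under assumptions: the class and helper names are hypothetical, the two frame signatures are copied from the example traces, and a thread passing through these frames is not necessarily stalled, so a match is a hint rather than a diagnosis. The main method inspects only the JVM it runs in, so it would have to run inside the Solr process or be adapted to parse thread dump output.

```java
import java.util.Map;

// Hypothetical helper; frame names are taken from the example traces above.
class StalledThreadCheck {

    // {class name, method name} pairs seen in the two example stack traces.
    private static final String[][] SUSPECT_FRAMES = {
        {"org.apache.solr.update.DefaultSolrCoreState", "lock"},
        {"org.apache.solr.core.SolrCores", "waitAddPendingCoreOps"},
    };

    /** Returns true if any frame matches one of the suspect class/method pairs. */
    static boolean hasSuspectFrame(StackTraceElement[] frames) {
        for (StackTraceElement frame : frames) {
            for (String[] suspect : SUSPECT_FRAMES) {
                if (suspect[0].equals(frame.getClassName())
                        && suspect[1].equals(frame.getMethodName())) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Scans the threads of the current JVM only.
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            if (hasSuspectFrame(entry.getValue())) {
                System.out.println("possibly stalled: " + entry.getKey().getName());
            }
        }
    }
}
```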
      
      "Original Jira Description"

      If withCoreInfo is false, iwRef.decref() will not be called to release the reader lock, preventing any further writer locks.
      https://github.com/apache/lucene-solr/blob/3a743ea953f0ecfc35fc7b198f68d142ce99d789/solr/core/src/java/org/apache/solr/handler/admin/SegmentsInfoRequestHandler.java#L144

      Line 130 should be moved inside the if statement at line 144.
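
      The shape of the problem and of the suggested change can be sketched as follows. This is an illustration only, not the handler's actual source: CountedRef is a hypothetical stand-in for the ref-counted IndexWriter holder, and the two methods paraphrase the description (reference always taken, released only when withCoreInfo is true) rather than reproduce the linked file.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these types are Solr classes.
class RefCountFixSketch {

    /** Minimal stand-in for a ref-counted IndexWriter holder. */
    static final class CountedRef {
        private final AtomicInteger refs = new AtomicInteger();

        CountedRef incref() {
            refs.incrementAndGet();
            return this;
        }

        void decref() {
            refs.decrementAndGet();
        }

        int count() {
            return refs.get();
        }
    }

    /** Leaky shape: the reference is always taken but released only on one branch. */
    static void handleLeaky(CountedRef writer, boolean withCoreInfo) {
        CountedRef ref = writer.incref();
        if (withCoreInfo) {
            try {
                // ... read extra info from the writer here ...
            } finally {
                ref.decref();
            }
        }
        // withCoreInfo == false: ref is never released.
    }

    /** Suggested shape: take the reference only inside the branch that uses it. */
    static void handleBalanced(CountedRef writer, boolean withCoreInfo) {
        if (withCoreInfo) {
            CountedRef ref = writer.incref();
            try {
                // ... read extra info from the writer here ...
            } finally {
                ref.decref();
            }
        }
    }
}
```

      Pairing the acquire with a try/finally release in the same branch keeps the count balanced on every path, including when the code in between throws.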

      Andrzej Bialecki FYI

              People

              • Assignee: Andrzej Bialecki (ab)
              • Reporter: Tiziano Degaetano (Tiziano.DE)