Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
8.1.1, 8.5.1
None
Description
A bug in the SegmentsInfoRequestHandler (aka /admin/segments, which is used under the covers when viewing the "Segments Info" panel of a core in the Admin UI) causes it to increment the internal "ref-count" of the IndexWriter by default, without ever decrementing that ref-count.
This can cause delayed problems in any situation where the IndexWriter needs to be updated, replaced, or locked:
- Core RELOAD operations
- Master/Slave replication (via IndexFetcher)
- PULL Replica updates (via IndexFetcher)
- TLOG Replica updates (via IndexFetcher)
- NRT Recovery from Leader (via IndexFetcher)
These manifest as operations that "stall", because the threads attempting to execute them block forever waiting for a ReentrantReadWriteLock in DefaultSolrCoreState that will never be released.
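To make the lock interaction concrete, here is a minimal, runnable Java model of the mechanism described above. It is a sketch under stated assumptions, not Solr code: the names CoreStateModel, acquireWriterRef, releaseWriterRef, tryReplaceWriter and replaceFromOtherThread are hypothetical, and the model assumes that each outstanding IndexWriter reference holds the read side of a ReentrantReadWriteLock, while writer-replacing operations (reload, index fetch) need the write side.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RefCountLeakDemo {

    // Hypothetical stand-in for the core state (NOT Solr's DefaultSolrCoreState):
    // handing out the writer takes the shared (read) lock, replacing the writer
    // needs the exclusive (write) lock.
    static final class CoreStateModel {
        private final ReentrantReadWriteLock iwLock = new ReentrantReadWriteLock();

        /** Takes the read lock; the same thread must later call releaseWriterRef(). */
        void acquireWriterRef() {
            iwLock.readLock().lock();
        }

        /** Models decref(): gives back the read lock taken by acquireWriterRef(). */
        void releaseWriterRef() {
            iwLock.readLock().unlock();
        }

        /** Models a reload or index fetch: waits up to timeoutMs for exclusive access. */
        boolean tryReplaceWriter(long timeoutMs) throws InterruptedException {
            if (iwLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                try {
                    return true; // the writer would be swapped here
                } finally {
                    iwLock.writeLock().unlock();
                }
            }
            return false; // timed out: some reference still holds the read lock
        }
    }

    /** Runs tryReplaceWriter on another thread, like the indexFetcher thread in the trace. */
    static boolean replaceFromOtherThread(CoreStateModel state, long timeoutMs)
            throws InterruptedException {
        AtomicBoolean result = new AtomicBoolean();
        Thread fetcher = new Thread(() -> {
            try {
                result.set(state.tryReplaceWriter(timeoutMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "indexFetcher-model");
        fetcher.start();
        fetcher.join();
        return result.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CoreStateModel state = new CoreStateModel();

        // Buggy handler path: the reference is acquired but never released.
        state.acquireWriterRef();
        System.out.println("replace while ref leaked: " + replaceFromOtherThread(state, 200));

        // Releasing the reference lets the exclusive operation proceed.
        state.releaseWriterRef();
        System.out.println("replace after decref: " + replaceFromOtherThread(state, 200));
    }
}
```

Run as written, this prints `replace while ref leaked: false` followed by `replace after decref: true`. The timed tryLock mirrors the TIMED_WAITING state visible in the indexFetcher stack trace.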
A config-only workaround exists for this problem: explicitly declare the /admin/segments handler in solrconfig.xml with an invariants param that requests additional info (coreInfo=true). This forces the handler down the code path where it uses the IndexWriter and then decrements the ref-count, releasing the lock.
<requestHandler name="/admin/segments" class="solr.SegmentsInfoRequestHandler">
  <!-- work around for https://issues.apache.org/jira/browse/SOLR-14431 -->
  <lst name="invariants">
    <bool name="coreInfo">true</bool>
  </lst>
</requestHandler>
Example stack traces of what this can look like:
"thread",{ "id":65, "name":"indexFetcher-19-thread-1", "state":"TIMED_WAITING", "lock":"java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@22a18ed", "cpuTime":"1454860.0285ms", "userTime":"622230.0000ms", "stackTrace":["java.base@11.0.7/jdk.internal.misc.Unsafe.park(Native Method)", "java.base@11.0.7/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)", "java.base@11.0.7/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:980)", "java.base@11.0.7/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1288)", "java.base@11.0.7/java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1131)", "org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)", "org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:240)", "org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:569)", "org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:351)", "org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:424)", "org.apache.solr.handler.ReplicationHandler.lambda$setupPolling$13(ReplicationHandler.java:1210)", "org.apache.solr.handler.ReplicationHandler$$Lambda$513/0x00000008006bf440.run(Unknown Source)", "java.base@11.0.7/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)", "java.base@11.0.7/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)", "java.base@11.0.7/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)", "java.base@11.0.7/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)", "java.base@11.0.7/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)", "java.base@11.0.7/java.lang.Thread.run(Thread.java:834)"]},
"thread",{ "id":16, "name":"qtp1558079303-16", "state":"WAITING", "lock":"java.lang.Object@70c81fe1", "cpuTime":"73.4453ms", "userTime":"60.0000ms", "stackTrace":["java.base@11.0.4/java.lang.Object.wait(Native Method)", "java.base@11.0.4/java.lang.Object.wait(Object.java:328)", "org.apache.solr.core.SolrCores.waitAddPendingCoreOps(SolrCores.java:394)", "org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1545)", "org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:132)", "org.apache.solr.handler.admin.CoreAdminOperation$$Lambda$266/0x0000000100431040.execute(Unknown Source)", "org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:362)", "org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:397)", "org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)", "org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)", "org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:839)", "org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:805)", "org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:558)", "org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)", "org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)", "org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)", ...
If withCoreInfo is false, iwRef.decref() is never called to release the read lock, which prevents any further write lock from being acquired.
https://github.com/apache/lucene-solr/blob/3a743ea953f0ecfc35fc7b198f68d142ce99d789/solr/core/src/java/org/apache/solr/handler/admin/SegmentsInfoRequestHandler.java#L144
Line 130 should be moved inside the if statement at L144.
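The fix described above (acquire the IndexWriter reference only inside the withCoreInfo branch, and always release it) can be sketched in Java as follows. Assumptions: Ref, getIndexWriter, buggyHandle and fixedHandle are hypothetical stand-ins that only loosely mirror the shape of SegmentsInfoRequestHandler, and a shared counter replaces the real IndexWriter ref-count so that a leak is observable.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SegmentsInfoFixSketch {

    // Hypothetical minimal ref-counted holder; NOT Solr's RefCounted API.
    static final class Ref<T> {
        private final T resource;
        private final AtomicInteger outstanding;

        Ref(T resource, AtomicInteger outstanding) {
            this.resource = resource;
            this.outstanding = outstanding;
            outstanding.incrementAndGet(); // acquiring a reference bumps the count
        }

        T get() { return resource; }

        void decref() { outstanding.decrementAndGet(); }
    }

    // Number of references handed out and not yet released.
    static final AtomicInteger OUTSTANDING = new AtomicInteger();

    static Ref<String> getIndexWriter() {
        return new Ref<>("index-writer", OUTSTANDING);
    }

    /** Buggy shape: the ref is taken on every request but released only when withCoreInfo is true. */
    static String buggyHandle(boolean withCoreInfo) {
        Ref<String> iwRef = getIndexWriter(); // acquired unconditionally
        StringBuilder out = new StringBuilder("segments");
        if (withCoreInfo) {
            try {
                out.append(" + coreInfo from ").append(iwRef.get());
            } finally {
                iwRef.decref();
            }
        }
        return out.toString(); // withCoreInfo == false: the reference leaks
    }

    /** Fixed shape: acquire only where the writer is used, always release in finally. */
    static String fixedHandle(boolean withCoreInfo) {
        StringBuilder out = new StringBuilder("segments");
        if (withCoreInfo) {
            Ref<String> iwRef = getIndexWriter();
            try {
                out.append(" + coreInfo from ").append(iwRef.get());
            } finally {
                iwRef.decref();
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        buggyHandle(false);
        System.out.println("buggy, withCoreInfo=false -> outstanding refs: " + OUTSTANDING.get());

        OUTSTANDING.set(0);
        fixedHandle(false);
        fixedHandle(true);
        System.out.println("fixed -> outstanding refs: " + OUTSTANDING.get());
    }
}
```

With the buggy shape, a single request with withCoreInfo=false leaves one reference outstanding (the program prints 1); the fixed shape leaves none (it prints 0). Releasing in a finally block also covers the case where building the core info throws.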
ab FYI
Issue Links
- blocks: SOLR-14450 SegmentsInfoRequestHandler doesn't properly close ref-counted IW (Resolved)
- is duplicated by: SOLR-14458 Solr Replica locked in recovering state after a Zookeeper disconnection (Resolved)