Details
Description
We found one case of a very slow, possibly endless loop in this method.
java.lang.Thread.State: RUNNABLE at org.apache.lucene.search.TopScoreDocCollector$InOrderPagingScoreDocCollector.collect(TopScoreDocCollector.java:85) at org.apache.lucene.search.Scorer.score(Scorer.java:65) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448) at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:243) at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$1.loadDocs(LucenePropertyIndex.java:434) at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$1.computeNext(LucenePropertyIndex.java:353)
Background: Many threads are waiting to lock here:
- waiting to lock <0x00007fbd804a9448> (a org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker) at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.acquireIndexNode(IndexTracker.java:202) public IndexNode acquireIndexNode(String path) { IndexNodeManager index = indices.get(path); IndexNode indexNode = index != null ? index.acquire() : null; if (indexNode != null) { return indexNode; } else { return findIndexNode(path); <<= synchronized method } }
And the thread that is holding that (synchronized) lock on IndexTracker is also waiting to get a write lock here:
"oak-lucene-992" #21559 daemon prio=1 os_prio=0 tid=0x00007ed0280ad000 nid=0x1be8 waiting on condition [0x00007fb96c352000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00007fbedad77210> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) at org.apache.jackrabbit.oak.plugins.index.lucene.IndexNodeManager.close(IndexNodeManager.java:174) at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.close(IndexTracker.java:104) - locked <0x00007fbd804a9448> (a org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker) at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.update(IndexTracker.java:114) - locked <0x00007fbd804a9448> (a org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker) void close() throws IOException { lock.writeLock().lock(); <<== waiting here try { checkState(!closed); closed = true; } finally { lock.writeLock().unlock(); }
It looks like a read lock is not released. The places where a read lock is acquired:
IndexNode acquire() { lock.readLock().lock(); (released if the method returns null or throws an exception) private void release() { lock.readLock().unlock(); }
acquire() is called in acquireIndexNode:
public IndexNode acquireIndexNode(String path) { IndexNodeManager index = indices.get(path); IndexNode indexNode = index != null ? index.acquire() : null; <<== here if (indexNode != null) { return indexNode; } else { return findIndexNode(path); } } private synchronized IndexNode findIndexNode(String path) { // Retry the lookup from acquireIndexNode now that we're // synchronized. The acquire() call is guaranteed to succeed // since the close() method is also synchronized. IndexNodeManager index = indices.get(path); if (index != null) { IndexNode indexNode = index.acquire(); <<== here return checkNotNull(indexNode); }
This method is called in multiple places (getPlanDescription, getPlans, loadDocs, getSize, getOldFullTextIndexPath, dumpIndexContent, getFieldInfo, getFieldTermPrefixInfo, getIndexedPaths, getIndexStats) but always released in "finally".
There is one thread that seems to hold this read lock:
"10.13.0.73 [1597750655085] GET /mnt/overlay/granite/ui/content/shell/omnisearch/searchresults.html HTTP/1.1" #10953 prio=5 os_prio=0 tid=0x00007fb97000f000 nid=0x8b59 runnable [0x00007fb96c44c000] java.lang.Thread.State: RUNNABLE at org.apache.lucene.search.TopScoreDocCollector$InOrderPagingScoreDocCollector.collect(TopScoreDocCollector.java:85) at org.apache.lucene.search.Scorer.score(Scorer.java:65) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448) at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:243) at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$1.loadDocs(LucenePropertyIndex.java:434) at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$1.computeNext(LucenePropertyIndex.java:353) at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$1.computeNext(LucenePropertyIndex.java:339) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor$1.hasNext(LucenePropertyIndex.java:1718) at com.google.common.collect.Iterators$7.computeNext(Iterators.java:645) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.jackrabbit.oak.plugins.index.Cursors$PathCursor.hasNext(Cursors.java:216) at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor.hasNext(LucenePropertyIndex.java:1751) at org.apache.jackrabbit.oak.query.ast.SelectorImpl.next(SelectorImpl.java:432) at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.fetchNext(QueryImpl.java:824) at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.hasNext(QueryImpl.java:851) at com.google.common.collect.Iterators$5.hasNext(Iterators.java:542) at org.apache.jackrabbit.oak.query.FilterIterators$DistinctIterator.fetchNext(FilterIterators.java:141) at org.apache.jackrabbit.oak.query.FilterIterators$DistinctIterator.hasNext(FilterIterators.java:155) at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl$1.fetch(QueryResultImpl.java:98) at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl$1.next(QueryResultImpl.java:123) at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl$1.next(QueryResultImpl.java:78) at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate$SynchronizedIterator.next(SessionDelegate.java:702) at org.apache.jackrabbit.oak.jcr.query.PrefetchIterator.size(PrefetchIterator.java:135) at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl$3.getSize(QueryResultImpl.java:143) at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl.getRows(QueryResultImpl.java:140)