SOLR-17118

Solr deadlock during servlet container start

Details

    Description

      In rare cases, Solr can deadlock on startup. The servlet container's startup thread blocks, and no other thread is able to unblock it:

      "main" #1 prio=5 os_prio=0 cpu=5922.39ms elapsed=7490.27s tid=0x00007f637402ae70 nid=0x47 waiting on condition [0x00007f6379488000]
         java.lang.Thread.State: WAITING (parking)
          at jdk.internal.misc.Unsafe.park(java.base@17.0.9/Native Method)
          - parking to wait for  <0x0000000081da8000> (a java.util.concurrent.CountDownLatch$Sync)
          at java.util.concurrent.locks.LockSupport.park(java.base@17.0.9/Unknown Source)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.9/Unknown Source)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@17.0.9/Unknown Source)
          at java.util.concurrent.CountDownLatch.await(java.base@17.0.9/Unknown Source)
          at org.apache.solr.servlet.CoreContainerProvider$ContextInitializationKey.waitForReadyService(CoreContainerProvider.java:523)
          at org.apache.solr.servlet.CoreContainerProvider$ServiceHolder.getService(CoreContainerProvider.java:562)
          at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:148)
          at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:133)
          at org.eclipse.jetty.servlet.ServletHandler.lambda$initialize$2(ServletHandler.java:725)
          at org.eclipse.jetty.servlet.ServletHandler$$Lambda$315/0x00007f62fc2674b8.accept(Unknown Source)
          at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(java.base@17.0.9/Unknown Source)
          at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(java.base@17.0.9/Unknown Source)
          at java.util.stream.ReferencePipeline$Head.forEach(java.base@17.0.9/Unknown Source)
          at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749)
          at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392) 
      

      ContextInitializationKey.waitForReadyService should have been unblocked by CoreContainerProvider#init, which calls ServiceHolder#setService. This should work because CoreContainerProvider#init is always called before SolrDispatchFilter#init (ServletContextListeners are initialized before Filters).

      But there's a problem: CoreContainerProvider#init stores the ContextInitializationKey and the mapped ServiceHolder in CoreContainerProvider#services, and that's a WeakHashMap:

            services 
                .computeIfAbsent(new ContextInitializationKey(servletContext), ServiceHolder::new) 
                .setService(this); 
      

      The key is not referenced anywhere else, which makes the mapping a candidate for garbage collection. The ServiceHolder value no longer references the key either, because #setService cleared that reference.
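
      The effect can be reproduced outside Solr with a minimal sketch. The class and method names below (WeakMappingDemo, Holder, register, becomesEmpty) are hypothetical stand-ins, not Solr code, and the Holder only imitates the described behavior of ServiceHolder#setService. System.gc() is merely a hint, so the first result printed by main may vary between JVMs and runs; only the control case with a strongly held key is deterministic.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakMappingDemo {

    /** Hypothetical stand-in for ServiceHolder: references its key only until setService is called. */
    static final class Holder {
        private Object key;
        private Object service;

        Holder(Object key) {
            this.key = key;
        }

        void setService(Object service) {
            this.service = service;
            this.key = null; // from here on, the value no longer keeps the key reachable
        }
    }

    /** Mirrors the reported pattern: the key is created inline and stored nowhere else. */
    static void register(Map<Object, Holder> services) {
        services.computeIfAbsent(new Object(), Holder::new).setService("service");
    }

    /** Requests GC a few times; returns true once the map is observed to be empty. */
    static boolean becomesEmpty(Map<Object, Holder> services, int attempts) {
        for (int i = 0; i < attempts && !services.isEmpty(); i++) {
            System.gc(); // only a hint; the JVM may ignore it
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return services.isEmpty();
    }

    /** Control case: while the caller holds the key strongly, the mapping cannot be removed. */
    static boolean survivesWhileKeyIsHeld() {
        Map<Object, Holder> services = new WeakHashMap<>();
        Object key = new Object();
        services.computeIfAbsent(key, Holder::new).setService("service");
        becomesEmpty(services, 3);
        return services.containsKey(key); // key is still in use here, so it stays reachable
    }

    public static void main(String[] args) {
        Map<Object, Holder> services = new WeakHashMap<>();
        register(services);
        System.out.println("unreferenced key collected: " + becomesEmpty(services, 20));
        System.out.println("held key survives: " + survivesWhileKeyIsHeld());
    }
}
```

      The control case shows why the problem is rare: the mapping only disappears if a GC cycle happens to run between the two init calls.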

      With bad luck, the mapping is already gone from the WeakHashMap before SolrDispatchFilter#init tries to retrieve it with CoreContainerProvider#serviceForContext. That method then creates a new ContextInitializationKey and ServiceHolder, which are used for #waitForReadyService. But this new ContextInitializationKey never receives a #makeReady call, so #waitForReadyService blocks forever.
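
      The sequence described above can be sketched as follows. This is a simplified model under stated assumptions, not the Solr implementation: Key and Holder are hypothetical stand-ins for ContextInitializationKey and ServiceHolder, the garbage collection of the mapping is simulated deterministically with clear(), and the wait is bounded by a timeout so that the demo terminates instead of hanging.

```java
import java.lang.ref.Reference;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LostLatchDemo {

    /** Hypothetical key: equal per context, but every instance owns its own latch. */
    static final class Key {
        final Object context;
        final CountDownLatch ready = new CountDownLatch(1);

        Key(Object context) {
            this.context = context;
        }

        void makeReady() {
            ready.countDown();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).context == context;
        }

        @Override
        public int hashCode() {
            return System.identityHashCode(context);
        }
    }

    /** Hypothetical holder: waits on its key's latch until setService has been called. */
    static final class Holder {
        private volatile Key key;
        private volatile Object service;

        Holder(Key key) {
            this.key = key;
        }

        void setService(Object service) {
            this.service = service;
            key.makeReady();
            key = null; // drops the holder's reference to the key
        }

        /** Bounded wait (an assumption of this sketch); returns null if the latch never opens. */
        Object getService(long timeoutMillis) {
            Key k = key;
            try {
                if (k != null && !k.ready.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    return null;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
            return service;
        }
    }

    /** The mapping is lost before the lookup, so the lookup waits on a fresh latch. */
    static boolean lookupAfterLostMappingTimesOut(long timeoutMillis) {
        Map<Key, Holder> services = new WeakHashMap<>();
        Object context = new Object();
        services.computeIfAbsent(new Key(context), Holder::new).setService("coreContainer");
        services.clear(); // stands in for the GC removing the weakly referenced entry
        Holder fresh = services.computeIfAbsent(new Key(context), Holder::new);
        return fresh.getService(timeoutMillis) == null; // nobody calls makeReady on the new key
    }

    /** The mapping is intact, so the lookup finds the holder that already has its service. */
    static boolean intactMappingReturnsService(long timeoutMillis) {
        Map<Key, Holder> services = new WeakHashMap<>();
        Object context = new Object();
        Key key = new Key(context);
        services.computeIfAbsent(key, Holder::new).setService("coreContainer");
        Holder found = services.computeIfAbsent(new Key(context), Holder::new);
        boolean ok = "coreContainer".equals(found.getService(timeoutMillis));
        Reference.reachabilityFence(key); // keep the original key strongly reachable until here
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("lost mapping times out: " + lookupAfterLostMappingTimesOut(200));
        System.out.println("intact mapping returns service: " + intactMappingReturnsService(200));
    }
}
```

      With an unbounded await, as in the stack trace above, the first case would never return.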

            People

              dsmiley David Smiley
              ahubold Andreas Hubold
              Votes: 0
              Watchers: 6

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 50m