Solr / SOLR-1730

Solr fails to start if QueryElevationComponent config is missing

    Details

      Description

      QueryElevationComponent tries to preload some data if its config file does not exist:

              if (!exists){
                // preload the first data
                RefCounted<SolrIndexSearcher> searchHolder = null;
                try {
                  searchHolder = core.getNewestSearcher(false);
                  IndexReader reader = searchHolder.get().getReader();
                  getElevationMap( reader, core );
                } finally {
                  if (searchHolder != null) searchHolder.decref();
                }
              }
      

      This does not work, though, because asking for the newest searcher causes a request to be submitted to Solr before it's ready to handle it:

           [java] SEVERE: java.lang.NullPointerException
           [java] 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
           [java] 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
           [java] 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1317)
           [java] 	at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:52)
           [java] 	at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1147)
           [java] 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
           [java] 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      

      The SearchHandler has not yet been core informed (the QueryElevationComponent triggers this while it is itself being core informed, right before the SearchHandler), so its components ArrayList is still null.

      1. SOLR-1730.patch
        15 kB
        Grant Ingersoll
      2. SOLR-1730.patch
        10 kB
        Grant Ingersoll

          Activity

          Grant Ingersoll added a comment -

          Do we have a better way of checking exceptions besides comparing the messages? I can check the cause is a SolrException, but it seems the only way to check the actual proper exception is to compare the message:

          if (e.getCause() instanceof SolrException && e.getCause().getCause().getMessage().equals("Error initializing QueryElevationComponent.")){
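One alternative to comparing messages, sketched here under assumptions rather than taken from the patch, is to walk the cause chain looking for a dedicated exception type. `CauseChain` and `ComponentInitException` are hypothetical names introduced for illustration; they are not Solr classes. The walk also avoids the NullPointerException that the chained `getCause().getCause()` call above would throw if the chain were shorter than expected.

```java
import java.util.Optional;

final class CauseChain {
    /** Hypothetical marker type standing in for a dedicated "component failed to initialize" exception. */
    static final class ComponentInitException extends RuntimeException {
        ComponentInitException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Returns the first throwable in the cause chain (including t itself) that is an instance of type. */
    static <T extends Throwable> Optional<T> findCause(Throwable t, Class<T> type) {
        // Bound the walk so that a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (type.isInstance(t)) {
                return Optional.of(type.cast(t));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Throwable e = new RuntimeException("outer",
                new ComponentInitException("Error initializing QueryElevationComponent.",
                        new java.io.FileNotFoundException("elevate.xml")));
        // The test no longer depends on the exact wording of the message.
        System.out.println(findCause(e, ComponentInitException.class).isPresent());
    }
}
```

A test written this way would keep passing if the message text were reworded, at the cost of introducing a type to match on.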
          
          Mark Miller added a comment -

          Just noticed the expected exception is not ignored in ant test output.

          Grant Ingersoll added a comment -

          Fixed on trunk and 3.x

          Mark Miller added a comment -

          I have not had a chance to apply it and look thoroughly, but I read the patch and comment earlier today and it all looks good to me.

          Grant Ingersoll added a comment -

          I'm pretty comfortable with this solution and would like to commit in the coming day or two, if others want to review it.

          Grant Ingersoll added a comment -

          All tests pass for me locally.

          Grant Ingersoll added a comment -

          This patch should fix the problem. I did a couple of things:

          1. Addressed the solrconfig issue Yonik raised (i.e. use a sys property)
          2. I logged that the core can't be created.
          3. If there is only 1 core being created, then this throws an exception up and out of Solr to the container. Based on the docs, it seems different containers will deal with this as they see fit. Jetty simply displays an error message.
          4. I marked the exception coming out of QEC as not logged yet.
          5. In the SolrCore initialization code, I changed from only catching IOException to catching Throwable. I then also release the latch and close down any resources the core has allocated so far. I had to release the latch in the catch block because otherwise the ExecutorService can't shut down: it is blocked on the latch.

          I'm running full tests now.
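The latch point in item 5 can be illustrated with a minimal, self-contained sketch. This is not the SolrCore code: `LatchReleaseSketch`, `failingInit` and `initThenShutdown` are hypothetical names, and the single queued task stands in for whatever work is waiting on the latch. It only shows why an executor cannot terminate while a task is still blocked on a latch that the failed initialization never released.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class LatchReleaseSketch {
    /** Stand-in for an initialization step that fails (for example a component's inform()). */
    private static void failingInit() {
        throw new IllegalStateException("simulated component failure during init");
    }

    /** Returns true if the executor terminated promptly after the failed initialization. */
    static boolean initThenShutdown(boolean releaseLatchOnFailure) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch initLatch = new CountDownLatch(1);
        // Queued task that must not run until initialization has finished.
        executor.submit(() -> {
            initLatch.await();
            return null;
        });
        try {
            failingInit();
            initLatch.countDown(); // normal path: initialization finished
        } catch (Throwable t) {
            if (releaseLatchOnFailure) {
                initLatch.countDown(); // failure path: unblock the waiting task
            }
        }
        executor.shutdown(); // stops new submissions, lets the queued task finish
        try {
            return executor.awaitTermination(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow(); // interrupt a still-blocked task so the demo leaks no thread
        }
    }

    public static void main(String[] args) {
        System.out.println("latch released:     " + initThenShutdown(true));
        System.out.println("latch not released: " + initThenShutdown(false));
    }
}
```

Without the `countDown()` in the catch block, the orderly `shutdown()` never completes because the queued task is still waiting.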

          Grant Ingersoll added a comment -

          The NPE occurs because we call initListeners() and then getSearcher(), which creates the new searcher and registers a Future/Callable on those listeners, passing in "this" (i.e. the partially constructed core that is about to fail). Later, when the core fails, there is still a thread/future/callable waiting to fire off the newSearcher event, which it does as soon as the CountDownLatch is released. Little does it know, the core is actually dead. I don't particularly think we need to fix this, other than perhaps to document it here: things are undefined at this point because the core is dead, so we shouldn't care too much about these side effects.
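The sequence described in this comment can be reproduced in miniature. The sketch below is an illustration under assumptions, not Solr code: `DeadCoreSketch`, `Handler` and `runWarmup` are hypothetical names, and `Handler` merely mimics a request handler whose component list stays null until it has been informed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class DeadCoreSketch {
    /** Stand-in for a handler whose component list is only set once it has been informed. */
    static final class Handler {
        private List<String> components; // null until inform() runs

        void inform() {
            components = Arrays.asList("query", "elevator");
        }

        int handle() {
            return components.size(); // NullPointerException if inform() never ran
        }
    }

    /** Returns the exception raised by the queued warm-up task, or null if it succeeded. */
    static Throwable runWarmup(boolean informBeforeRelease) {
        Handler handler = new Handler();
        CountDownLatch latch = new CountDownLatch(1);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Registered early, while the owning object is still being set up.
            Future<Integer> warmup = executor.submit(() -> {
                latch.await();
                return handler.handle();
            });
            if (informBeforeRelease) {
                handler.inform();
            }
            latch.countDown(); // released even though setup may have failed
            warmup.get(5, TimeUnit.SECONDS);
            return null;
        } catch (ExecutionException e) {
            return e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        } catch (TimeoutException e) {
            return e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("informed first: " + runWarmup(true));
        System.out.println("never informed: " + runWarmup(false));
    }
}
```

When the handler is never informed, the queued task fails with a NullPointerException as soon as the latch opens, which matches the shape of the stack traces in this issue.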

          Yonik Seeley added a comment -

          Is there an easy way we could avoid yet another solrconfig.xml file? Perhaps making the elevate file a system property in the existing solrconfig-elevate.xml and just change it for the "bad" test?

          Grant Ingersoll added a comment -

          OK, so per IRC discussion w/ Mark and looking at the code, this exception actually causes the core to fail to construct and be registered.

          It seems to me, then, that the way forward is that if this is the only core (or would be the only core) then Solr should fail and exit. If there are other cores, it should log that the core for XXXX cannot be created and then proceed. One core failure should not cause the others to be out of service.
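As a rough sketch of that policy (an illustration only, not the code in the patch): each core is created independently, a failure is logged and skipped, and the whole startup fails only when no core could be created at all. `CoreLoadPolicySketch`, `loadCores` and the `creator` callback are hypothetical names, and treating "the only core" as "no core loaded" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class CoreLoadPolicySketch {
    /**
     * Tries to create every named core and returns the names that loaded.
     * Throws only if at least one core was attempted and none could be created.
     */
    static List<String> loadCores(List<String> names, Consumer<String> creator) {
        List<String> loaded = new ArrayList<>();
        Map<String, RuntimeException> failures = new LinkedHashMap<>();
        for (String name : names) {
            try {
                creator.accept(name);
                loaded.add(name);
            } catch (RuntimeException e) {
                // One bad core must not take the others out of service.
                failures.put(name, e);
                System.err.println("Core '" + name + "' cannot be created: " + e);
            }
        }
        if (loaded.isEmpty() && !failures.isEmpty()) {
            // Nothing usable is left, so propagate the failure to the caller (the container).
            throw new IllegalStateException("No core could be created: " + failures.keySet(),
                    failures.values().iterator().next());
        }
        return loaded;
    }

    public static void main(String[] args) {
        List<String> names = java.util.Arrays.asList("core0", "broken", "core1");
        System.out.println(loadCores(names, name -> {
            if (name.equals("broken")) {
                throw new IllegalStateException("missing elevate.xml");
            }
        }));
    }
}
```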

          Grant Ingersoll added a comment -

          The SearchHandler has not yet been core informed (the QueryElevationComponent triggers this while it is itself being core informed, right before the SearchHandler), so its components ArrayList is still null.

          I believe this is no longer the case, at least in 4. I think this all works correctly, other than the question of what to do if an inform() actually fails. For QEC, it's probably enough to log and silently not elevate anything, but I'm not sure if that makes sense for other components.

          Grant Ingersoll added a comment -

          A little bit of progress, namely in setting up some tests for this as well as fixing the logging of the main exception.

          The BadComponentTest shows the error (as well as some issue with either the harness or the core itself when it comes to bad components). The QEC is just a symptom of what's wrong here, as all components produce similar errors if inform() fails. The real question is what we should do about it: since inform() is called on reloads, not just at startup, the fail-early approach that one often wants gets a bit trickier.

          Grant Ingersoll added a comment -

          Note, this NPE actually happens if any component throws an exception in inform(), it seems.

          Grant Ingersoll added a comment -

          The cause is the wrapping SolrException on line 217, which marks the exception as already logged by passing true.

          Grant Ingersoll added a comment -

          In the Solr example case, it appears the exception is getting swallowed somewhere along the line. It isn't showing up in the logs, but for some reason, it thinks it has already been logged.

          Grant Ingersoll added a comment -

          AFAICT, the problem seems to hinge on the fact that we assume that if the file doesn't exist in the conf dir that it must exist in the data dir.
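A sketch of a lookup that does not make that assumption, checking both locations explicitly before deciding. It follows the rule stated in the error message quoted elsewhere in this issue (the file must exist in conf or in data, but not both). `ElevationFileLocator`, `choose` and `locate` are hypothetical names for illustration and do not reflect the actual QueryElevationComponent code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ElevationFileLocator {
    enum Location { CONF, DATA }

    /** Decides where the file lives; rejects both "missing everywhere" and "present in both". */
    static Location choose(String fileName, boolean inConf, boolean inData) {
        if (inConf == inData) {
            throw new IllegalStateException("Config file '" + fileName
                    + "' must exist in exactly one of the conf or data directories (conf="
                    + inConf + ", data=" + inData + ")");
        }
        return inConf ? Location.CONF : Location.DATA;
    }

    /** Resolves the file against both directories and returns the one that actually exists. */
    static Path locate(Path confDir, Path dataDir, String fileName) {
        Path inConf = confDir.resolve(fileName);
        Path inData = dataDir.resolve(fileName);
        Location where = choose(fileName, Files.isRegularFile(inConf), Files.isRegularFile(inData));
        return where == Location.CONF ? inConf : inData;
    }

    public static void main(String[] args) {
        try {
            // Directory names are placeholders; neither is expected to exist here.
            locate(Paths.get("no-such-conf"), Paths.get("no-such-data"), "elevate.xml");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at this point, with both paths checked, surfaces the configuration error directly instead of falling through to the preload path.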

          Grant Ingersoll added a comment -

          I can also see this would get screwed up in Zookeeper mode if the elevate file didn't exist.

          Grant Ingersoll added a comment -

          If I move elevate.xml out of example/solr/conf, I get:

          Dec 6, 2011 3:59:17 PM org.apache.solr.common.SolrException log
          SEVERE: java.lang.NullPointerException
          at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:167)
          at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
          at org.apache.solr.core.SolrCore.execute(SolrCore.java:1474)
          at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:59)
          at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1251)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          at java.lang.Thread.run(Thread.java:680)

          But I don't see the exception in the QEC that should be thrown by line 191. If I set up a test locally using solrconfig-elevate.xml, then I do get the Solr exception.

          Matthew Buckett added a comment -

          I've been seeing a similar stack trace:

          Oct 31, 2011 2:06:55 PM org.apache.solr.common.SolrException log
          SEVERE: java.lang.NullPointerException
          at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:172)
          at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
          at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
          at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:54)
          at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1177)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          at java.lang.Thread.run(Thread.java:680)

          and when trying to access solr the error is:

          QueryElevationComponent missing config file: 'elevate.xml either: /Applications/Eclipse Indigo/Eclipse (Indigo).app/Contents/MacOS/solr/./conf/elevate.xml or /Applications/Eclipse Indigo/Eclipse (Indigo).app/Contents/MacOS/solr/./data/elevate.xml must exist, but not both.

          Robert Muir added a comment -

          3.4 -> 3.5

          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Grant Ingersoll added a comment -

          I don't understand why it even needs to do this. What benefit is there to the QEC if it doesn't have an elevation file? It seems like it should just disable itself, or throw an exception and fail.

          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527


            People

            • Assignee:
              Grant Ingersoll
              Reporter:
              Mark Miller