Solr
  1. Solr
  2. SOLR-8068

NPE in SolrDispatchFilter.authenticateRequest

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 5.3
    • Fix Version/s: 5.4
    • Component/s: None
    • Labels:
      None

      Description

      Suddenly, one of our Solr 5.3 nodes responds with the following trace when i send a delete all query via SolrJ.

      java.lang.NullPointerException
              at org.apache.solr.servlet.SolrDispatchFilter.authenticateRequest(SolrDispatchFilter.java:237)
              at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:186)
              at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
              at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
              at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
              at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
              at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
              at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
              at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
              at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
              at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
              at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
              at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
              at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
              at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
              at org.eclipse.jetty.server.Server.handle(Server.java:499)
              at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
              at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
              at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
              at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
              at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
      
      
      1. SOLR-8068.patch
        1 kB
        Ishan Chattopadhyaya
      2. SOLR-8068.patch
        1 kB
        Anshum Gupta
      3. solr-core-5.3.0-SNAPSHOT.jar
        3.48 MB
        Markus Jelsma

        Issue Links

          Activity

          Hide
          Shalin Shekhar Mangar added a comment -

          Is this reproducible? Is authentication enabled or disabled on this cluster?

          Show
          Shalin Shekhar Mangar added a comment - Is this reproducible? Is authentication enabled or disabled on this cluster?
          Hide
          Markus Jelsma added a comment -

          Hello - it is reproducible in the sense that it keeps happening on some nodes since a few days, i have no clue. Authentication is disabled. We have multiple Solr collections at multiple locations, all get indexed via SolrJ 5.3. Only one collection returns this error, on both locations. Strange enough, the NPE is logged on the deleting/indexing client. Both collections (same collection, different location) only logs the following:

          ERROR - 2015-09-18 12:53:12.606; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.request.SolrRequestInfo; Previous SolrRequestInfo was not closed!  req=waitSearcher=true&commit=true&softCommit=false&wt=javabin&version=2
          ERROR - 2015-09-18 12:53:12.607; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.request.SolrRequestInfo; prev == info : false
          ERROR - 2015-09-18 12:54:39.892; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.request.SolrRequestInfo; Previous SolrRequestInfo was not closed!  req=waitSearcher=true&commit=true&softCommit=false&wt=javabin&version=2
          ERROR - 2015-09-18 12:54:39.893; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.request.SolrRequestInfo; prev == info : false
          

          It makes little sense so far.

          Show
          Markus Jelsma added a comment - Hello - it is reproducible in the sense that it keeps happening on some nodes since a few days, i have no clue. Authentication is disabled. We have multiple Solr collections at multiple locations, all get indexed via SolrJ 5.3. Only one collection returns this error, on both locations. Strange enough, the NPE is logged on the deleting/indexing client. Both collections (same collection, different location) only logs the following: ERROR - 2015-09-18 12:53:12.606; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.request.SolrRequestInfo; Previous SolrRequestInfo was not closed! req=waitSearcher= true &commit= true &softCommit= false &wt=javabin&version=2 ERROR - 2015-09-18 12:53:12.607; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.request.SolrRequestInfo; prev == info : false ERROR - 2015-09-18 12:54:39.892; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.request.SolrRequestInfo; Previous SolrRequestInfo was not closed! req=waitSearcher= true &commit= true &softCommit= false &wt=javabin&version=2 ERROR - 2015-09-18 12:54:39.893; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.request.SolrRequestInfo; prev == info : false It makes little sense so far.
          Hide
          Anshum Gupta added a comment -

          This is strange. The only thing that could be null there is cores, which can happen when the corecontainer is shutting down and still ends up accepting a request. The SDF.init() method is guaranteed to be ordered before the first time doRequest is invoked, though I couldn't find any information about the contract regarding completion of init <-> first doFilter call.

          Markus Jelsma Can you confirm if the node was starting up or shutting down at that moment?

          Show
          Anshum Gupta added a comment - This is strange. The only thing that could be null there is cores , which can happen when the corecontainer is shutting down and still ends up accepting a request. The SDF.init() method is guaranteed to be ordered before the first time doRequest is invoked, though I couldn't find any information about the contract regarding completion of init <-> first doFilter call. Markus Jelsma Can you confirm if the node was starting up or shutting down at that moment?
          Hide
          Markus Jelsma added a comment -

          Hello Anshum - the nodes were not starting or shutting down at that particular moment. Also, another collection on the same node does not have this problem.

          Show
          Markus Jelsma added a comment - Hello Anshum - the nodes were not starting or shutting down at that particular moment. Also, another collection on the same node does not have this problem.
          Hide
          Anshum Gupta added a comment -

          Hmm interesting. I can't spot anything else that might cause this, other than the core container itself being null there. Just to confirm, that is 5.3.0, right?

          Show
          Anshum Gupta added a comment - Hmm interesting. I can't spot anything else that might cause this, other than the core container itself being null there. Just to confirm, that is 5.3.0, right?
          Hide
          Markus Jelsma added a comment -

          Yes, all nodes are on 5.3.0.

          Show
          Markus Jelsma added a comment - Yes, all nodes are on 5.3.0.
          Hide
          Alexandre Rafalovitch added a comment -

          I might have a reproduction from clean install in SOLR-8073 (on Windows, though)

          Show
          Alexandre Rafalovitch added a comment - I might have a reproduction from clean install in SOLR-8073 (on Windows, though)
          Hide
          Anshum Gupta added a comment -

          Do you mind just checking if 5.2.1 also has the same problem? Also, if you share the steps, I'll just try reproducing it on my mac.

          Show
          Anshum Gupta added a comment - Do you mind just checking if 5.2.1 also has the same problem? Also, if you share the steps, I'll just try reproducing it on my mac.
          Hide
          Alexandre Rafalovitch added a comment -

          Seems to work in 5.2.1. The exact steps are in the linked JIRA.

          This must be related to the new security introduced in 5.3. But it seems that solutions in 5.3.1 do not work for this use case.

          Show
          Alexandre Rafalovitch added a comment - Seems to work in 5.2.1. The exact steps are in the linked JIRA. This must be related to the new security introduced in 5.3. But it seems that solutions in 5.3.1 do not work for this use case.
          Hide
          Anshum Gupta added a comment -

          As per the java doc here: http://docs.oracle.com/javaee/7/api/javax/servlet/Filter.html#init-javax.servlet.FilterConfig- the init should complete before the first call to doFilter. In which case, the core container should've been initialized.

          In case there was some problem during initialization, you should've seen it in the logs. If not, the only other time that cores is written is during destroy.

          Now, considering that you saw cores to be null at a time between those, it's strange. We can always add a check (like below) in authenticateRequest but I'm would want to really make sure that it's not something else that we're missing here:

          if(cores == null)
                throw new SolrException(ErrorCode.SERVICE_UNAVAILABLE,
                    "Error during request processing. CoreContainer is not available");
              
          AuthenticationPlugin authenticationPlugin = cores.getAuthenticationPlugin();    
          
          Show
          Anshum Gupta added a comment - As per the java doc here: http://docs.oracle.com/javaee/7/api/javax/servlet/Filter.html#init-javax.servlet.FilterConfig- the init should complete before the first call to doFilter. In which case, the core container should've been initialized. In case there was some problem during initialization, you should've seen it in the logs. If not, the only other time that cores is written is during destroy . Now, considering that you saw cores to be null at a time between those, it's strange. We can always add a check (like below) in authenticateRequest but I'm would want to really make sure that it's not something else that we're missing here: if (cores == null ) throw new SolrException(ErrorCode.SERVICE_UNAVAILABLE, "Error during request processing. CoreContainer is not available" ); AuthenticationPlugin authenticationPlugin = cores.getAuthenticationPlugin();
          Hide
          Noble Paul added a comment -

          Is this on windows? There is a similar stacktrace on SOLR-8073

          Show
          Noble Paul added a comment - Is this on windows? There is a similar stacktrace on SOLR-8073
          Hide
          Markus Jelsma added a comment -

          Hello - no, this is on Debian Linux running a OpenJDK 1.8.0_66 JVM.

          Show
          Markus Jelsma added a comment - Hello - no, this is on Debian Linux running a OpenJDK 1.8.0_66 JVM.
          Hide
          Anshum Gupta added a comment -

          Markus Jelsma would it be possible for you to share the logs from the machine where you saw this happening?

          Show
          Anshum Gupta added a comment - Markus Jelsma would it be possible for you to share the logs from the machine where you saw this happening?
          Hide
          Anshum Gupta added a comment -

          Patch that does what I mentioned in my comment. It doesn't make the problem go away but doesn't throw an NPE, instead throws an exception and logs.

          This really shouldn't ever happen unless maybe, during an edge case when the node is shutting down.

          Show
          Anshum Gupta added a comment - Patch that does what I mentioned in my comment. It doesn't make the problem go away but doesn't throw an NPE, instead throws an exception and logs. This really shouldn't ever happen unless maybe, during an edge case when the node is shutting down.
          Hide
          Ishan Chattopadhyaya added a comment -

          IIRC, Timothy Potter hit upon this once during unit tests; user requests came through even before the servlet was initialized. Here's a patch that adds checks for core container being null.
          However, I'm still not sure how this happened midway through the use of Solr.

          Show
          Ishan Chattopadhyaya added a comment - IIRC, Timothy Potter hit upon this once during unit tests; user requests came through even before the servlet was initialized. Here's a patch that adds checks for core container being null. However, I'm still not sure how this happened midway through the use of Solr.
          Hide
          Ishan Chattopadhyaya added a comment -

          Anshum Gupta Oops, you beat me to the patch. Feel free to ignore mine.

          Show
          Ishan Chattopadhyaya added a comment - Anshum Gupta Oops, you beat me to the patch. Feel free to ignore mine.
          Hide
          Markus Jelsma added a comment -

          Anshum Gupta

          INFO  - 2015-09-21 15:57:47.311; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.update.processor.LogUpdateProcessor; [relations_shard1_replica1] webapp=/solr path=/update params={wt=javabin&version=2} {deleteByQuery=*:* (-1512939000366628864)} 0 462
          INFO  - 2015-09-21 15:57:49.278; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.update.processor.LogUpdateProcessor; [relations_shard1_replica1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[1dd0b2b0-d1ed-4351-8525-b8bd148f25c9 (1512939000851070976), f069905c-76ac-4a58-8bdf-9d26cecba258 (1512939000916082688), 3e014831-c77d-4b9c-a022-3b0d8ece28ea (1512939000920276992), 55504b5e-f652-49c2-9451-fde598106702 (1512939000924471296), 13ae8cbd-2d30-4b64-845c-a77910cb1982 (1512939000927617024), d59810cb-555f-4f12-9798-bbccdf4b18f8 (1512939000932859904), c5ad5cfc-fcd2-4612-97e3-44cdf70ee865 (1512939000936005632), d3dfc240-96a3-40ee-b182-4553ed5a2aea (1512939000939151360), 12fec088-f181-430d-ae09-c3eee4a2f23c (1512939000942297088), 9a88ed71-cdee-4ffc-b38f-cb84820d9a8a (1512939000944394240), ... (4186 adds)]} 0 1882
          INFO  - 2015-09-21 15:57:49.298; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
          INFO  - 2015-09-21 15:57:49.893; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onCommit: commits: num=2
                  commit{dir=NRTCachingDirectory(MMapDirectory@/var/lib/solr/relations_shard1_replica1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@bbed33; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_kh,generation=737}
                  commit{dir=NRTCachingDirectory(MMapDirectory@/var/lib/solr/relations_shard1_replica1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@bbed33; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_ki,generation=738}
          INFO  - 2015-09-21 15:57:49.895; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 738
          INFO  - 2015-09-21 15:57:49.906; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.search.SolrIndexSearcher; Opening Searcher@6e3d2e82[relations_shard1_replica1] main
          INFO  - 2015-09-21 15:57:49.910; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.core.QuerySenderListener; QuerySenderListener sending requests to Searcher@6e3d2e82[relations_shard1_replica1] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_kh(5.3.0):C4186)))}
          ERROR - 2015-09-21 15:57:49.915; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.request.SolrRequestInfo; Previous SolrRequestInfo was not closed!  req=waitSearcher=true&commit=true&softCommit=false&wt=javabin&version=2
          ERROR - 2015-09-21 15:57:49.916; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.request.SolrRequestInfo; prev == info : false
          INFO  - 2015-09-21 15:57:49.919; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.core.SolrCore; [relations_shard1_replica1] webapp=null path=null params={q=*:*&distrib=false&event=newSearcher} hits=4186 status=0 QTime=3 
          INFO  - 2015-09-21 15:57:49.926; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.core.QuerySenderListener; QuerySenderListener done.
          INFO  - 2015-09-21 15:57:49.937; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.core.SolrCore; [relations_shard1_replica1] Registered new searcher Searcher@6e3d2e82[relations_shard1_replica1] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_kh(5.3.0):C4186)))}
          INFO  - 2015-09-21 15:57:49.942; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
          INFO  - 2015-09-21 15:57:49.952; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.update.processor.LogUpdateProcessor; [relations_shard1_replica1] webapp=/solr path=/update params={waitSearcher=true&commit=true&softCommit=false&wt=javabin&version=2} {commit=} 0 657
          
          Show
          Markus Jelsma added a comment - Anshum Gupta INFO - 2015-09-21 15:57:47.311; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.update.processor.LogUpdateProcessor; [relations_shard1_replica1] webapp=/solr path=/update params={wt=javabin&version=2} {deleteByQuery=*:* (-1512939000366628864)} 0 462 INFO - 2015-09-21 15:57:49.278; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.update.processor.LogUpdateProcessor; [relations_shard1_replica1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[1dd0b2b0-d1ed-4351-8525-b8bd148f25c9 (1512939000851070976), f069905c-76ac-4a58-8bdf-9d26cecba258 (1512939000916082688), 3e014831-c77d-4b9c-a022-3b0d8ece28ea (1512939000920276992), 55504b5e-f652-49c2-9451-fde598106702 (1512939000924471296), 13ae8cbd-2d30-4b64-845c-a77910cb1982 (1512939000927617024), d59810cb-555f-4f12-9798-bbccdf4b18f8 (1512939000932859904), c5ad5cfc-fcd2-4612-97e3-44cdf70ee865 (1512939000936005632), d3dfc240-96a3-40ee-b182-4553ed5a2aea (1512939000939151360), 12fec088-f181-430d-ae09-c3eee4a2f23c (1512939000942297088), 9a88ed71-cdee-4ffc-b38f-cb84820d9a8a (1512939000944394240), ... (4186 adds)]} 0 1882 INFO - 2015-09-21 15:57:49.298; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize= false ,openSearcher= true ,waitSearcher= true ,expungeDeletes= false ,softCommit= false ,prepareCommit= false } INFO - 2015-09-21 15:57:49.893; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onCommit: commits: num=2 commit{dir=NRTCachingDirectory(MMapDirectory@/ var /lib/solr/relations_shard1_replica1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@bbed33; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_kh,generation=737} commit{dir=NRTCachingDirectory(MMapDirectory@/ var /lib/solr/relations_shard1_replica1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@bbed33; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_ki,generation=738} INFO - 2015-09-21 15:57:49.895; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 738 INFO - 2015-09-21 15:57:49.906; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.search.SolrIndexSearcher; Opening Searcher@6e3d2e82[relations_shard1_replica1] main INFO - 2015-09-21 15:57:49.910; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.core.QuerySenderListener; QuerySenderListener sending requests to Searcher@6e3d2e82[relations_shard1_replica1] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_kh(5.3.0):C4186)))} ERROR - 2015-09-21 15:57:49.915; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.request.SolrRequestInfo; Previous SolrRequestInfo was not closed! req=waitSearcher= true &commit= true &softCommit= false &wt=javabin&version=2 ERROR - 2015-09-21 15:57:49.916; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.request.SolrRequestInfo; prev == info : false INFO - 2015-09-21 15:57:49.919; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.core.SolrCore; [relations_shard1_replica1] webapp= null path= null params={q=*:*&distrib= false &event=newSearcher} hits=4186 status=0 QTime=3 INFO - 2015-09-21 15:57:49.926; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.core.QuerySenderListener; QuerySenderListener done. INFO - 2015-09-21 15:57:49.937; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.core.SolrCore; [relations_shard1_replica1] Registered new searcher Searcher@6e3d2e82[relations_shard1_replica1] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_kh(5.3.0):C4186)))} INFO - 2015-09-21 15:57:49.942; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.update.DirectUpdateHandler2; end_commit_flush INFO - 2015-09-21 15:57:49.952; [c:relations s:shard1 r:core_node1 x:relations_shard1_replica1] org.apache.solr.update.processor.LogUpdateProcessor; [relations_shard1_replica1] webapp=/solr path=/update params={waitSearcher= true &commit= true &softCommit= false &wt=javabin&version=2} {commit=} 0 657
          Hide
          Noble Paul added a comment -

          Markus Jelsma does the same request throw the same error consistently ?

          Show
          Noble Paul added a comment - Markus Jelsma does the same request throw the same error consistently ?
          Hide
          Markus Jelsma added a comment -

          Hello, only when i send a deleteByQuery via SolrJ. I can do a simple commit via curl without problems. The deleteAll usually fails, but not always. The error is thrown consistently since last week.

          Show
          Markus Jelsma added a comment - Hello, only when i send a deleteByQuery via SolrJ. I can do a simple commit via curl without problems. The deleteAll usually fails, but not always. The error is thrown consistently since last week.
          Hide
          Noble Paul added a comment -

          Anshum Gupta If your theory was right, it could have happened only once or twice during the startup. If he saw the errors again and again , it has to be something else

          Show
          Noble Paul added a comment - Anshum Gupta If your theory was right, it could have happened only once or twice during the startup. If he saw the errors again and again , it has to be something else
          Hide
          Anshum Gupta added a comment -

          Noble Paul correct, that was why I didn't commit this.

          Markus Jelsma thanks for sharing the logs. I just wrote a sample test against trunk that creates a collection called "relations" (wanted this to be as close as your setup, from the logs) and #shards as 2, I add a document, commit it and then send a delete by query. I use the CloudSolrClient to do all of this, but haven't hit this in multiple runs.

          Show
          Anshum Gupta added a comment - Noble Paul correct, that was why I didn't commit this. Markus Jelsma thanks for sharing the logs. I just wrote a sample test against trunk that creates a collection called "relations" (wanted this to be as close as your setup, from the logs) and #shards as 2, I add a document, commit it and then send a delete by query. I use the CloudSolrClient to do all of this, but haven't hit this in multiple runs.
          Hide
          Noble Paul added a comment -

          Markus Jelsma can you post your solr-core-x.y.x.jar in that node here. I can cross verify it locally

          Show
          Noble Paul added a comment - Markus Jelsma can you post your solr-core-x.y.x.jar in that node here. I can cross verify it locally
          Hide
          Markus Jelsma added a comment -

          This is our solr-core.jar, taken directly from one of the affected locations. It has SOLR-4137, SOLR-5149 and SOLR-7968 patched over the original 5.3.0 sources but the Solr's having the role of the affected machines do not use it.

          Show
          Markus Jelsma added a comment - This is our solr-core.jar, taken directly from one of the affected locations. It has SOLR-4137 , SOLR-5149 and SOLR-7968 patched over the original 5.3.0 sources but the Solr's having the role of the affected machines do not use it.
          Hide
          Noble Paul added a comment - - edited

          As it is a patched version of Solr, it is hard to verify the lines.

          Is it possible to test with the upcoming version of Solr https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.3.1-RC3-rev1703449/solr/

          comparing with the decompiled class it looks like you have authentication enabled. Can you also share the security.json from your ZK ?(please mask the irrelevant pieces. I just need to know if it is enabled or not)

          Show
          Noble Paul added a comment - - edited As it is a patched version of Solr, it is hard to verify the lines. Is it possible to test with the upcoming version of Solr https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.3.1-RC3-rev1703449/solr/ comparing with the decompiled class it looks like you have authentication enabled. Can you also share the security.json from your ZK ?(please mask the irrelevant pieces. I just need to know if it is enabled or not)
          Hide
          Markus Jelsma added a comment -

          Hello - We do not have security enabled, also the security.json is quite empty.

          version0
          aversion0
          children_count0
          ctimeThu Sep 10 14:48:15 UTC 2015 (1441896495665)
          cversion0
          czxid4294968678
          ephemeralOwner0
          mtimeThu Sep 10 14:48:15 UTC 2015 (1441896495665)
          mzxid4294968678
          pzxid4294968678
          dataLength2
          {}

          Show
          Markus Jelsma added a comment - Hello - We do not have security enabled, also the security.json is quite empty. version0 aversion0 children_count0 ctimeThu Sep 10 14:48:15 UTC 2015 (1441896495665) cversion0 czxid4294968678 ephemeralOwner0 mtimeThu Sep 10 14:48:15 UTC 2015 (1441896495665) mzxid4294968678 pzxid4294968678 dataLength2 {}
          Hide
          Noble Paul added a comment -

          Markus Jelsma it is not useful to provide a stacktrace for a snapshot build. We can't really figure out what the issue is unless we have the exact source code with us.

          Show
          Noble Paul added a comment - Markus Jelsma it is not useful to provide a stacktrace for a snapshot build. We can't really figure out what the issue is unless we have the exact source code with us.
          Hide
          Markus Jelsma added a comment -

          I know, but it is vanilla 5.3.0 with and three patches are just concerned with facetting, highlighting and the querycomponent, i don't see how they can cause this. By the way, i cannot reproduce it anymore, not with the patched 5.3.0 or the vanilla 5.3.0, which i have tried just now.

          Show
          Markus Jelsma added a comment - I know, but it is vanilla 5.3.0 with and three patches are just concerned with facetting, highlighting and the querycomponent, i don't see how they can cause this. By the way, i cannot reproduce it anymore, not with the patched 5.3.0 or the vanilla 5.3.0, which i have tried just now.
          Hide
          Noble Paul added a comment -

          I decompiled the class files from your jar and it does not really match with the 5.2.1 or 5.3.0 release line numbers. That's why I requested you if it is possible to reproduce it with one of the releases or an upcoming release.
          Thanks for your patience

          Show
          Noble Paul added a comment - I decompiled the class files from your jar and it does not really match with the 5.2.1 or 5.3.0 release line numbers. That's why I requested you if it is possible to reproduce it with one of the releases or an upcoming release. Thanks for your patience
          Hide
          Anshum Gupta added a comment -

          Let's leave this issue as is right now and not commit anything here specially considering the fact that we have figured out why it happened. We haven't even really other users report this.

          I'll leave this open for us to come back, in case this happens again but for now, let's move on with the 5.3.1 release.

          What do you guys think?

          Show
          Anshum Gupta added a comment - Let's leave this issue as is right now and not commit anything here specially considering the fact that we have figured out why it happened. We haven't even really other users report this. I'll leave this open for us to come back, in case this happens again but for now, let's move on with the 5.3.1 release. What do you guys think?
          Hide
          Alexandre Rafalovitch added a comment -

          I marked the other linked issue as a blocker (for windows), so perhaps solving that will also solve this.

          The latest hint on the other issue was that something in the init sequence treats relative path differently from the absolute path, though - unhelpfully - all the log messages are identical.

          Show
          Alexandre Rafalovitch added a comment - I marked the other linked issue as a blocker (for windows), so perhaps solving that will also solve this. The latest hint on the other issue was that something in the init sequence treats relative path differently from the absolute path, though - unhelpfully - all the log messages are identical.
          Hide
          Ishan Chattopadhyaya added a comment -

          Anshum Gupta Please feel free to disregard, but my preference would be to commit either of the patches to 5.3.1, since it is a good to have safety check anyway.

          Show
          Ishan Chattopadhyaya added a comment - Anshum Gupta Please feel free to disregard, but my preference would be to commit either of the patches to 5.3.1, since it is a good to have safety check anyway.
          Hide
          Anshum Gupta added a comment -

          Ishan Chattopadhyaya we shouldn't hit this situation at all and if cores is null, at some point, the request should fail fast, before it even gets to that line. Adding that line there only transforms the Exception from NPE to SolrException for the user, which is a better experience but not correct.
          Unless we know when/why would this happen, I'm against masking the error.

          Show
          Anshum Gupta added a comment - Ishan Chattopadhyaya we shouldn't hit this situation at all and if cores is null, at some point, the request should fail fast, before it even gets to that line. Adding that line there only transforms the Exception from NPE to SolrException for the user, which is a better experience but not correct. Unless we know when/why would this happen, I'm against masking the error.
          Hide
          Ishan Chattopadhyaya added a comment -

          Anshum Gupta this situation can be hit any time the core container didn't initialize properly. The NPE on authentication in such a situation looks confusing at that time, and hence I think we should have either my patch or your patch to go in. Such a situation was observed in SOLR-8073.

          Show
          Ishan Chattopadhyaya added a comment - Anshum Gupta this situation can be hit any time the core container didn't initialize properly. The NPE on authentication in such a situation looks confusing at that time, and hence I think we should have either my patch or your patch to go in. Such a situation was observed in SOLR-8073 .
          Hide
          Noble Paul added a comment -

          Anshum Gupta I think that if the corecontainer is not initialized properly we should throw a sensible error in the beginning itself. Otherwise it will manifest in some kind of NPE or other errors at a different line, which may not be consistent at all

          Show
          Noble Paul added a comment - Anshum Gupta I think that if the corecontainer is not initialized properly we should throw a sensible error in the beginning itself. Otherwise it will manifest in some kind of NPE or other errors at a different line, which may not be consistent at all
          Hide
          Anshum Gupta added a comment -

          I agree that if the core container didn't initialize properly, we should fail fast and I actually thought that happened. Seems like I was wrong on that. I'll update the patch here to fail sooner than in the authentication wrapper.

          Show
          Anshum Gupta added a comment - I agree that if the core container didn't initialize properly, we should fail fast and I actually thought that happened. Seems like I was wrong on that. I'll update the patch here to fail sooner than in the authentication wrapper.
          Hide
          Ishan Chattopadhyaya added a comment -

          Anshum Gupta Can you please review my last patch here? I think it is doing the right thing, unless I've missed something more.

          Show
          Ishan Chattopadhyaya added a comment - Anshum Gupta Can you please review my last patch here? I think it is doing the right thing, unless I've missed something more.
          Hide
          Anshum Gupta added a comment -

          Ah sorry, I actually thought you'd uploaded the exact same patch as mine (they were posted just too close). I don't think we need the check in HttpSolrCall but should just fail sooner.

          I'll commit the check in SDF.

          Show
          Anshum Gupta added a comment - Ah sorry, I actually thought you'd uploaded the exact same patch as mine (they were posted just too close). I don't think we need the check in HttpSolrCall but should just fail sooner. I'll commit the check in SDF.
          Hide
          ASF subversion and git services added a comment -

          Commit 1704935 from Anshum Gupta in branch 'dev/trunk'
          [ https://svn.apache.org/r1704935 ]

          SOLR-8068: Check and throw exception in the SDF early if the core container wasn't initialized properly or is shutting down.

          Show
          ASF subversion and git services added a comment - Commit 1704935 from Anshum Gupta in branch 'dev/trunk' [ https://svn.apache.org/r1704935 ] SOLR-8068 : Check and throw exception in the SDF early if the core container wasn't initialized properly or is shutting down.
          Hide
          ASF subversion and git services added a comment -

          Commit 1704948 from Anshum Gupta in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1704948 ]

          SOLR-8068: Check and throw exception in the SDF early if the core container wasn't initialized properly or is shutting down.(merge from trunk)

          Show
          ASF subversion and git services added a comment - Commit 1704948 from Anshum Gupta in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1704948 ] SOLR-8068 : Check and throw exception in the SDF early if the core container wasn't initialized properly or is shutting down.(merge from trunk)
          Hide
          Anshum Gupta added a comment -

          Thanks everyone. I think this is sorted now.

          Show
          Anshum Gupta added a comment - Thanks everyone. I think this is sorted now.
          Hide
          Markus Jelsma added a comment -

          Hi Anshum Gupta, is this issue considered fixed?

          Show
          Markus Jelsma added a comment - Hi Anshum Gupta , is this issue considered fixed?
          Hide
          Anshum Gupta added a comment -

          Hi Markus Jelsma, I think so. I never hit this issue but the commit here should handle the NPE you saw. Is there any way you could test this out and report back if you see this (or not) ?

          Show
          Anshum Gupta added a comment - Hi Markus Jelsma , I think so. I never hit this issue but the commit here should handle the NPE you saw. Is there any way you could test this out and report back if you see this (or not) ?
          Hide
          Markus Jelsma added a comment -

          No i can't. I could reproduce it for a short while but the issue vanished just five days after i opened this one. Let's mark this one as fixed for 5.4 and close it. We can always open another one if it reappears later.

          Show
          Markus Jelsma added a comment - No i can't. I could reproduce it for a short while but the issue vanished just five days after i opened this one. Let's mark this one as fixed for 5.4 and close it. We can always open another one if it reappears later.
          Hide
          Anshum Gupta added a comment -

          Sure, though I don't see why this should happen again.

          Show
          Anshum Gupta added a comment - Sure, though I don't see why this should happen again.

            People

            • Assignee:
              Anshum Gupta
              Reporter:
              Markus Jelsma
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development