Solr / SOLR-5322

core discovery can fail w/NPE and no explanation if a non-readable directory exists

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.5
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      CentOS 6.4
      tomcat6

      Description

      Core discovery fails with an unexplained NullPointerException if it encounters a directory it can't read. We should either make core discovery log and ignore such directories, or, if we think the situation should be fatal, improve the error message.

      Steps to reproduce:

      hossman@frisbee:~/lucene/4x_dev/solr/example$ mkdir solr/NO_READ
      hossman@frisbee:~/lucene/4x_dev/solr/example$ chmod a-r solr/NO_READ/
      hossman@frisbee:~/lucene/4x_dev/solr/example$ java -jar start.jar 
      ...
      1434 [main] ERROR org.apache.solr.servlet.SolrDispatchFilter  – Could not start Solr. Check solr/home property and the logs
      1452 [main] ERROR org.apache.solr.core.SolrCore  – null:java.lang.NullPointerException
      	at org.apache.solr.core.CorePropertiesLocator.discoverUnder(CorePropertiesLocator.java:131)
      	at org.apache.solr.core.CorePropertiesLocator.discoverUnder(CorePropertiesLocator.java:140)
      	at org.apache.solr.core.CorePropertiesLocator.discover(CorePropertiesLocator.java:123)
      	at org.apache.solr.core.CoreContainer.load(CoreContainer.java:240)
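      The trace ends in a recursive call to CorePropertiesLocator.discoverUnder. A plausible mechanism, assumed here rather than taken from the Solr source, is that java.io.File.listFiles() returns null instead of an empty array when a directory can't be read, so any loop over its result throws. The hypothetical NaiveWalk class reproduces that failure in a few lines; it is an illustration, not Solr's code.

```java
import java.io.File;

// Hypothetical illustration only; this is NOT Solr's CorePropertiesLocator source.
public class NaiveWalk {

    /** Counts core.properties files under dir using an unguarded recursive walk. */
    static int countCores(File dir) {
        int count = 0;
        // listFiles() returns null, not an empty array, when dir cannot be read,
        // so this enhanced for loop throws NullPointerException on an unreadable directory.
        for (File child : dir.listFiles()) {
            if (child.isDirectory()) {
                count += countCores(child);
            } else if (child.getName().equals("core.properties")) {
                count++;
            }
        }
        return count;
    }
}
```

      Running countCores on a home directory that contains NO_READ fails the same way the report shows, with no hint of which path was the problem.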
      
      "original bug report"

      Hello.

      When the solr/home directory contains a directory that Solr does not have permission to read, Solr fails to start with this exception:

      2108 [main] INFO org.apache.solr.core.CoresLocator - Looking for core definitions underneath /var/lib/solr
      2109 [main] ERROR org.apache.solr.servlet.SolrDispatchFilter - Could not start Solr. Check solr/home property and the logs
      2138 [main] ERROR org.apache.solr.core.SolrCore - null:java.lang.NullPointerException
              at org.apache.solr.core.CorePropertiesLocator.discoverUnder(CorePropertiesLocator.java:121)
              at org.apache.solr.core.CorePropertiesLocator.discoverUnder(CorePropertiesLocator.java:130)
              at org.apache.solr.core.CorePropertiesLocator.discover(CorePropertiesLocator.java:113)
              at org.apache.solr.core.CoreContainer.load(CoreContainer.java:226)
              at org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:177)
              at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:127)
              at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
              at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
              at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
              at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3838)
              at org.apache.catalina.core.StandardContext.start(StandardContext.java:4488)
              at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
              at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
              at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
              at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:637)
              at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:563)
              at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:498)
              at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
              at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
              at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
              at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
              at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
              at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
              at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
              at org.apache.catalina.core.StandardService.start(StandardService.java:516)
              at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
              at org.apache.catalina.startup.Catalina.start(Catalina.java:593)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:616)
              at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
              at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
      
      2138 [main] INFO org.apache.solr.servlet.SolrDispatchFilter - SolrDispatchFilter.init() done
      

      For example:
      Solr home is located at /var/lib/solr.
      /var/lib/solr is a separate file system, so it has a lost+found directory.
      As a result, Solr can't start.

      Yours faithfully.

      Attachments:
      1. SOLR-5322.patch (8 kB, Erick Erickson)
      2. SOLR-5322.patch (7 kB, Erick Erickson)
      3. SOLR-5322.patch (6 kB, Erick Erickson)

        Activity

        Erick Erickson added a comment -

        Please raise this kind of issue on the user's list before opening a JIRA, to see whether it's really a bug in Solr or a configuration issue.

        You can reopen this if you think it's something Solr should manage.

        What would you have Solr do? If it's not running as a process that has permissions on a necessary directory, what can it do but fail on startup? You as the sysadmin are responsible for permissions...

        Cott Lang added a comment -

        Erick Erickson - It's not the home directory permissions. If any subdirectory of the home directory isn't readable, Solr fails as indicated above.

        Here's an example:

        drwxr-xr-x 3 solr solr 4096 Aug 20 21:25 collection1
        drwx------ 2 root root 49152 Aug 21 22:02 lost+found
        -rw-r--r-- 1 solr solr 1715 Jun 18 11:59 solr.xml

        Although this is an otherwise healthy Solr home directory, Solr fails because it can't read lost+found.

        Hoss Man added a comment -

        "If any subdirectory of the home directory isn't readable, Solr fails as indicated above."

        Yeah ... that part really wasn't clear from the initial bug report.

        The issue here is that in core-discovery mode, Solr really can't tell whether a non-readable directory is a sign of a problem. The flip side of ignoring non-readable directories is that Solr might happily start up without some core you expect to be there if the permissions are set wrong.

        "...Solr fails because it can't read lost+found."

        The specific situation you show is a great example of why I would argue that the presence of a non-readable directory like lost+found is in fact a serious problem, and you might not want this Solr node to start up because of it. What if lost+found contains your entire collection?


        Regardless of whether we think Solr should ignore non-readable files, we should at least generate a better error message than a NullPointerException.
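        One way to meet that minimum while still treating the situation as fatal is to check the listing for null and throw an exception that names the directory. This is a hypothetical sketch: the class name, method name, and message text are illustrative and not taken from Solr.

```java
import java.io.File;

// Hypothetical sketch: still fail on an unreadable directory, but explain why.
public class StrictListing {

    /** Returns dir's children, or throws an exception that names the directory. */
    static File[] listOrFail(File dir) {
        File[] children = dir.listFiles();
        if (children == null) {
            // listFiles() returns null when dir is not a directory or an I/O error
            // occurs, for example when the process lacks read permission on it.
            throw new IllegalStateException("Cannot list directory " + dir.getAbsolutePath()
                    + " during core discovery; check that it exists and is readable");
        }
        return children;
    }
}
```

        A recursive walk that calls listOrFail instead of listFiles() directly would stop at lost+found with a message pointing at that path, rather than a bare NullPointerException.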

        Erick Erickson added a comment (edited) -

        Hmmm, for comparison, I just ran an experiment with the multicore setup where I changed core1/data/index to a-r permissions. Solr starts up, albeit with some warnings, and core0 is still available. The warnings are far more informative; here's the error:

        directory '/Users/Erick/apache/4x/solr/example/multicore/core1/data/index' exists and is a directory, but cannot be listed: list() returned null

        That is much more informative than what core discovery reports. So it seems consistent to log an error and drive on.

        Erick Erickson added a comment -

        As I thought about various file permissions, there are several things that could be tested/warned about. What is the right thing to do here?

        1> What if SolrHome isn't readable (but exists)? I'm testing code that throws a runtime error in this case. Is this too harsh? The code already dies a horrible death if the directory just doesn't exist.

        2> What about files, as opposed to directories, being unreadable? Should that print a warning too? The code currently does.

        Should have a patch up later today.

        Erick Erickson added a comment -

        Oops, kinda forgot.

        This patch throws a runtime exception if SOLR_HOME is not readable, and logs a warning if any file or directory in the tree is unreadable and drives on.

        I'll commit this tomorrow unless there are objections.
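        A rough sketch of the policy described in this comment, written with java.nio.file and not taken from the actual patch (which isn't reproduced here): a missing or unreadable home directory is fatal, while any unreadable file or directory inside the tree is logged and skipped. LenientDiscovery, discover(), and the log messages are hypothetical, and the walk is simplified to collect every directory that contains a core.properties file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch of the "fatal home, lenient tree" policy; not the committed patch.
public class LenientDiscovery {
    private static final Logger LOG = Logger.getLogger(LenientDiscovery.class.getName());

    /** Returns directories that contain core.properties; unreadable entries are logged and skipped. */
    static List<Path> discover(Path solrHome) {
        // Fatal case: the home directory itself must exist and be readable.
        if (!Files.isDirectory(solrHome) || !Files.isReadable(solrHome)) {
            throw new IllegalStateException("Solr home " + solrHome + " is missing or not readable");
        }
        List<Path> coreDirs = new ArrayList<>();
        try {
            Files.walkFileTree(solrHome, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    if (!Files.isReadable(file)) {
                        // Non-fatal case: an unreadable file is reported, then ignored.
                        LOG.warning("Skipping unreadable file " + file);
                    } else if (file.getFileName().toString().equals("core.properties")) {
                        coreDirs.add(file.getParent());
                    }
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFileFailed(Path file, IOException e) {
                    // Non-fatal case: a directory that cannot be opened is reported, then skipped.
                    LOG.warning("Skipping unreadable path " + file + ": " + e);
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return coreDirs;
    }
}
```

        Files.walkFileTree reports a directory it cannot open through visitFileFailed, which is what lets this walk continue past something like lost+found instead of aborting.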

        Erick Erickson added a comment -

        Final patch, including CHANGES.txt. Will commit shortly.

        Erick Erickson added a comment -

        Siiigh. precommit caught a forbidden API (File.delete if you must know)....
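        The comment doesn't say why File.delete is forbidden. A likely reason, assumed here, is that java.io.File#delete() reports failure only as a false return value, while java.nio.file.Files.delete(Path) throws an IOException subclass (such as NoSuchFileException) that says what went wrong. The hypothetical DeleteExample class contrasts the two.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical comparison of the two deletion APIs.
public class DeleteExample {

    /** java.io.File#delete(): failure is just "false", with no reason attached. */
    static boolean legacyDelete(Path p) {
        return p.toFile().delete();
    }

    /** Files.delete: failure surfaces as an exception describing the cause. */
    static void strictDelete(Path p) {
        try {
            Files.delete(p);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not delete " + p, e);
        }
    }
}
```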

        ASF subversion and git services added a comment -

        Commit 1622745 from Erick Erickson in branch 'dev/trunk'
        [ https://svn.apache.org/r1622745 ]

        SOLR-5322: core discovery can fail w/NPE and no explanation if a non-readable directory exists

        ASF subversion and git services added a comment -

        Commit 1622756 from Erick Erickson in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1622756 ]

        SOLR-5322: core discovery can fail w/NPE and no explanation if a non-readable directory exists

        Erick Erickson added a comment -

        Thanks Said!

        Anshum Gupta added a comment -

        Bulk close after 5.0 release.


          People

          • Assignee:
            Erick Erickson
            Reporter:
            Said Chavkin
          • Votes:
            0
            Watchers:
            6
