Solr
  1. Solr
  2. SOLR-8058

Collections that start with css*, js*, img*, and tpl* can't be accessed as they match the exclusion filter

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 5.2, 5.2.1, 5.3, 5.3.1
    • Fix Version/s: 5.3.1, 5.4
    • Component/s: None
    • Labels:
      None

      Description

      Collections that match css*, js*, img*, and tpl* can't be reached as the SDF short circuits paths that match those regular expressions.

      It should have only short circuited exact matches for those directories i.e.
      \css/,/js/,/img/,/tpl/ but that doesn't seem to be the case.

      Need to fix this regular expression so that the collection can be reached.

      1. SOLR-8058.patch
        5 kB
        Steve Rowe
      2. SOLR-8058.patch
        4 kB
        Anshum Gupta
      3. SOLR-8058.patch
        3 kB
        Anshum Gupta

        Activity

        Hide
        Upayavira added a comment - - edited

        I'm not quite sure what that code is trying to do. The Angular UI adds /partials to the list of directories, and it works just fine. Looking at the regexp entries, they are broken anyway:

            <init-param>
              <param-name>excludePatterns</param-name>
              <param-value>/css/*,/js/*,/img/*,/tpl/*</param-value>
            </init-param>
        

        That says any value that starts with /css, then is followed by a sequence of zero or more forward slashes. I think what is intended is /css/.*, no?

        Why do we need these excluded patterns?

        Show
        Upayavira added a comment - - edited I'm not quite sure what that code is trying to do. The Angular UI adds /partials to the list of directories, and it works just fine. Looking at the regexp entries, they are broken anyway: <init-param> <param-name>excludePatterns</param-name> <param-value>/css/*,/js/*,/img/*,/tpl/*</param-value> </init-param> That says any value that starts with /css, then is followed by a sequence of zero or more forward slashes. I think what is intended is /css/.* , no? Why do we need these excluded patterns?
        Hide
        Anshum Gupta added a comment -

        You are right, we need .+ or .* there.

        The reason why we added these was to bypass the SolrDispatchFilter logic and creation of the HttpSolrCall object for static files. This is also required so you don't have to setup authentication rules for static admin UI content.

        Show
        Anshum Gupta added a comment - You are right, we need .+ or .* there. The reason why we added these was to bypass the SolrDispatchFilter logic and creation of the HttpSolrCall object for static files. This is also required so you don't have to setup authentication rules for static admin UI content.
        Hide
        Upayavira added a comment -

        if we need to mark such files as "static" then perhaps we should serve them all from a specific directory - e.g. we could have refer to all but the index.html (or admin.html) files from /_admin/css/*.

        Or why not make the regexp .*\.css$, i.e. ends with .css. No queries should end with .css. I guess people could, however, add a .html request handler. Hmm.

        Show
        Upayavira added a comment - if we need to mark such files as "static" then perhaps we should serve them all from a specific directory - e.g. we could have refer to all but the index.html (or admin.html) files from /_admin/css/*. Or why not make the regexp .*\.css$ , i.e. ends with .css. No queries should end with .css. I guess people could, however, add a .html request handler. Hmm.
        Hide
        Anshum Gupta added a comment -

        I agree, and while working on this, I did think about moving things to a single static root, but that was reasonably invasive so decided against it at that point.
        About things ending in .css, it's better I think to just document the exact static directory names to be reserved keywords for Solr. I'm open to suggestions though.

        Show
        Anshum Gupta added a comment - I agree, and while working on this, I did think about moving things to a single static root, but that was reasonably invasive so decided against it at that point. About things ending in .css , it's better I think to just document the exact static directory names to be reserved keywords for Solr. I'm open to suggestions though.
        Hide
        Anshum Gupta added a comment -

        Here's the fix. Strangely, the test passes even without the patch, so I'm certainly missing something from the test framework somewhere.

        The test is fairly straight, it creates a collection called "css33", adds a document to it and tries to query and assert on numFound.

        Show
        Anshum Gupta added a comment - Here's the fix. Strangely, the test passes even without the patch, so I'm certainly missing something from the test framework somewhere. The test is fairly straight, it creates a collection called "css33", adds a document to it and tries to query and assert on numFound.
        Hide
        Anshum Gupta added a comment -

        Working on the changes so that JettyConfig for the tests include the exclusion filter.

        Show
        Anshum Gupta added a comment - Working on the changes so that JettyConfig for the tests include the exclusion filter.
        Hide
        Upayavira added a comment -

        How about this for an approach:

        Check against a regexp, such as ^.*\.css$. Then, if it matches, check the file system for a file. If a file exists, serve the file, otherwise let it through.

        Show
        Upayavira added a comment - How about this for an approach: Check against a regexp, such as ^.*\.css$ . Then, if it matches, check the file system for a file. If a file exists, serve the file, otherwise let it through.
        Hide
        Hoss Man added a comment -

        i think the patch prefix check is a safer bet: 1) it's simpler; 2) it doesn't risk false negatives if someone uses something like the ShowFileRequestHandler to serve a .html file from the config set.

        if we can consolidate all of the admin UI resources into a single root dir then great, but as long as they are confined to a fixed set of dirs it shouldn't matter (and moving them around now is just an added complexity / risk to try and get into a bug fix)

        Show
        Hoss Man added a comment - i think the patch prefix check is a safer bet: 1) it's simpler; 2) it doesn't risk false negatives if someone uses something like the ShowFileRequestHandler to serve a .html file from the config set. if we can consolidate all of the admin UI resources into a single root dir then great, but as long as they are confined to a fixed set of dirs it shouldn't matter (and moving them around now is just an added complexity / risk to try and get into a bug fix)
        Hide
        Upayavira added a comment -

        I'm absolutely behind moving all the UI into, say, _admin. I guess then, we can have /solr/ as the UI, and /solr/_admin/original.html as, in time, the old UI. The only URL not inside _admin is /solr/. For 5.3.1, index.html would also be outside of _admin.

        Show
        Upayavira added a comment - I'm absolutely behind moving all the UI into, say, _admin. I guess then, we can have /solr/ as the UI, and /solr/_admin/original.html as, in time, the old UI. The only URL not inside _admin is /solr/. For 5.3.1, index.html would also be outside of _admin.
        Hide
        Noble Paul added a comment - - edited

        we should move all the UI related stuff to another path. Consider that each incoming call has to apply all this regexp and that is unnecessary price to pay. +1 to move all these under _admin/ or _ui/

        Show
        Noble Paul added a comment - - edited we should move all the UI related stuff to another path. Consider that each incoming call has to apply all this regexp and that is unnecessary price to pay. +1 to move all these under _admin/ or _ui/
        Hide
        Upayavira added a comment -

        My only concern is that it is a rather large change for a point release. Can we do something small for 5.3.1, and fix it properly in 5.4?

        Show
        Upayavira added a comment - My only concern is that it is a rather large change for a point release. Can we do something small for 5.3.1, and fix it properly in 5.4?
        Hide
        Noble Paul added a comment -

        for this point release we can do a small fix. But having a list of regexp to check in every incoming request is a bad idea. We should fix it properly in 5.4

        Show
        Noble Paul added a comment - for this point release we can do a small fix. But having a list of regexp to check in every incoming request is a bad idea. We should fix it properly in 5.4
        Hide
        Noble Paul added a comment -

        please commit this ASAP
        I need to spin a build

        Show
        Noble Paul added a comment - please commit this ASAP I need to spin a build
        Hide
        Anshum Gupta added a comment -

        I'm at it. Just trying to get the JettySolrRunner to have this conf. for the test.

        Show
        Anshum Gupta added a comment - I'm at it. Just trying to get the JettySolrRunner to have this conf. for the test.
        Hide
        Anshum Gupta added a comment -

        Here's an updated patch but I still can't get this test to fail without the fix.

        I spoke with Steve Rowe and figured he mentioned that in SDF, request.getServletPath() always returns the empty string, maybe because it's not an HTTP request? ((Request)request).getPathInfo() seems to return the correct thing.

        but the problem there is that it doesn't seem to be working with non-embedded jetty.

        Any help/suggestions here would be good.

        Show
        Anshum Gupta added a comment - Here's an updated patch but I still can't get this test to fail without the fix. I spoke with Steve Rowe and figured he mentioned that in SDF, request.getServletPath() always returns the empty string, maybe because it's not an HTTP request? ((Request)request).getPathInfo() seems to return the correct thing. but the problem there is that it doesn't seem to be working with non-embedded jetty. Any help/suggestions here would be good.
        Hide
        Noble Paul added a comment -

        I have done manual test and this fix has solved the problem. W/o the fix it fails
        I dunno why the test setup is not working

        Show
        Noble Paul added a comment - I have done manual test and this fix has solved the problem. W/o the fix it fails I dunno why the test setup is not working
        Hide
        Steve Rowe added a comment - - edited

        Patch that fails without the exclude pattern fix. All Solr tests pass on trunk, and manual testing works, including the old and new admin UIs.

        The testing issue AFAICT is that HTTPServletRequest.getServletPath() returns depends on the pattern used:

        Returns the part of this request's URL that calls the servlet. This path starts with a "/" character and includes either the servlet name or a path to the servlet, but does not include any extra path information or a query string. Same as the value of the CGI variable SCRIPT_NAME.

        This method will return an empty string ("") if the servlet used to process this request was matched using the "/*" pattern.

        So in JettySolrRunner, the servlet path is returned as the empty string, but in manual testing, it's not.

        The attached patch matches exclude patterns against the concatenation of the servlet path and all following path segments.

        Anshum Gupta pointed me to http://stackoverflow.com/a/21046620, which explains the situation well, with an example.

        Show
        Steve Rowe added a comment - - edited Patch that fails without the exclude pattern fix. All Solr tests pass on trunk, and manual testing works, including the old and new admin UIs. The testing issue AFAICT is that HTTPServletRequest.getServletPath() returns depends on the pattern used: Returns the part of this request's URL that calls the servlet. This path starts with a "/" character and includes either the servlet name or a path to the servlet, but does not include any extra path information or a query string. Same as the value of the CGI variable SCRIPT_NAME. This method will return an empty string ("") if the servlet used to process this request was matched using the "/*" pattern. So in JettySolrRunner, the servlet path is returned as the empty string, but in manual testing, it's not. The attached patch matches exclude patterns against the concatenation of the servlet path and all following path segments. Anshum Gupta pointed me to http://stackoverflow.com/a/21046620 , which explains the situation well, with an example.
        Hide
        Noble Paul added a comment -

        Let's commit this and move on

        Show
        Noble Paul added a comment - Let's commit this and move on
        Hide
        Anshum Gupta added a comment -

        Thanks for the patch Steve Rowe and thanks Noble Paul for the patience.

        I just need 5 minutes to test this out and I'll commit it right after.

        P.S: Will check if Steve already ran precommit and test, else it'd take a bit longer than 5 min.

        Show
        Anshum Gupta added a comment - Thanks for the patch Steve Rowe and thanks Noble Paul for the patience. I just need 5 minutes to test this out and I'll commit it right after. P.S: Will check if Steve already ran precommit and test, else it'd take a bit longer than 5 min.
        Hide
        ASF subversion and git services added a comment -

        Commit 1703441 from Anshum Gupta in branch 'dev/trunk'
        [ https://svn.apache.org/r1703441 ]

        SOLR-8058: Fix the exclusion filter so that collections that start with js, css, img, tpl can be accessed.

        Show
        ASF subversion and git services added a comment - Commit 1703441 from Anshum Gupta in branch 'dev/trunk' [ https://svn.apache.org/r1703441 ] SOLR-8058 : Fix the exclusion filter so that collections that start with js, css, img, tpl can be accessed.
        Hide
        ASF subversion and git services added a comment -

        Commit 1703445 from Anshum Gupta in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1703445 ]

        SOLR-8058: Fix the exclusion filter so that collections that start with js, css, img, tpl can be accessed.(merge from trunk)

        Show
        ASF subversion and git services added a comment - Commit 1703445 from Anshum Gupta in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1703445 ] SOLR-8058 : Fix the exclusion filter so that collections that start with js, css, img, tpl can be accessed.(merge from trunk)
        Hide
        ASF subversion and git services added a comment -

        Commit 1703447 from Anshum Gupta in branch 'dev/branches/lucene_solr_5_3'
        [ https://svn.apache.org/r1703447 ]

        SOLR-8058: Fix the exclusion filter so that collections that start with js, css, img, tpl can be accessed.(merge from branch_5x)

        Show
        ASF subversion and git services added a comment - Commit 1703447 from Anshum Gupta in branch 'dev/branches/lucene_solr_5_3' [ https://svn.apache.org/r1703447 ] SOLR-8058 : Fix the exclusion filter so that collections that start with js, css, img, tpl can be accessed.(merge from branch_5x)
        Hide
        Anshum Gupta added a comment -

        Thanks everyone!

        Show
        Anshum Gupta added a comment - Thanks everyone!
        Hide
        ASF subversion and git services added a comment -

        Commit 1724263 from Adrien Grand in branch 'dev/branches/lucene_solr_5_4'
        [ https://svn.apache.org/r1724263 ]

        SOLR-8058, SOLR-8059: Fix CHANGES entries.

        Show
        ASF subversion and git services added a comment - Commit 1724263 from Adrien Grand in branch 'dev/branches/lucene_solr_5_4' [ https://svn.apache.org/r1724263 ] SOLR-8058 , SOLR-8059 : Fix CHANGES entries.
        Hide
        ASF subversion and git services added a comment -

        Commit d83223a564f0c4e2b09af22f8dcf139cc415ce5c in lucene-solr's branch refs/heads/branch_5_4 from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d83223a ]

        SOLR-8058, SOLR-8059: Fix CHANGES entries.

        git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_4@1724263 13f79535-47bb-0310-9956-ffa450edef68

        Show
        ASF subversion and git services added a comment - Commit d83223a564f0c4e2b09af22f8dcf139cc415ce5c in lucene-solr's branch refs/heads/branch_5_4 from Adrien Grand [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d83223a ] SOLR-8058 , SOLR-8059 : Fix CHANGES entries. git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_4@1724263 13f79535-47bb-0310-9956-ffa450edef68

          People

          • Assignee:
            Anshum Gupta
            Reporter:
            Anshum Gupta
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development