Solr / SOLR-5971

'Illegal character in query' when proxying request

    Details

      Description

      My cluster contains 3 Solr instances. I have a collection consisting of one shard with 2 replicas, so one node in the cluster does not have a replica of the shard.

      The following query works when I query one of the two replica nodes:

      http://X.X.X.X:8080/solr/collection/select/?facet=true&facet.field={!ex%3Dfilters,filter1340+key%3Dfacet1340Values}string_months_month&facet=true&q=:

      But when I query the node without the replica, I get:

      {msg=Illegal character in query at index 78: http://X.X.X.X:8080/solr/collection/select/?facet=true&facet.field={!ex%3Dfilters,filter1340+key%3Dfacet1340Values}string_months_month&facet=true&q=:,trace=java.lang.IllegalArgumentException
      at java.net.URI.create(URI.java:842)
      at org.apache.http.client.methods.HttpGet.<init>(HttpGet.java:69)
      at org.apache.solr.servlet.SolrDispatchFilter.remoteQuery(SolrDispatchFilter.java:527)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:340)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
      at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
      at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
      at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
      at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
      at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
      at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
      at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
      at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
      at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
      at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
      at org.eclipse.jetty.server.Server.handle(Server.java:368)
      at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
      at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
      at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
      at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
      at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
      at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
      at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
      at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
      at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
      at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
      at java.lang.Thread.run(Thread.java:662)

      Without the facet.field attribute, it works fine on all the nodes.
      Is this some kind of double escaping when proxying the request?

      1. SOLR-5971.patch
        17 kB
        Uwe Schindler
      2. SOLR-5971.patch
        7 kB
        Uwe Schindler
      3. SOLR-5971.patch
        6 kB
        Ishan Chattopadhyaya
      4. SOLR-5971.patch
        7 kB
        Ishan Chattopadhyaya
      5. SOLR-5971.patch
        6 kB
        Ishan Chattopadhyaya
      6. SOLR-5971.patch
        6 kB
        Ishan Chattopadhyaya

        Activity

        Shawn Heisey added a comment - - edited

        I think "index 78" may correspond to the right curly brace character for the localparams on your facet.field. What happens if you replace { with %7B and } with %7D in your URL? Right now, I consider this a troubleshooting step, not necessarily a workaround.

        Eric Bus added a comment -

        Unfortunately, that does not change the error. After encoding the braces, the same error is reported on the node without the replica. The results on the other nodes are the same.

        Yonik Seeley added a comment -

        Another user reports what looks to be the same bug:
        http://markmail.org/message/v4vkkd2tqwq4uier

        Hi All,

        I am using SolrCloud 4.10.1 and I have 3 shards with a replication factor of
        2, i.e. 6 nodes altogether.

        When I query server1 out of the 6 nodes in the cluster with the query below,
        it works fine, but any other node in the cluster, when queried with the
        same query, returns an *HTTP Status 500 - {msg=Illegal character in query
        at index 181:*
        error.

        The character at index 181 is the boost character ^. I have seen a Jira issue,
        SOLR-5971 <https://issues.apache.org/jira/browse/SOLR-5971>, for a similar
        problem. How can I overcome this?

        The query I use is below. Thanks in Advance!

        http://xxxxxx2.xxxxxxxx.com:8081/solr/dyCollection1_shard2_replica1/xxxxxxxx?q=xxxxx+xxxxx+xxxxxx&sort=score+desc&wt=json&indent=true&debugQuery=true&defType=edismax&qf=productName^1.5+productDescription&mm=1&pf=productName+productDescription&ps=1&pf2=productName+productDescription&pf3=productName+productDescription&stopwords=true&lowercaseOperators=true

        Garth Grimm added a comment - - edited

        Another customer with the same issue (running 5.3). When the query is initially directed to a node with a core from the collection, things work correctly. When directed to a node without a core from the collection, an error is thrown showing the URL parsing error from one of the nodes WITH the proper core.

        So a query like this would work (node has the core on it):
        http://machine1:8983/solr/someCollection/select?q=id:something^2

        But this query (node doesn't have the core on it):
        http://machine3:8983/solr/someCollection/select?q=id:something^2

        Will yield an error message like:
        Illegal character in query at index XX: http://machine1:8983/solr/someCollection/select?q=id:something^2

        So it appears that the proxy code on machine3 is mangling the URL that is being passed to machine1?

        Also, by URL-encoding the ^ character in the query, the issue can be avoided. So this query would work fine against any node:
        http://machine3:8983/solr/someCollection/select?q=id:something%5E2
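
        The stack trace in the description ends in java.net.URI.create, so the behaviour reported here can be reproduced without Solr. The following is a minimal sketch, not Solr code: the class name IllegalQueryCharDemo and the helper parseError are made up for illustration, and the URLs are the example ones from this comment.

```java
import java.net.URI;

public class IllegalQueryCharDemo {

    /** Returns null if java.net.URI accepts the string, otherwise the parser's error message. */
    public static String parseError(String url) {
        try {
            URI.create(url);
            return null;
        } catch (IllegalArgumentException e) {
            // URI.create wraps the URISyntaxException and reuses its message.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String raw = "http://machine1:8983/solr/someCollection/select?q=id:something^2";
        String encoded = "http://machine1:8983/solr/someCollection/select?q=id:something%5E2";

        // Unencoded '^' is rejected with an "Illegal character in query at index ..." message.
        System.out.println("raw:     " + parseError(raw));
        // The percent-encoded form parses, so parseError returns null.
        System.out.println("encoded: " + parseError(encoded));
    }
}
```

        The same check rejects unencoded { and }, which matches the localparams case in the original report.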

        Yonik Seeley added a comment -

        Looks like this slipped off everybody's radar.
        Since it seems serious, I'll mark it as a blocker for the next release.

        Suraj Phanindar Reddy added a comment -

        This is critical for the proper functioning of SolrCloud. Thank you, Garth, for updating this case.

        Ishan Chattopadhyaya added a comment -

        I've been able to reproduce this locally (starting a two node cluster manually, not through a test yet). I'm working on a fix for this and shall post a patch for this, unless someone else beats me to it.

        Ishan Chattopadhyaya added a comment - - edited

        While performing a remote query in HttpSolrCall, the original query request string is passed along as-is to the httpclient to make the forwarded query. The problem is (maybe a newly introduced one, due to some HttpClient regression?) that httpclient throws an exception for URLs that contain certain special characters like ^ or { or }.

        Added a patch which recreates another query string by URL encoding every query parameter.

        The reason we missed this in our testing is that all our tests use SolrQuery and a SolrClient to make the queries, and that path URL-encodes internally. For testing this, I couldn't use an httpclient to query with such special characters in the URL, since httpclient doesn't allow them in the first place. I resorted to using java.net.URL.openStream() with the URL containing the unusual characters, and added a new test suite altogether as I couldn't find an appropriate existing one.

        Can someone please review the patch? Thanks.
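
        The reason java.net.URL works for such a test while a URI-based client does not can be shown in isolation: the java.net.URL constructor does not validate the query against the URI grammar, whereas java.net.URI does. This is a small sketch with a made-up class name (UrlVersusUri) and a placeholder host; it does not open a connection.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class UrlVersusUri {

    /** True if java.net.URL can be constructed from the string. */
    public static boolean urlAccepts(String spec) {
        try {
            // Note: this constructor is deprecated since Java 20 but still available.
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    /** True if java.net.URI parses the string without complaint. */
    public static boolean uriAccepts(String spec) {
        try {
            URI.create(spec);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String spec = "http://localhost:8983/solr/someCollection/select?q=id:something^2";
        System.out.println("java.net.URL accepts it: " + urlAccepts(spec));
        System.out.println("java.net.URI accepts it: " + uriAccepts(spec));
    }
}
```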

        Uwe Schindler added a comment -

        You must also pass the key name through the encoder! It's unlikely that Solr contains key names which violate the spec, but better safe than fail again in the future. Form encoding requires that both key and value are encoded. Also use the correct constants for UTF-8.

        Ishan Chattopadhyaya added a comment -

        Thanks Uwe for looking at it. Encoded the key as well to be safe.
        Running the full suite of tests now (it passed for me last time, apart from the ones that fail frequently at Jenkins anyway).

        Ishan Chattopadhyaya added a comment -

        Also use the correct constants for UTF-8.

        Updated the patch to use StandardCharsets.UTF_8.name(). There are several other places that still use the "UTF-8" string literal.

        Btw, should we change all of those to StandardCharsets.UTF_8.name() in a separate JIRA issue? I've included one such change in this patch, i.e. ClientUtils. That method in the ClientUtils is very similar to what we're doing here in this patch.

        Ishan Chattopadhyaya added a comment -

        Shall we mark this as a blocker for 5.4 instead of 5.5?

        Ishan Chattopadhyaya added a comment -

        All tests pass for me after applying this patch. Some fail intermittently, but these look like unrelated, non-reproducible failures: the usual culprits from Jenkins over the past few days.

        Uwe Schindler added a comment -

        One thing: Maybe we can use a HttpClient method to build the query string? How do we do that in SolrJ generally?

        Ishan Chattopadhyaya added a comment -

        We use the ClientUtils.toQueryString().

          public static String toQueryString( SolrParams params, boolean xml ) {
            StringBuilder sb = new StringBuilder(128);
            try {
              String amp = xml ? "&amp;" : "&";
              boolean first=true;
              Iterator<String> names = params.getParameterNamesIterator();
              while( names.hasNext() ) {
                String key = names.next();
                String[] valarr = params.getParams( key );
                if( valarr == null ) {
                  sb.append( first?"?":amp );
                  sb.append(key);
                  first=false;
                }
                else {
                  for (String val : valarr) {
                    sb.append( first? "?":amp );
                    sb.append(key);
                    if( val != null ) {
                      sb.append('=');
                      sb.append( URLEncoder.encode( val, StandardCharsets.UTF_8.name() ) );
                    }
                    first=false;
                  }
                }
              }
            }
            catch (IOException e) {throw new RuntimeException(e);}  // can't happen
            return sb.toString();
          }
        

        Do you know how to use the HttpClient to build the query string?

        Uwe Schindler added a comment -

        OK. So your patch is fine. I am just confused why you put the utility method into RequestUtils, which is part of the json package. This seems "wrong" (the json package). Otherwise looks fine.

        In general I don't think this is a bug in Solr; it is just "wrong" to accept the incoming URL anyway, so Jetty should have refused it already. But that is a different discussion! We just work around broken users violating the URI spec. So I am fine with that.

        Ishan Chattopadhyaya added a comment - - edited

        I couldn't think of the right place to put this. "RequestUtil" sounded the least wrong place to put this in (but the json part seemed bad). I was also thinking of ClientUtils (which is in a solrj package), because there is a very functionally similar method there by the same name already. Maybe I should've kept it in HttpSolrCall itself.
        I'll raise a patch to put this in HttpSolrCall. Does it sound fine?

        Ishan Chattopadhyaya added a comment -

        Here goes the patch with the method in HttpSolrCall itself. Uwe, if you can think of a better place to put it in than this, please feel free to move it during the commit. Thanks.

        Uwe Schindler added a comment -

        Hi, this patch does not pass the forbiddenapis check, because it uses HttpServletRequest.getParameterMap() & co. Calling these methods is forbidden in any Solr code because it breaks with wrongly configured servlet containers and is slow on Jetty. SolrRequestDispatcher's parameter parsing correctly parses parameters into a SolrRequest. Why can't we use the SolrRequest?

        Uwe Schindler added a comment -

        So it is better to use the already decoded parameters, this.queryParams!
        Your code also has the problem that it cannot handle multiple identical keys (like multiple "fq=" parameters).

        I will post a new patch later so you can check it.
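
        The review points so far (encode both key and value, use the UTF-8 constant, keep repeated keys such as several fq parameters) can be sketched as follows. This is an illustration only and not the committed patch: the class QueryStringSketch and its Map<String, String[]> input are assumptions standing in for Solr's own parameter classes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryStringSketch {

    /**
     * Builds "?k=v&k=v2..." from already decoded parameters.
     * Every value of a key is emitted, so repeated keys survive.
     */
    public static String toQueryString(Map<String, String[]> params) {
        StringBuilder sb = new StringBuilder(128);
        try {
            String utf8 = StandardCharsets.UTF_8.name();
            for (Map.Entry<String, String[]> entry : params.entrySet()) {
                // Encode the key as well as the value.
                String key = URLEncoder.encode(entry.getKey(), utf8);
                for (String val : entry.getValue()) {
                    sb.append(sb.length() == 0 ? '?' : '&');
                    sb.append(key).append('=').append(URLEncoder.encode(val, utf8));
                }
            }
        } catch (UnsupportedEncodingException e) {
            // Cannot happen: every JVM is required to support UTF-8.
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String[]> params = new LinkedHashMap<>();
        params.put("q", new String[] {"id:something^2"});
        params.put("fq", new String[] {"type:a", "{!tag=t}color:red"});
        // prints ?q=id%3Asomething%5E2&fq=type%3Aa&fq=%7B%21tag%3Dt%7Dcolor%3Ared
        System.out.println(toQueryString(params));
    }
}
```

        Starting from decoded values avoids double escaping: a parameter that arrived as %5E and one that arrived as a raw ^ both decode to ^ and are encoded exactly once when forwarded.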

        Ishan Chattopadhyaya added a comment -

        Ouch, I didn't check the precommit! The multiple parameters with same key not being handled was really bad; apologies...

        Ishan Chattopadhyaya added a comment -

        Such mistakes, although they would have been (and eventually were) caught by precommit/forbidden-apis, perhaps also show the need for us to bolster the unit tests around request forwarding a bit more.

        Uwe Schindler added a comment -

        Updated patch. I added a new method to the SolrParams class: toQueryString()

        This is the cleanest way. It allows the method to be used anywhere else, too (e.g. in SolrJ for building the query). That change is something for a new issue.

        Ishan Chattopadhyaya added a comment -

        +1, LGTM.

        Uwe Schindler added a comment -

        New patch. I cleaned up lots of code duplication. Now SolrJ everywhere uses the SolrParams-provided method to encode query strings. So it is completely consistent.

        I also cleaned up the toString() used for logging query params (simplified URL encoding as documented).

        All core + solrj tests pass, rest is running now. If nobody objects I will commit this to trunk and 5.x branch, so it gets included in 5.4.

        ASF subversion and git services added a comment -

        Commit 1715615 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1715615 ]

        SOLR-5971: Fix error 'Illegal character in query' when proxying request

        ASF subversion and git services added a comment -

        Commit 1715616 from Uwe Schindler in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1715616 ]

        Merged revision(s) 1715615 from lucene/dev/trunk:
        SOLR-5971: Fix error 'Illegal character in query' when proxying request


          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Eric Bus
          • Votes:
            2
            Watchers:
            9
