  1. Apache Jena
  2. JENA-2257

QueryExecHTTP#actualSendMode ignores query length causing HTTP 414

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Jena 4.3.2
    • Fix Version/s: Jena 4.4.0
    • Component/s: ARQ, SPARQL
    • Labels: None

    Description

      QueryExecHTTP#actualSendMode(), called in QueryExecHTTP#query(String reqAcceptHeader), does not consider the length of the query. In case of long queries, this causes 414 Request-URI Too Long errors, even if sendMode==QuerySendMode.asGetWithLimitBody (default):

      HttpException: 414 Request-URI Too Long
          at org.apache.jena.sparql.engine.http.QueryExceptionHTTP.rewrap(QueryExceptionHTTP.java:49)
          at org.apache.jena.sparql.exec.http.QueryExecHTTP.executeQuery(QueryExecHTTP.java:493)
          at org.apache.jena.sparql.exec.http.QueryExecHTTP.query(QueryExecHTTP.java:483)
          at org.apache.jena.sparql.exec.http.QueryExecHTTP.execRdfWorker(QueryExecHTTP.java:339)
          at org.apache.jena.sparql.exec.http.QueryExecHTTP.execGraph(QueryExecHTTP.java:287)
          at org.apache.jena.sparql.exec.http.QueryExecHTTP.construct(QueryExecHTTP.java:244)
          at org.apache.jena.sparql.exec.QueryExecutionAdapter.execConstruct(QueryExecutionAdapter.java:129)

      Workarounds:

      • QuerySendMode.systemDefault = QuerySendMode.asPost;
      • QueryExecution.service(…).sendMode(QuerySendMode.asPost)

          Activity

            andy Andy Seaborne added a comment -

            Hi,

            Just to check some details:

            How long is the query string?

            Is the server on the other side of any kind of reverse proxy? (It may have a lower threshold for "Request-URI Too Long".)

            andy Andy Seaborne added a comment - - edited

            The code is:

                    int thisLengthLimit = urlLimit;
            . . .
                    // Only QuerySendMode.asGetWithLimitBody and QuerySendMode.asGetWithLimitForm here.
                    String requestURL = service;
                    // Don't add yet
                    //thisParams.addParam(HttpParams.pQuery, queryString);
                    String qs = params.httpString();
                    // ?query=
            
                    // URL Length, including service (for safety)
                    int length = service.length()+1+HttpParams.pQuery.length()+1+qs.length();
                    if ( length <= thisLengthLimit )
                        return QuerySendMode.asGetAlways;
                    return (sendMode==QuerySendMode.asGetWithLimitBody) ? QuerySendMode.asPost : QuerySendMode.asPostForm;
            

            which I think is testing the query length. Maybe the calculation is wrong.

            QueryExecHTTPBuilder allows the app to set the length.

            Intermediaries or the target server (what is it, by the way?) may impose lower limits. The code could react to 414 by retrying as POST.

            As, in practice, the response to an HTTP GET with a query string is not cached, using POST always might be a better default.

            Update: Found something: params.httpString() will not yet have the query string added.
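The problem noted in the update can be illustrated with a small, self-contained sketch. It is not the Jena code: `SendModeSketch`, `SendMode`, `estimatedUrlLength`, `chooseSendMode`, the placeholder endpoint and the limit of 2048 are all hypothetical names and values chosen for illustration. The only point is that the length estimate has to include the encoded query string, not just the other parameters (which may be empty).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the Jena implementation.
final class SendModeSketch {
    enum SendMode { GET, POST_BODY }

    /**
     * Estimated URL length: service + '?' + "query" + '=' + encoded query,
     * plus '&' + otherParams when there are other (already encoded) parameters.
     * URLEncoder does form encoding (space becomes '+'), so this is an
     * approximation of the real percent-encoded length.
     */
    static int estimatedUrlLength(String service, String queryString, String otherParams) {
        String encodedQuery = URLEncoder.encode(queryString, StandardCharsets.UTF_8);
        int length = service.length() + 1 + "query".length() + 1 + encodedQuery.length();
        if (!otherParams.isEmpty())
            length += 1 + otherParams.length();
        return length;
    }

    /** GET when the estimated URL fits within urlLimit, otherwise POST with the query in the body. */
    static SendMode chooseSendMode(String service, String queryString, String otherParams, int urlLimit) {
        return estimatedUrlLength(service, queryString, otherParams) <= urlLimit
                ? SendMode.GET
                : SendMode.POST_BODY;
    }

    public static void main(String[] args) {
        String service = "https://example.org/sparql";   // placeholder endpoint
        String longQuery = "SELECT * WHERE { ?s ?p ?o } # " + "x".repeat(5000);
        int urlLimit = 2048;                              // arbitrary illustrative limit
        System.out.println(chooseSendMode(service, "ASK {}", "", urlLimit));
        System.out.println(chooseSendMode(service, longQuery, "", urlLimit));
    }
}
```

If the query string were left out of the estimate, as in the code quoted above, both calls would pick GET regardless of the query length.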

            jmkeil Jan Martin Keil added a comment - - edited

            Yes. I inspected it with a debugger and found that params is empty and therefore qs.length() == 0.

            Target server is https://query.wikidata.org/sparql.

            Maybe the query should get added to params in query(String) instead of actualSendMode(), executeQueryPostBody(Params, String), executeQueryPostForm(Params, String), executeQueryGet(Params, String)?

            Further, one could have all sendMode options in query(String) logic and just use an extra method for the URL length calculation instead of actualSendMode(). That way, query(String) becomes easier to understand.

            andy Andy Seaborne added a comment -

            Adding to params could have been done earlier (if not asPost) and then removed if there is a change of plan. The calculation of length can be done without actually making the query string.

            I'm not sure what the general practice is for triple stores and SPARQL libraries - I would not be surprised if it is common to just have POST+body and just be done with it. Use with HTML forms and POST is for direct from HTML page use.

            Thanks for the suggestion about "query(String)". As a personal preference, I don't like long methods and tend to pull out the stages into nearby private methods to make new "verbs" for the process.

            query(String) can be renamed performQuery(String) because it is clearer as an action name.

            And move the building steps into a makeRequest so performQuery is shorter.

            We can see what that looks like.


            jmkeil Jan Martin Keil added a comment -

            "The calculation of length can be done without actually making the query string."

            How should length changes due to URL encoding be taken into account then?
            andy Andy Seaborne added a comment -

            Count the number of characters (not codepoints!) that need encoding, without producing the string. The length is "string length + 2*number of characters that need encoding", since each such character becomes a three-character %XX escape.

            Except java.net.http also gets involved - it does the unicode->ASCII.
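The counting approach described above can be sketched as follows. This is a hypothetical helper, not Jena code: `EncodedLengthSketch`, `isUnreserved` and `estimatedEncodedLength` are invented names, and treating everything outside the RFC 3986 "unreserved" set as needing encoding is an assumption. The estimate is exact only for ASCII input; a non-ASCII character expands to several UTF-8 bytes, each needing its own %XX escape, so for such input the formula is an underestimate.

```java
// Hypothetical helper following the formula in the comment above; not Jena code.
final class EncodedLengthSketch {

    /** True for RFC 3986 "unreserved" characters, assumed here to need no percent-encoding. */
    static boolean isUnreserved(char ch) {
        return (ch >= 'A' && ch <= 'Z')
            || (ch >= 'a' && ch <= 'z')
            || (ch >= '0' && ch <= '9')
            || ch == '-' || ch == '.' || ch == '_' || ch == '~';
    }

    /**
     * Estimates the encoded length as: string length + 2 * (chars needing encoding),
     * without building the encoded string. Counts Java chars, not code points.
     */
    static int estimatedEncodedLength(String s) {
        int needEncoding = 0;
        for (int i = 0; i < s.length(); i++) {
            if (!isUnreserved(s.charAt(i)))
                needEncoding++;
        }
        return s.length() + 2 * needEncoding;
    }

    public static void main(String[] args) {
        String query = "SELECT * WHERE { ?s ?p ?o }";
        System.out.println("raw length:       " + query.length());
        System.out.println("estimated length: " + estimatedEncodedLength(query));
    }
}
```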

            It would be possible to handle 414 internally and retry, though that might end up in the situation where many requests do a double network operation.
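A retry on 414 could look roughly like the sketch below. It is an illustration only: `RetryOn414Sketch` and `executeWithFallback` are hypothetical names, and the two `IntSupplier` arguments stand in for real HTTP calls that return a status code. It also shows the cost mentioned above: whenever the GET is rejected, the request goes over the network twice.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch; the suppliers stand in for real HTTP requests and return the status code.
final class RetryOn414Sketch {
    static final int URI_TOO_LONG = 414;

    /** Tries GET first; if the answer is 414, repeats the request as POST. */
    static int executeWithFallback(IntSupplier sendAsGet, IntSupplier sendAsPost) {
        int status = sendAsGet.getAsInt();
        if (status == URI_TOO_LONG)
            return sendAsPost.getAsInt();   // second network round trip
        return status;
    }

    public static void main(String[] args) {
        // Simulated server: rejects the GET with 414, accepts the POST.
        int status = executeWithFallback(() -> 414, () -> 200);
        System.out.println("final status: " + status);
    }
}
```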

            I believe that nowadays the practical limit is 4K - it used to be 1K - due to proxies. Some proxies truncate the URI (414 is a server response not a proxy transfer response).

            So maybe the best thing is to set the limit lower, like 1K or 512, and use that to allow for encoding. There isn't a perfect answer as far as I am aware.

            GET with a query string isn't cached by proxies.

            The bug needs fixing (in-progress). Being near to release, better handling might have to wait.

             


            jira-bot ASF subversion and git services added a comment -

            Commit 9a40fb8e2d36098cbae533650077c44db0791811 in jena's branch refs/heads/main from Andy Seaborne
            [ https://gitbox.apache.org/repos/asf?p=jena.git;h=9a40fb8 ]

            JENA-2257: Include query string in URI limit test

            jira-bot ASF subversion and git services added a comment -

            Commit e7a192c60110d59e877104db556070820839ae2d in jena's branch refs/heads/main from Andy Seaborne
            [ https://gitbox.apache.org/repos/asf?p=jena.git;h=e7a192c ]

            Merge pull request #1171 from afs/url-limit

            JENA-2257: Include query string in URI limit test

            jmkeil Jan Martin Keil added a comment -

            Thanks a lot!

            People

              Assignee: andy Andy Seaborne
              Reporter: jmkeil Jan Martin Keil