Solr
  1. Solr
  2. SOLR-5517

Return HTTP error on POST requests with no Content-Type

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7, 5.0
    • Component/s: None
    • Labels:
      None

      Description

      While the http spec states requests without a content-type should be treated as application/octet-stream, the html spec says instead that post requests without a content-type should be treated as a form (http://www.w3.org/MarkUp/html-spec/html-spec_8.html#SEC8.2.1). It would be nice to allow large search requests from html forms, and not have to rely on the browser to set the content type (since the spec says it doesn't have to).

      1. SOLR-5517.patch
        4 kB
        Ryan Ernst
      2. SOLR-5517.patch
        4 kB
        Ryan Ernst
      3. SOLR-5517.patch
        6 kB
        Uwe Schindler
      4. SOLR-5517.patch
        7 kB
        Uwe Schindler
      5. SOLR-5517.patch
        7 kB
        Uwe Schindler

        Activity

        Hide
        Robert Muir added a comment -

        +1

        Show
        Robert Muir added a comment - +1
        Hide
        Uwe Schindler added a comment -

        -1, I don't like this automatisms. Especially as the specs don't agree (HTML vs. HTTP).
        Just the question: Which browser, not older than 15 years, still does ot send a content type?

        Uwe

        Show
        Uwe Schindler added a comment - -1, I don't like this automatisms. Especially as the specs don't agree (HTML vs. HTTP). Just the question: Which browser, not older than 15 years, still does ot send a content type? Uwe
        Hide
        Robert Muir added a comment -

        I disagree that the http spec says unknown content-type should be application/octet stream. It does not really say this, you have to read it all in context.

        HTTP: (discussing content-type in general, and not POST in particular)

        If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URL used to identify the resource. If the media type remains unknown, the recipient SHOULD treat it as type "application/octet-stream"

        Show
        Robert Muir added a comment - I disagree that the http spec says unknown content-type should be application/octet stream. It does not really say this, you have to read it all in context. HTTP: (discussing content-type in general, and not POST in particular) If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URL used to identify the resource. If the media type remains unknown , the recipient SHOULD treat it as type "application/octet-stream"
        Hide
        Uwe Schindler added a comment -

        As said, I disagree with this change for the same reasons why something like guessing the charset is wrong, too. This is my personal opinion, you may or may not share it. I just add: Browsers still not sending the content type, for sure also send the stuff ISO-8859-1 URL-encoded, which would be not accepted by Solr, too.

        My question was: Which browsers don't support this?

        Show
        Uwe Schindler added a comment - As said, I disagree with this change for the same reasons why something like guessing the charset is wrong, too. This is my personal opinion, you may or may not share it. I just add: Browsers still not sending the content type, for sure also send the stuff ISO-8859-1 URL-encoded, which would be not accepted by Solr, too. My question was: Which browsers don't support this?
        Hide
        Jack Krupansky added a comment -

        What about curl commands? It is kind of an annoyance that you have to explicitly enter a Content-type.

        Show
        Jack Krupansky added a comment - What about curl commands? It is kind of an annoyance that you have to explicitly enter a Content-type.
        Hide
        Uwe Schindler added a comment -

        In my opinion, we should for this case not fallback to something like the crazy RAW parser (I agree, this is wrong and leads to confusion). Instead we should return a better HTTP status like:

        415 Unsupported Media Type
        The request entity has a media type which the server or resource does not support.[2] For example, the client uploads an image as image/svg+xml, but the server requires that images use a different format.

        This can be done by throwing SolrException with 415 as status code.

        Show
        Uwe Schindler added a comment - In my opinion, we should for this case not fallback to something like the crazy RAW parser (I agree, this is wrong and leads to confusion). Instead we should return a better HTTP status like: 415 Unsupported Media Type The request entity has a media type which the server or resource does not support. [2] For example, the client uploads an image as image/svg+xml, but the server requires that images use a different format. This can be done by throwing SolrException with 415 as status code.
        Hide
        Robert Muir added a comment -

        The problem can happen when users write HTTP code themselves, for example if they are using httpclient and use StringEntity(URLEncodedUtils(...)) versus URLEncodedFormEntity. They are the same, except the latter sets Content-Type to form data.

        The current "default" when content-type is not specified returns xml with no results, which can be confusing. All the solr POST examples (e.g. upload) on the wiki tell you that you should send this header. If its missing, returning 415 Unsupported Media Type or some similar error would be another option: returning no results is hard to debug.

        Show
        Robert Muir added a comment - The problem can happen when users write HTTP code themselves, for example if they are using httpclient and use StringEntity(URLEncodedUtils(...)) versus URLEncodedFormEntity. They are the same, except the latter sets Content-Type to form data. The current "default" when content-type is not specified returns xml with no results, which can be confusing. All the solr POST examples (e.g. upload) on the wiki tell you that you should send this header. If its missing, returning 415 Unsupported Media Type or some similar error would be another option: returning no results is hard to debug.
        Hide
        Robert Muir added a comment -

        What about curl commands? It is kind of an annoyance that you have to explicitly enter a Content-type.

        Curl automatically populates this header itself (unless you change it explicitly to something else), so its no problem. Again: the problem is someone writing their own client code, they can send exact same URL/params as curl, but instead just get no results: which is really confusing. All of their parameters are silently ignored.

        Show
        Robert Muir added a comment - What about curl commands? It is kind of an annoyance that you have to explicitly enter a Content-type. Curl automatically populates this header itself (unless you change it explicitly to something else), so its no problem. Again: the problem is someone writing their own client code, they can send exact same URL/params as curl, but instead just get no results: which is really confusing. All of their parameters are silently ignored.
        Hide
        Uwe Schindler added a comment -

        Hi,

        I looked up what Jetty by default is doing, here is it: https://github.com/eclipse/jetty.project/blob/e92f44ed73453d6107a30ba42600787963c6243d/jetty-server/src/main/java/org/eclipse/jetty/server/Request.java#L288

        Actually it only parses the stuff if a content-type is set and the method is POST/PUT. So our code is doing the same (because we dont use Jetty's parameter decoding, as its not felxible enough with charsets).

        Show
        Uwe Schindler added a comment - Hi, I looked up what Jetty by default is doing, here is it: https://github.com/eclipse/jetty.project/blob/e92f44ed73453d6107a30ba42600787963c6243d/jetty-server/src/main/java/org/eclipse/jetty/server/Request.java#L288 Actually it only parses the stuff if a content-type is set and the method is POST/PUT. So our code is doing the same (because we dont use Jetty's parameter decoding, as its not felxible enough with charsets).
        Hide
        Ryan Ernst added a comment -

        In my opinion, we should for this case not fallback to something like the crazy RAW parser (I agree, this is wrong and leads to confusion). Instead we should return a better HTTP status like:
        415 Unsupported Media Type

        I like this idea. I only want to make sure the example Robert Muir gave does not result in a confusing 200 with empty results.

        Here is a new patch returning 415 for a post missing content type.

        Show
        Ryan Ernst added a comment - In my opinion, we should for this case not fallback to something like the crazy RAW parser (I agree, this is wrong and leads to confusion). Instead we should return a better HTTP status like: 415 Unsupported Media Type I like this idea. I only want to make sure the example Robert Muir gave does not result in a confusing 200 with empty results. Here is a new patch returning 415 for a post missing content type.
        Hide
        Uwe Schindler added a comment -

        That looks perfect. I have not yet tested it, but from what I see: the RAW parser will no longer automatically invoked with missing content type.

        There may be another bug in the raw parser, but that is another issue: The raw parser seems to ignore request parameters given in the URL query string.

        Show
        Uwe Schindler added a comment - That looks perfect. I have not yet tested it, but from what I see: the RAW parser will no longer automatically invoked with missing content type. There may be another bug in the raw parser, but that is another issue: The raw parser seems to ignore request parameters given in the URL query string.
        Hide
        Uwe Schindler added a comment -

        I checked the rawrequest parser: It adds the query string to the content. I think the bigger problem why the whole request posted by Robert did not work is the fact that by using the raw request parser, the /select handler cannot handle that. The raw request parser is only useful for update request handlers.

        On the other hand: if you send some crazy non-null content type to something else than the update request handlers, they may produce the same strange results (maybe if you have a typo in the "application/x-formdata-foo-bar" stuff). The /select request handler should (in my opinion) check some request properties and throw an error if they don't fit the actual request.

        Show
        Uwe Schindler added a comment - I checked the rawrequest parser: It adds the query string to the content. I think the bigger problem why the whole request posted by Robert did not work is the fact that by using the raw request parser, the /select handler cannot handle that. The raw request parser is only useful for update request handlers. On the other hand: if you send some crazy non-null content type to something else than the update request handlers, they may produce the same strange results (maybe if you have a typo in the "application/x-formdata-foo-bar" stuff). The /select request handler should (in my opinion) check some request properties and throw an error if they don't fit the actual request.
        Hide
        Uwe Schindler added a comment -

        We should not allow empty content-type for any of the request parsers. The current patch only changes the StandardRequestParser. But also RawRequestParser should complain about null content type. Also stuff like MultiPart should require a Content-Type. The FormDataRequestParser is the only one correctly complaining. The MultiPart one only complains if the whole request is not a MultiPart one. But if one of the file items has no content type, it lets this through. This would also prevent possible NPEs in the RequestHandlers, because not all of them correctly check for non-null.

        Show
        Uwe Schindler added a comment - We should not allow empty content-type for any of the request parsers. The current patch only changes the StandardRequestParser. But also RawRequestParser should complain about null content type. Also stuff like MultiPart should require a Content-Type. The FormDataRequestParser is the only one correctly complaining. The MultiPart one only complains if the whole request is not a MultiPart one. But if one of the file items has no content type, it lets this through. This would also prevent possible NPEs in the RequestHandlers, because not all of them correctly check for non-null.
        Hide
        Uwe Schindler added a comment - - edited

        I looked around in the RequestHandlers:

        • SearchRequestHandler is the bad guy, which does not throw SolrException, if the request does not fit a "search". The SearchHandler should throw SolrException, if request.streams.length>0. For a search request, there should be not passed any streams. If you then have a typo in your content-type, the request handler has to throw the exception, as no streams are expected (the stream comes from the following: because the FormData parser is not invoked, the RawRequestParser then sets a single ContentStream for any remaining content type - in that case the one with typo).
        • UpdateRequestHandler should use the new HTTP status constant here to be consistent:
                if( type == null ) { // Normal requests will not get here.
                  throw new SolrException(ErrorCode.BAD_REQUEST, "Missing ContentType");
                }
                int idx = type.indexOf(';');
                if(idx>0) {
                  type = type.substring(0,idx);
                }
                ContentStreamLoader loader = loaders.get(type);
                if(loader==null) {
                  throw new SolrException(ErrorCode.BAD_REQUEST, "Unsupported ContentType: "
                      +type+ "  Not in: "+loaders.keySet());
                }
                if(loader.getDefaultWT()!=null) {
                  setDefaultWT(req,loader);
                }
                loader.load(req, rsp, stream, processor);
          

          We should change that code to send the correct HTTP status (or change the status in the current patch to be BAD_REQUEST).

        Show
        Uwe Schindler added a comment - - edited I looked around in the RequestHandlers: SearchRequestHandler is the bad guy, which does not throw SolrException, if the request does not fit a "search". The SearchHandler should throw SolrException, if request.streams.length>0. For a search request, there should be not passed any streams. If you then have a typo in your content-type, the request handler has to throw the exception, as no streams are expected (the stream comes from the following: because the FormData parser is not invoked, the RawRequestParser then sets a single ContentStream for any remaining content type - in that case the one with typo). UpdateRequestHandler should use the new HTTP status constant here to be consistent: if ( type == null ) { // Normal requests will not get here. throw new SolrException(ErrorCode.BAD_REQUEST, "Missing ContentType" ); } int idx = type.indexOf(';'); if (idx>0) { type = type.substring(0,idx); } ContentStreamLoader loader = loaders.get(type); if (loader== null ) { throw new SolrException(ErrorCode.BAD_REQUEST, "Unsupported ContentType: " +type+ " Not in: " +loaders.keySet()); } if (loader.getDefaultWT()!= null ) { setDefaultWT(req,loader); } loader.load(req, rsp, stream, processor); We should change that code to send the correct HTTP status (or change the status in the current patch to be BAD_REQUEST).
        Hide
        Uwe Schindler added a comment -

        Here is a new patch that adds:

        • UpdateRequestHandler now throws consistent HTTP status code 415 on unknown content type
        • SearchHandler throws BAD_REQUEST, if the incomming request has content streams. Currently one test is failing because of this check, I have to figure out why this happens, maybe a test bug - the failing test TestRemoteStreams seems to do some fake requests to default select handler and passes some remote streams. Of course SearchHandler now fails on this; this is incorrect usage.
        Show
        Uwe Schindler added a comment - Here is a new patch that adds: UpdateRequestHandler now throws consistent HTTP status code 415 on unknown content type SearchHandler throws BAD_REQUEST, if the incomming request has content streams. Currently one test is failing because of this check, I have to figure out why this happens, maybe a test bug - the failing test TestRemoteStreams seems to do some fake requests to default select handler and passes some remote streams. Of course SearchHandler now fails on this; this is incorrect usage.
        Hide
        Uwe Schindler added a comment -

        This patch removes the failing test. This one is just bogus!

        Show
        Uwe Schindler added a comment - This patch removes the failing test. This one is just bogus!
        Hide
        Uwe Schindler added a comment -

        Instead of deleting the test, I changed it to pass if the correct error code is sent.

        Show
        Uwe Schindler added a comment - Instead of deleting the test, I changed it to pass if the correct error code is sent.
        Hide
        Ryan Ernst added a comment -

        Thanks for the additional checks Uwe. I'll commit shortly.

        Show
        Ryan Ernst added a comment - Thanks for the additional checks Uwe. I'll commit shortly.
        Hide
        Uwe Schindler added a comment -

        +1

        Show
        Uwe Schindler added a comment - +1
        Hide
        ASF subversion and git services added a comment -

        Commit 1547322 from Ryan Ernst in branch 'dev/trunk'
        [ https://svn.apache.org/r1547322 ]

        SOLR-5517: Return HTTP error on POST requests with no Content-Type

        Show
        ASF subversion and git services added a comment - Commit 1547322 from Ryan Ernst in branch 'dev/trunk' [ https://svn.apache.org/r1547322 ] SOLR-5517 : Return HTTP error on POST requests with no Content-Type
        Hide
        ASF subversion and git services added a comment -

        Commit 1547467 from Ryan Ernst in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1547467 ]

        SOLR-5517: Return HTTP error on POST requests with no Content-Type

        Show
        ASF subversion and git services added a comment - Commit 1547467 from Ryan Ernst in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1547467 ] SOLR-5517 : Return HTTP error on POST requests with no Content-Type
        Hide
        Paco Garcia added a comment -

        Hi,
        maybe that with this change you can´t abort anymore a dataimport from the Admin UI?

        I allways get a response of 415 HTTP Code to the internal jquery invocation to abort
        POST http://xxxx:7070/solr/collection1/dataimport?command=abort&wt=json

        415 (Tipo de Medio No Soportado)

        require.js?_=4.7.0:10185
        send require.js?_=4.7.0:10185
        jQuery.extend.ajax require.js?_=4.7.0:9663
        (anonymous function) dataimport.js?_=4.7.0:348
        jQuery.event.dispatch require.js?_=4.7.0:5336
        elemData.handle.eventHandle

        It works with the abort command directly in the browser URL.

        Regards

        Show
        Paco Garcia added a comment - Hi, maybe that with this change you can´t abort anymore a dataimport from the Admin UI? I allways get a response of 415 HTTP Code to the internal jquery invocation to abort POST http://xxxx:7070/solr/collection1/dataimport?command=abort&wt=json 415 (Tipo de Medio No Soportado) require.js?_=4.7.0:10185 send require.js?_=4.7.0:10185 jQuery.extend.ajax require.js?_=4.7.0:9663 (anonymous function) dataimport.js?_=4.7.0:348 jQuery.event.dispatch require.js?_=4.7.0:5336 elemData.handle.eventHandle It works with the abort command directly in the browser URL. Regards
        Hide
        Paco Garcia added a comment - - edited

        Quick hack:
        go to:
        webapps\solr\js\scripts\dataimport.js
        and change POST by GET :

        $.ajax
        (
        {
        url : handler_url + '?command=abort&wt=json',
        dataType : 'json',
        type: 'GET', <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
        context: $( this ),
        beforeSend : function( xhr, settings )

        { span_element .addClass( 'loader' ); }

        ,
        OR put something inside the data like:

        $.ajax
        (
        {
        url : handler_url + '?command=abort&wt=json',
        data :

        { indent : 'true' }

        ,
        dataType : 'json',
        type: 'POST',
        context: $( this ),
        beforeSend : function( xhr, settings )

        Regards

        Show
        Paco Garcia added a comment - - edited Quick hack: go to: webapps\solr\js\scripts\dataimport.js and change POST by GET : $.ajax ( { url : handler_url + '?command=abort&wt=json', dataType : 'json', type: 'GET', <<<<<<<<<<<<<<<<<<<<<<<<<<<<< context: $( this ), beforeSend : function( xhr, settings ) { span_element .addClass( 'loader' ); } , OR put something inside the data like: $.ajax ( { url : handler_url + '?command=abort&wt=json', data : { indent : 'true' } , dataType : 'json', type: 'POST', context: $( this ), beforeSend : function( xhr, settings ) Regards
        Hide
        Uwe Schindler added a comment - - edited

        Hi,
        can you open a new issue to fix the Admin UI? The admin UI may use POST, but it must send content-type. I think Stefan Matheis (steffkes) should fix this, by passing the "application/json" content type.
        Ideally this should (as you tell) send something in the body, otherwise POST is not the right HTTP method to use.

        Show
        Uwe Schindler added a comment - - edited Hi, can you open a new issue to fix the Admin UI? The admin UI may use POST, but it must send content-type. I think Stefan Matheis (steffkes) should fix this, by passing the "application/json" content type. Ideally this should (as you tell) send something in the body, otherwise POST is not the right HTTP method to use.
        Hide
        Paco Garcia added a comment -
        Show
        Paco Garcia added a comment - OK SOLR-5847

          People

          • Assignee:
            Ryan Ernst
            Reporter:
            Ryan Ernst
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development