XML-RPC / XMLRPC-153

content-length header incorrect when using gzip

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0, 3.1
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:
      UNIX (FC3), Sun JDK1.5.0_10

      Description

      When doing some testing using the ws-xmlrpc client libraries I ran across a bug in its calculation of the content-length HTTP header when using gzip compression but not HTTP chunked transfer. The client incorrectly sets the content-length to the length of the uncompressed data, rather than the compressed data it sends. This happens using both 3.0 and 3.1 client libraries.
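      As a minimal sketch of the expected behaviour (not the ws-xmlrpc code itself; the class and method names here are hypothetical), a client that is not using chunked transfer can compress the request into a buffer first and then report the size of that buffer as the content-length:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; not part of ws-xmlrpc.
final class GzipContentLength {

    /** Compresses the payload fully in memory so its final size is known before any header is written. */
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer, so the buffer is complete afterwards.
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(raw);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Build a repetitive sample request body (made-up payload).
        StringBuilder xml = new StringBuilder("<methodCall><methodName>echo</methodName><params>");
        for (int i = 0; i < 100; i++) {
            xml.append("<param><value><string>hello</string></value></param>");
        }
        xml.append("</params></methodCall>");

        byte[] raw = xml.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(raw);

        // The header must describe the bytes actually sent, i.e. the compressed ones.
        System.out.println("Content-Encoding: gzip");
        System.out.println("Content-Length: " + compressed.length);
        System.out.println("(uncompressed size, which must NOT be used: " + raw.length + ")");
    }
}
```

      The trade-off in this sketch is that the whole compressed body is held in memory; streaming the compressed data without buffering would require chunked transfer instead.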

      I see some activity on ws-xmlrpc-dev from September 2007 but no mention of any resolution. I did a quick bug search and found nothing - my apologies if this is already being tracked somewhere else and I missed it.

      From the mail thread, a link to the relevant part of the HTTP spec:

      http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13

      1. patch.txt
        3 kB
        Balázs Póka

        Activity

        Balázs Póka added a comment - edited

        Reading http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html over and over again, I think I'm beginning to get an idea of what all of this means.

        First of all, a quick recap of the relevant definitions.

        In section 14.11, it says: "The Content-Encoding entity-header field is used as a modifier to the media-type. When present, its value indicates what additional content codings have been applied to the entity-body, and thus what decoding mechanisms must be applied in order to obtain the media-type referenced by the Content-Type header field. Content-Encoding is primarily used to allow a document to be compressed without losing the identity of its underlying media type."
        -> "Content codings are defined in section 3.5."

        Section 3.5 states: "Content coding values indicate an encoding transformation that has been or can be applied to an entity. Content codings are primarily used to allow a document to be compressed or otherwise usefully transformed without losing the identity of its underlying media type and without loss of information. Frequently, the entity is stored in coded form, transmitted directly, and only decoded by the recipient."

        Section 14.41: "The Transfer-Encoding general-header field indicates what (if any) type of transformation has been applied to the message body in order to safely transfer it between the sender and the recipient. This differs from the content-coding in that the transfer-coding is a property of the message, not of the entity."
        -> "Transfer-codings are defined in section 3.6."

        Section 3.6 states: "Transfer-coding values are used to indicate an encoding transformation that has been, can be, or may need to be applied to an entity-body in order to ensure "safe transport" through the network. This differs from a content coding in that the transfer-coding is a property of the message, not of the original entity."

        It is declared that the set of supported Content-codings ("identity", "gzip", "compress", "deflate") is actually a subset of available Transfer-codings ("chunked", "identity", "gzip", "compress", "deflate") and that "Transfer-codings are analogous to the Content-Transfer-Encoding values of MIME..."

        Let's consider an example where both fields are relevant. Say we have some URI whose content a browser is able to display directly. The browser decides whether it can do that based on the MIME type. That could be an image, a Flash application, a simple HTML or XML document, whatever. Suppose this document is compressible, so it makes sense to compress it. Using the vocabulary of the RFC, we modify its media type, which is specified to be "text/xml", using the gzip Content-Encoding. Now the entity the browser downloads from the URI has the property of having been filtered through gzip. But underneath that it's still "text/xml", so the browser can still use it after applying the reverse transformation. Suppose now there is a proxy between the server and the browser. Some proxies don't handle missing Content-Length headers too well. In that case, if it's HTTP/1.1 compatible, "chunked" Transfer-Encoding may be used.

        There is a section regarding message length (4.4), which helps to understand that content-encoding is applied in a different layer than transfer-encoding, and that the two are unrelated.

        "The transfer-length of a message is the length of the message-body as it appears in the message; that is, after any transfer-codings have been applied. When a message-body is included with a message, the transfer-length of that body is determined by one of the following (in order of precedence):
        ...
        2. If a Transfer-Encoding header field (section 14.41) is present and has any value other than "identity", then the transfer-length is defined by use of the "chunked" transfer-coding (section 3.6), unless the message is terminated by closing the connection. [Meaning that a chunked transfer-encoding implicitly specifies the total message length, which is irrelevant here since we don't use chunked transfers.]

        3. If a Content-Length header field (section 14.13) is present, its decimal value in OCTETs represents both the entity-length and the transfer-length. The Content-Length header field MUST NOT be sent if these two lengths are different (i.e., if a Transfer-Encoding header field is present). If a message is received with both a Transfer-Encoding header field and a Content-Length header field, the latter MUST be ignored. [So the Content-Length field should EXACTLY match the number of bytes transferred. Since there is no Transfer-Encoding header (no chunked or anything), this is the key piece of information.]
        ...
        5. By the server closing the connection. (Closing the connection cannot be used to indicate the end of a request body, since that would leave no possibility for the server to send back a response.) [Very important since this is what causes the problem in the first place.]
        "
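        The precedence quoted above can be sketched as a small decision function. This is only an illustration under simplifying assumptions: the names are hypothetical, header names are assumed to be lower-cased already, rules 1 and 4 (which are elided in the quote) are left out, and the "unless the message is terminated by closing the connection" exception in rule 2 is ignored. Note that Content-Encoding plays no part in the decision at all.

```java
import java.util.Map;

// Hypothetical illustration of RFC 2616 section 4.4, rules 2, 3 and 5 only.
final class BodyFraming {

    enum Framing { CHUNKED, CONTENT_LENGTH, CONNECTION_CLOSE }

    /**
     * Decides how a receiver finds the end of the message body.
     * Assumes header names in the map are already lower-cased.
     */
    static Framing framingFor(Map<String, String> headers) {
        String transferEncoding = headers.get("transfer-encoding");
        if (transferEncoding != null && !transferEncoding.trim().equalsIgnoreCase("identity")) {
            // Rule 2: any Content-Length that is also present must be ignored.
            return Framing.CHUNKED;
        }
        if (headers.containsKey("content-length")) {
            // Rule 3: the value must equal the number of body bytes on the wire.
            return Framing.CONTENT_LENGTH;
        }
        // Rule 5: only usable for responses, since a client cannot close
        // the connection and still receive a reply.
        return Framing.CONNECTION_CLOSE;
    }
}
```

        For a request carrying "Content-Encoding: gzip" and a Content-Length but no Transfer-Encoding, this lands on rule 3, which is why the header has to count the compressed bytes.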

        So, I think that the following two examples of headers are functionally equivalent, but only the first is supported by HTTP/1.0:
        1)
        Content-Encoding: gzip
        Content-Length: 1234

        [Content-Length is the exact number of bytes sent over the wire.]

        2) Transfer-Encoding: gzip, chunked

        [chunked is mandatory if there is a Transfer-Encoding header, and no Content-Length is needed since the size of the message can be calculated because of the chunked transfer-encoding. This is why it must be ignored.]
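        To illustrate why no Content-Length is needed in the second case, here is a rough sketch of the "chunked" framing from section 3.6.1 (hypothetical class name; chunk extensions and trailers are omitted). Each chunk carries its own size in hexadecimal, and a zero-size chunk marks the end of the body:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of chunked transfer-coding framing.
final class ChunkedWriter {

    /** Wraps the given body bytes in chunked framing, using chunks of at most chunkSize bytes. */
    static byte[] encode(byte[] body, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int offset = 0; offset < body.length; offset += chunkSize) {
            int length = Math.min(chunkSize, body.length - offset);
            // Chunk header: size in hex, then CRLF.
            byte[] header = (Integer.toHexString(length) + "\r\n").getBytes(StandardCharsets.US_ASCII);
            out.write(header, 0, header.length);
            // Chunk data, then CRLF.
            out.write(body, offset, length);
            out.write('\r');
            out.write('\n');
        }
        // Last chunk (size 0) followed by an empty trailer section.
        byte[] last = "0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        out.write(last, 0, last.length);
        return out.toByteArray();
    }
}
```

        In case 2) the bytes passed to such an encoder would be the gzip-compressed data, so the receiver first removes the chunk framing and then decompresses.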

        Thanks for reading through this. Hope this helps.

        Andy Meyer added a comment -

        The content-length should definitely reflect the actual length of the message sent over the wire, i.e. the content after compression is applied. Without it, an HTTP peer can't determine message boundaries over a persistent connection when chunked transfer coding is not in use.

        This has also bitten other projects, see e.g.:

        http://bugs.php.net/bug.php?id=24083
        http://httpd.apache.org/docs/2.2/mod/mod_deflate.html (see the note about not trusting content-length; in that case the value of the content-length header is left alone and reflects the length of the compressed content)
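        The message-boundary problem can be sketched with a receiver that trusts the declared length. This is a simplified model with hypothetical names, using an in-memory stream in place of a socket: if the declared length is larger than the body actually sent (as when the uncompressed size is reported for compressed data), the receiver either swallows the start of the next message or waits for bytes that never arrive.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration of length-delimited body reading on a persistent connection.
final class BoundaryDemo {

    /**
     * Reads exactly declaredLength bytes as the body.
     * Throws EOFException if the stream ends first; on a live socket
     * the read would block instead.
     */
    static byte[] readBody(InputStream in, int declaredLength) throws IOException {
        byte[] body = new byte[declaredLength];
        // DataInputStream does not buffer, so bytes after the body stay in the stream.
        new DataInputStream(in).readFully(body);
        return body;
    }
}
```

        With a correct length, the next byte left in the stream is the first byte of the following message; with an overstated length, part of that following message ends up inside the body.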

        Jochen Wiedmann added a comment -

        The root problem remains the same: In all of the discussions regarding this issue no one came up with a clear resolution how or whether transport-encoding and content-length relate. As long as this is the case, I do not intend to change the current behaviour.

        Balázs Póka added a comment -

        If I'm not mistaken, the old email thread this issue was being discussed on was initiated by me. I continued to pursue the issue at that time, but got no reply to my last one or two messages. Now I've made a patch which fixes this problem since I ran into it again.


          People

          • Assignee:
            Unassigned
            Reporter:
            Andy Meyer
          • Votes:
            0
            Watchers:
            1