mod_python
  1. mod_python
  2. MODPYTHON-222

Support for chunked transfer encoding on request content.

    Details

    • Type: New Feature New Feature
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.1
    • Fix Version/s: None
    • Component/s: core
    • Labels:
      None

      Description

      It is currently not possible to use chunked transfer encoding on request content delivered to a mod_python request handler.

      The use of chunked transfer encoding is explicitly blocked in C code by:

      rc = ap_setup_client_block(self->request_rec, REQUEST_CHUNKED_ERROR);

      To allow chunked transfer encoding instead of REQUEST_CHUNKED_ERROR it would be necessary to supply REQUEST_CHUNKED_DECHUNK.

      Problem is that it isn't that simple.

      First off, the problems associated with MODPYTHON-212 have to be fixed with code being able to cope with there being no content length.

      The next issue is that req.read() method is currently documented as behaving as:

      If the len argument is negative or omitted, reads all data given by the client.

      This means that can't have req.read() with no arguments mean give me everything that is currently available in input buffers as everyone currently expects it to return everything sent by client. Thus, to be able to process streaming data one would have to supply an amount of data that one wants to read. The code for that though will always try to ensure that that exact amount of data is read and will block if not enough and not end of input. A handler though may not want it to block and be happy with just getting what is read and only expect it to block if nothing currently available.

      In other words, the current specification for how req.read() behaves is incompatible with what would be required to support chunked transfer encoding on request content.

      Not sure how this conflict can be resolved.

      1. mod_python-211-212-222-fix-revised.patch
        24 kB
        Alex Cichowski
      2. requestobject.c
        70 kB
        Alex Cichowski

        Activity

        Hide
        Graham Dumpleton added a comment -

        This was discussed on mod_python developers list and patches still don't necessarily address all concerns. See thread:

        http://markmail.org/message/nylwtkzz2t3nhi46

        Show
        Graham Dumpleton added a comment - This was discussed on mod_python developers list and patches still don't necessarily address all concerns. See thread: http://markmail.org/message/nylwtkzz2t3nhi46
        Hide
        Michael Barton added a comment -

        I'd love to see this issue fixed in some manner. Chunked transfer would be useful for stuff like streaming the output of gzip to the server rather than spooling to disk locally to get the length first.

        I don't know enough about mod_python to properly evaluate the above patch, but it's been working for me.

        Show
        Michael Barton added a comment - I'd love to see this issue fixed in some manner. Chunked transfer would be useful for stuff like streaming the output of gzip to the server rather than spooling to disk locally to get the length first. I don't know enough about mod_python to properly evaluate the above patch, but it's been working for me.
        Hide
        Alex Cichowski added a comment -

        The attached patch should fix up this issue, as well as MODPYTHON-211 (readlines() leak), MODPYTHON-212 (read() with no arguments bug), and possibly MODPYTHON-234 (read() SystemError).

        Changes included:

        (1) Make read() and readline() ignore the content length in the case where no size argument is specified, and instead call ap_get_client_block() in a loop until it returns zero. The result buffer is dynamically expanded as necessary with _PyString_Resize().

        (2) Add a request.set_read_policy() call to specify whether REQUEST_NO_BODY, REQUEST_CHUNKED_ERROR or REQUEST_CHUNKED_DECHUNK is passed to ap_setup_client_block() on the first read()/readline() call. The default is REQUEST_CHUNKED_DECHUNK, but could be set to REQUEST_CHUNKED_ERROR for compatibility with the current behaviour.

        (3) Rewrite readlines() to be more rigorous in error handling and memory management.

        (4) Add a new variation on read() called read_partial(), which attempts to ensure that it has returned all buffered data before blocking, by returning less than the requested amount where necessary. This is helpful to avoid deadlock when you would like to have the server read and respond to part of the client's request before the client sends the next part. I don't know whether it is strictly valid to use HTTP in this way or not, but I think it's nice to have this feature available anyway.

        This patch is against the mod_python-3.3.1 release.

        I have also attached an updated copy of requestobject.c for easy viewing without applying the patch.

        Show
        Alex Cichowski added a comment - The attached patch should fix up this issue, as well as MODPYTHON-211 (readlines() leak), MODPYTHON-212 (read() with no arguments bug), and possibly MODPYTHON-234 (read() SystemError). Changes included: (1) Make read() and readline() ignore the content length in the case where no size argument is specified, and instead call ap_get_client_block() in a loop until it returns zero. The result buffer is dynamically expanded as necessary with _PyString_Resize(). (2) Add a request.set_read_policy() call to specify whether REQUEST_NO_BODY, REQUEST_CHUNKED_ERROR or REQUEST_CHUNKED_DECHUNK is passed to ap_setup_client_block() on the first read()/readline() call. The default is REQUEST_CHUNKED_DECHUNK, but could be set to REQUEST_CHUNKED_ERROR for compatibility with the current behaviour. (3) Rewrite readlines() to be more rigorous in error handling and memory management. (4) Add a new variation on read() called read_partial(), which attempts to ensure that it has returned all buffered data before blocking, by returning less than the requested amount where necessary. This is helpful to avoid deadlock when you would like to have the server read and respond to part of the client's request before the client sends the next part. I don't know whether it is strictly valid to use HTTP in this way or not, but I think it's nice to have this feature available anyway. This patch is against the mod_python-3.3.1 release. I have also attached an updated copy of requestobject.c for easy viewing without applying the patch.
        Hide
        Mike Looijmans added a comment -

        For my tape project I used "None" as valid len argument to indicate reading the next available tape
        block, regardless of size (underlying layer figured out the tape block size). This worked well
        together with existing file type APIs.

        So that would make the API:

        req.read(2048) - reads up to 2048 bytes. If it reads less, it's at the end of the stream. Blocks
        until the requested amount has been read.
        req.read() - reads all data from the stream (bad idea), blocks until EOS.
        req.read(None) - reads the next chunk of data, blocks if no data available. Returns 0 size if at EOS.

        Mike Looijmans
        Philips Natlab / Topic Automation

        Show
        Mike Looijmans added a comment - For my tape project I used "None" as valid len argument to indicate reading the next available tape block, regardless of size (underlying layer figured out the tape block size). This worked well together with existing file type APIs. So that would make the API: req.read(2048) - reads up to 2048 bytes. If it reads less, it's at the end of the stream. Blocks until the requested amount has been read. req.read() - reads all data from the stream (bad idea), blocks until EOS. req.read(None) - reads the next chunk of data, blocks if no data available. Returns 0 size if at EOS. Mike Looijmans Philips Natlab / Topic Automation
        Hide
        Graham Dumpleton added a comment -

        Do note that changing the API to allow it will not help if the client doesn't used chunked encoding for request content. My understanding is that browsers may not have a way of saying it should be used and generally it requires a programming language level HTTP client, such as urllib2 or similar to be able to make use of it.

        Show
        Graham Dumpleton added a comment - Do note that changing the API to allow it will not help if the client doesn't used chunked encoding for request content. My understanding is that browsers may not have a way of saying it should be used and generally it requires a programming language level HTTP client, such as urllib2 or similar to be able to make use of it.
        Hide
        M Willson added a comment -

        If not possible in full generality then I'd be happy with a special separate read method for getting what is in the input buffer so far, with 'read' just blocking until the chunked upload is complete. Would this be possible?

        Or if we could specify a handler in python which is called back with each new chunk of data received, that would give the ultimate flexibility...

        Show
        M Willson added a comment - If not possible in full generality then I'd be happy with a special separate read method for getting what is in the input buffer so far, with 'read' just blocking until the chunked upload is complete. Would this be possible? Or if we could specify a handler in python which is called back with each new chunk of data received, that would give the ultimate flexibility...
        Hide
        M Willson added a comment -

        This would be useful, particularly for handling large PUT requests.

        Show
        M Willson added a comment - This would be useful, particularly for handling large PUT requests.

          People

          • Assignee:
            Unassigned
            Reporter:
            Graham Dumpleton
          • Votes:
            3 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:

              Development