CouchDB
COUCHDB-1174

multipart/related doc PUT API can respond too soon and discard parts of subsequent requests in the connection

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.0.2, 1.1, 1.2
    • Fix Version/s: 1.0.3, 1.1, 1.2
    • Component/s: None
    • Labels: None

      Description

      It seems the new multipart-savvy replicator has a bug. At high load, the receiving node sees the following as the method of a new HTTP request:

      "-17481297448f5a282cc919203957ebd9-POST"

      instead of just "POST". The first bit looks like a multipart boundary value to me.

      I'll attach a script that reproduces the error now.
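
      The failure mode described above can be sketched as follows. This is an illustrative Python snippet (not CouchDB code): if the server replies before the multipart parser has consumed the trailing boundary bytes, those bytes stay in the connection buffer and get prepended to the next pipelined request's method.

```python
# Hypothetical sketch of the symptom: leftover multipart boundary bytes
# from the previous request corrupt the next request line on the same
# keep-alive connection.
boundary = b"-17481297448f5a282cc919203957ebd9-"

# Bytes the parser never consumed from the previous multipart PUT:
leftover = boundary

# The next pipelined request arriving on the same connection:
next_request = b"POST /db/_bulk_docs HTTP/1.1\r\n"

# The server reads from the same buffer, so the request line it sees is:
corrupted = leftover + next_request
method = corrupted.split(b" ", 1)[0]
print(method.decode())  # "-17481297448f5a282cc919203957ebd9-POST"
```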

      1. couchdb-1174_2.patch
        3 kB
        Filipe Manana
      2. couchdb-1174.patch
        2 kB
        Filipe Manana
      3. COUCHDB-1174.sh
        0.5 kB
        Robert Newson

        Activity

        Filipe Manana added a comment -

        Applied to trunk, 1.1.x and 1.0.x

        Robert Newson added a comment -

        Fix looks good! The test script no longer induces the method_not_allowed problem. I applied just couchdb-1174_2.patch to the latest trunk.

        Filipe Manana added a comment -

        There's even another issue, besides replying before the parser is finished:

        The data function does a socket recv call with 0 as the length, which means "give me all the data in the socket". I've now seen, very rarely, that with this approach the multipart parser can steal some data from a subsequent request in the connection, which is then simply silently discarded.
        The solution here is to use the multipart request's Content-Length and keep track of how much has already been received before each data function call.
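
        The idea of the fix can be sketched like this. Illustrative Python with hypothetical names (not CouchDB's actual Erlang code): cap every read by the request's Content-Length so the parser can never consume bytes belonging to the next request on the same connection.

```python
# Sketch: read exactly content_length bytes and no more, instead of
# recv(0)-style "give me whatever is in the socket".
def recv_body(sock, content_length, chunk_size=8192):
    received = 0
    parts = []
    while received < content_length:
        # Never ask for more than the bytes this request still owns.
        want = min(chunk_size, content_length - received)
        data = sock.recv(want)
        if not data:
            raise ConnectionError("peer closed before body was complete")
        parts.append(data)
        received += len(data)
    return b"".join(parts)

# Demo with a fake socket: the reader stops exactly at Content-Length
# and leaves the next request's bytes untouched.
class FakeSock:
    def __init__(self, data):
        self.data = data
    def recv(self, n):
        out, self.data = self.data[:n], self.data[n:]
        return out

sock = FakeSock(b"hello-world" + b"POST /next HTTP/1.1\r\n")
assert recv_body(sock, 11) == b"hello-world"
assert sock.data.startswith(b"POST")  # next request's bytes were not stolen
```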

        Filipe Manana added a comment -

        I was finally able to reproduce this on an old macbook.

        It turns out the issue is that for very short multipart requests, we don't wait for the parser to recv the final boundary marker before sending an OK reply to the client, so the next request gets the boundary prepended to the HTTP verb.

        Before committing it, I leave here the patch for Robert to test.
        Robert, try to test it after revision 1126147, which is important.

        thanks everyone
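
        The fix described above, waiting for the final boundary before replying, can be sketched in Python (hypothetical helper, not the actual patch): do not acknowledge the document until the closing boundary marker ("--<boundary>--") has actually been read from the socket.

```python
# Sketch: keep reading until the terminating boundary marker arrives.
# A real parser must also stop at the marker rather than read past it
# (that is the separate data-stealing issue discussed in this ticket).
def read_until_final_boundary(sock, boundary, chunk_size=4096):
    closing = b"--" + boundary + b"--"
    buf = b""
    while closing not in buf:
        data = sock.recv(chunk_size)
        if not data:
            raise ConnectionError("connection closed before final boundary")
        buf += data
    # Only now is it safe to send the OK reply to the client.
    return buf
```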

        Benoit Chesneau added a comment - edited

        Can't reproduce either on a relatively fast machine (latest MacBook Air, fully specced):

        Darwin enki.local 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386

        $ sysctl -a |grep hw
        hw.machine = i386
        hw.model = MacBookAir3,2
        hw.ncpu = 2
        hw.byteorder = 1234
        hw.physmem = 2147483648
        hw.usermem = 1761689600
        hw.pagesize = 4096
        hw.epoch = 0
        hw.vectorunit = 1
        hw.busfrequency = 1064000000
        hw.cpufrequency = 2130000000
        hw.cachelinesize = 64
        hw.l1icachesize = 32768
        hw.l1dcachesize = 32768
        hw.l2settings = 1
        hw.l2cachesize = 6291456
        hw.tbfrequency = 1000000000
        hw.memsize = 4294967296
        hw.availcpu = 2
        net.link.ether.inet.apple_hwcksum_tx: 1
        net.link.ether.inet.apple_hwcksum_rx: 1
        hw.ncpu: 2
        hw.byteorder: 1234
        hw.memsize: 4294967296
        hw.activecpu: 2
        hw.physicalcpu: 2
        hw.physicalcpu_max: 2
        hw.logicalcpu: 2
        hw.logicalcpu_max: 2
        hw.cputype: 7
        hw.cpusubtype: 4
        hw.cpu64bit_capable: 1
        hw.cpufamily: 2028621756
        hw.cacheconfig: 2 1 2 0 0 0 0 0 0 0
        hw.cachesize: 4026531840 32768 6291456 0 0 0 0 0 0 0
        hw.pagesize: 4096
        hw.busfrequency: 1064000000
        hw.busfrequency_min: 1064000000
        hw.busfrequency_max: 1064000000
        hw.cpufrequency: 2130000000
        hw.cpufrequency_min: 2130000000
        hw.cpufrequency_max: 2130000000
        hw.cachelinesize: 64
        hw.l1icachesize: 32768
        hw.l1dcachesize: 32768
        hw.l2cachesize: 6291456
        hw.tbfrequency: 1000000000
        hw.packages: 1
        hw.optional.floatingpoint: 1
        hw.optional.mmx: 1
        hw.optional.sse: 1
        hw.optional.sse2: 1
        hw.optional.sse3: 1
        hw.optional.supplementalsse3: 1
        hw.optional.sse4_1: 1
        hw.optional.sse4_2: 0
        hw.optional.x86_64: 1
        hw.optional.aes: 0
        machdep.pmap.hashwalks: 6458414

        Filipe Manana added a comment -

        I've run your script multiple times and I can't reproduce it. I think your machine is faster, and that's why you're able to hit this issue.
        Doing an analysis, I see 3 possible causes:

        1) When sending multipart requests to the target, we're sending an extra ending boundary (-some_uuid-) in some cases. For this script, all the documents/attachments are exactly the same, so if there were such a bug, it would be triggered for every document (except for the first one sent);

        2) On the receiving side, the multipart parser finishes successfully before receiving the ending boundary marker for this particular case, and so the ending boundary will be part of the beginning of the next request that goes in the same connection;

        3) Mochiweb sets SO_REUSEADDR on the listen socket. Sockets resulting from the 'accept' call inherit the same options as the listening socket. It might be that a socket is reused and has some garbage left from a previous request.

        For 1) and 2), since I can't reproduce this issue on 2 different machines, I've only done code analysis, mostly in couch_doc (multipart sender) and couch_httpd (multipart parser) and so far I can't find anything wrong with them.

        Robert, do you think you can try the following mochiweb patch:

        diff --git a/src/mochiweb/mochiweb_socket_server.erl b/src/mochiweb/mochiweb_socket_server.erl
        index ff0d8f3..4c605a0 100644
        --- a/src/mochiweb/mochiweb_socket_server.erl
        +++ b/src/mochiweb/mochiweb_socket_server.erl
        @@ -148,7 +148,7 @@ ipv6_supported() ->
         init(State=#mochiweb_socket_server{ip=Ip, port=Port, backlog=Backlog, nodelay=NoDelay}) ->
             process_flag(trap_exit, true),
             BaseOpts = [binary,
        -                {reuseaddr, true},
        +                {reuseaddr, false},
                         {packet, 0},
                         {backlog, Backlog},
                         {recbuf, ?RECBUF_SIZE},

        Adding the nodelay option to the replication socket_options might reduce the frequency of this issue. Just add an extra field to replication objects:

        "socket_options": "[

        {nodelay, true}

        ,

        {keepalive, true}

        ]"

        Anyway, these are mostly speculations, but I believe the reuseaddr option has a small chance of being a problem here.

        cheers

        Robert Newson added a comment -

        Filipe speculates it's an unterminated chunked transfer encoding: it's missing the final zero-length chunk for EOF. The end-of-multipart marker is present but runs directly into the next request.
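
        For context, in HTTP/1.1 chunked transfer encoding each chunk is "<hex length>\r\n<data>\r\n", and the body is only complete once the zero-length chunk "0\r\n\r\n" arrives. An illustrative Python encoder (hypothetical helper, not CouchDB code) showing the terminator whose absence is being speculated about:

```python
# Sketch of a chunked-transfer-encoding body: without the final
# zero-length chunk, the receiver has no way to know the body ended.
def encode_chunked(parts):
    out = b""
    for part in parts:
        out += b"%x\r\n" % len(part) + part + b"\r\n"
    out += b"0\r\n\r\n"  # the terminating zero-length chunk (EOF marker)
    return out
```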

        Robert Newson added a comment -

        This script

        • induces the method_not_allowed bug on trunk
        • induces a function_clause error in write_streamed_attachment on 1.0.2 and 1.0.x

          People

          • Assignee: Unassigned
          • Reporter: Robert Newson
          • Votes: 0
          • Watchers: 0
