CouchDB
  1. CouchDB
  2. COUCHDB-1368

multipart/related document body doesn't identify which part is which attachment

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.1, 1.4.0
    • Component/s: HTTP Interface
    • Labels:
      None

      Description

      If you GET a document with attachments in multipart/related format (by adding ?attachments=true and setting Accept:multipart/related), the MIME bodies for the attachments have no headers. This makes it difficult to tell which one is which. Damien says they're in the same order that they appear in the document's "_attachments" object ... which is fine if you're Erlang, because Erlang preserves the order of keys in a JSON object, but no other JSON implementation I know of does that (because they use hashtables instead of linked lists.)

      The upshot is that any non-Erlang code trying to parse such a response will have to do some by-hand parsing of the JSON data to get the _attachment keys in order.

      This can be fixed by adding a "Content-ID" header to each attachment body, whose value is the filename. It would be nice if other standard headers were added too, like "Content-Type", "Content-Length", "Content-Encoding", as this would make it work better with existing MIME multipart libraries.

      1. replication.log
        11 kB
        Christopher Bonhage

        Issue Links

          Activity

          Hide
          Dave Cottlehuber added a comment -

          +1 for 1.3.1; arguably the ability to hose a couch by PUTting a bunch of docs and calling the replicator can be considered a bug.

          Show
          Dave Cottlehuber added a comment - +1 for 1.3.1; arguably the ability to hose a couch by PUTting a bunch of docs and calling the replicator can be considered a bug.
          Hide
          Robert Newson added a comment -

          I'm on the fence about backporting this to 1.3.x (for 1.3.1) since it's a new feature. There's a bug (COUCHDB-1639) that reports replication errors between 1.3.0 and 1.3.0 that fails because of the absence of these headers but I've yet to reproduce it.

          Show
          Robert Newson added a comment - I'm on the fence about backporting this to 1.3.x (for 1.3.1) since it's a new feature. There's a bug ( COUCHDB-1639 ) that reports replication errors between 1.3.0 and 1.3.0 that fails because of the absence of these headers but I've yet to reproduce it.
          Hide
          Christopher Bonhage added a comment -

          Attached is the debug log from my 1.3.0 CouchDB pushing from database 'wtf' to database 'wtf2' over HTTP (127.0.0.1).

          The smaller docs are 'PUT' into wtf2 as multipart just fine. The 'oldui' doc has 96 attachments and goes through, but the 'newui' doc has 587 attachments and causes the error loop. From this point, the database starts to balloon in size until the couch is killed or disk space is exhausted.

          I can also confirm that this issue does not occur with the multipart header patches mentioned above are applied.

          Show
          Christopher Bonhage added a comment - Attached is the debug log from my 1.3.0 CouchDB pushing from database 'wtf' to database 'wtf2' over HTTP (127.0.0.1). The smaller docs are 'PUT' into wtf2 as multipart just fine. The 'oldui' doc has 96 attachments and goes through, but the 'newui' doc has 587 attachments and causes the error loop. From this point, the database starts to balloon in size until the couch is killed or disk space is exhausted. I can also confirm that this issue does not occur with the multipart header patches mentioned above are applied.
          Hide
          Robert Newson added a comment -

          There must be more to it. I just made a doc with 600 attachments (each a different length) and replicated it over http just fine. If you have more details, that will really help.

          Show
          Robert Newson added a comment - There must be more to it. I just made a doc with 600 attachments (each a different length) and replicated it over http just fine. If you have more details, that will really help.
          Hide
          Robert Newson added a comment -

          thanks.

          Show
          Robert Newson added a comment - thanks.
          Hide
          Christopher Bonhage added a comment - - edited

          Hi Robert, I most definitely can reproduce this between two 1.3.0 Couches.

          When attempting to push replicate a database that contains a document with 587 attachments of varying sizes my logs are filled with:

          [Tue, 07 May 2013 16:34:11 GMT] [error] [emulator] Error in process <0.3217.0> with exit value: {{badmatch,{[137,80,78,71],[]}},[{couch_httpd,split_header,1,[{file,"couch_httpd.erl"},{line,1014}]},{couch_httpd,'-parse_part_header/1-fun-1-',2,[{file,"couch_httpd.erl"},{line,1057}]},{lists,foldl,3,[{file,"lists.erl"},{line,1197}]},{couch_httpd,parse_part_header...
          
          Show
          Christopher Bonhage added a comment - - edited Hi Robert, I most definitely can reproduce this between two 1.3.0 Couches. When attempting to push replicate a database that contains a document with 587 attachments of varying sizes my logs are filled with: [Tue, 07 May 2013 16:34:11 GMT] [error] [emulator] Error in process <0.3217.0> with exit value: {{badmatch,{[137,80,78,71],[]}},[{couch_httpd,split_header,1,[{file,"couch_httpd.erl"},{line,1014}]},{couch_httpd,'-parse_part_header/1-fun-1-',2,[{file,"couch_httpd.erl"},{line,1057}]},{lists,foldl,3,[{file,"lists.erl"},{line,1197}]},{couch_httpd,parse_part_header...
          Hide
          Robert Newson added a comment -

          Hi Christopher,

          The work here is to help cooperate better with other tools. CouchDB itself should not be affected, since it doesn't reorder the JSON (and thus finds things in the correct order). Can you reliably induce this between two official 1.3.0 releases?

          Show
          Robert Newson added a comment - Hi Christopher, The work here is to help cooperate better with other tools. CouchDB itself should not be affected, since it doesn't reorder the JSON (and thus finds things in the correct order). Can you reliably induce this between two official 1.3.0 releases?
          Hide
          Christopher Bonhage added a comment -

          I just got burned by this on 1.3.0 when my replicator tried to send a doc with a large number of attachments as multipart. The receiving couch (also running 1.3.0) went into a barf loop trying to parse the non-existent headers of the multipart request.

          Considering how this breaks the replicator and causes CouchDB to fill the disk, this seems like more than a minor bug.

          I highly suggest merging the following commits into the next bugfix for 1.3.x to resolve this issue:
          a2b3cc72229b86805ad11a27e93e74a78d6bdfa6 Send attachment headers in multipart responses
          734ca24858c1fdcaa9d998f19bbf7f3e35817f64 fix multipart response mixup, fixes replication.js

          Show
          Christopher Bonhage added a comment - I just got burned by this on 1.3.0 when my replicator tried to send a doc with a large number of attachments as multipart. The receiving couch (also running 1.3.0) went into a barf loop trying to parse the non-existent headers of the multipart request. Considering how this breaks the replicator and causes CouchDB to fill the disk, this seems like more than a minor bug. I highly suggest merging the following commits into the next bugfix for 1.3.x to resolve this issue: a2b3cc72229b86805ad11a27e93e74a78d6bdfa6 Send attachment headers in multipart responses 734ca24858c1fdcaa9d998f19bbf7f3e35817f64 fix multipart response mixup, fixes replication.js
          Hide
          Jan Lehnardt added a comment -

          The branch got another update after review & collaboration with Robert Newson.

          Bottom line: that particular code path is fairly messy and is due for cleanup, but that would exceed the scope of this ticket.

          I’d suggest to commit this to master and if OK’d to 1.3.x and move on from there.

          Show
          Jan Lehnardt added a comment - The branch got another update after review & collaboration with Robert Newson. Bottom line: that particular code path is fairly messy and is due for cleanup, but that would exceed the scope of this ticket. I’d suggest to commit this to master and if OK’d to 1.3.x and move on from there.
          Hide
          Paul Joseph Davis added a comment -

          +1 on this. The only part I dislike is that length calculation, but given the current status of that function I think its the least bad way to implement this.

          Show
          Paul Joseph Davis added a comment - +1 on this. The only part I dislike is that length calculation, but given the current status of that function I think its the least bad way to implement this.
          Hide
          Jan Lehnardt added a comment -

          Jup, check out share/www/script/test/attachments_multipart.js

          Show
          Jan Lehnardt added a comment - Jup, check out share/www/script/test/attachments_multipart.js
          Hide
          Jens Alfke added a comment -

          You mean, add some JS unit tests? I think I could do that. Hopefully there are already some tests that look at MIME responses?

          Show
          Jens Alfke added a comment - You mean, add some JS unit tests? I think I could do that. Hopefully there are already some tests that look at MIME responses?
          Hide
          Jan Lehnardt added a comment -

          oopsn, I updated the branches accordingly. I only glanced at the spec for the format, not the header names, sorry about that!

          Either way though, I’d like a bit more thorough testing on this one, especially with all combinations of compressed, non compressed, binary and plain text attachments with compressed transfer encodings and without, just to make sure it is all correct. Is that something you can help with? (I know I asked that the last time and then nothing happened, but here we already have the fix, mostly, the other one turned to be a bit more hairy than I could handle with my time then

          Show
          Jan Lehnardt added a comment - oopsn, I updated the branches accordingly. I only glanced at the spec for the format, not the header names, sorry about that! Either way though, I’d like a bit more thorough testing on this one, especially with all combinations of compressed, non compressed, binary and plain text attachments with compressed transfer encodings and without, just to make sure it is all correct. Is that something you can help with? (I know I asked that the last time and then nothing happened, but here we already have the fix, mostly, the other one turned to be a bit more hairy than I could handle with my time then
          Hide
          Jens Alfke added a comment -

          It turns out "Content-ID" is not the correct header to use for the filename, because according to RFC2045 sec.7, "Content-ID values must be generated to be world-unique". (I didn't know this when writing up this issue, but discovered it later on while implementing MIME support for TouchDB. I should have updated this issue too; sorry!)

          The most appropriate header to use seems to be Content-Disposition (RFC1806):

          Content-Disposition: attachment; filename="test.txt"

          This is what TouchDB generates, and what it will recognize in incoming MIME documents.

          Show
          Jens Alfke added a comment - It turns out "Content-ID" is not the correct header to use for the filename, because according to RFC2045 sec.7, "Content-ID values must be generated to be world-unique". (I didn't know this when writing up this issue, but discovered it later on while implementing MIME support for TouchDB. I should have updated this issue too; sorry!) The most appropriate header to use seems to be Content-Disposition (RFC1806): Content-Disposition: attachment; filename="test.txt" This is what TouchDB generates, and what it will recognize in incoming MIME documents.
          Hide
          Jan Lehnardt added a comment -

          And fixed. I’ve updated the branches.

          Show
          Jan Lehnardt added a comment - And fixed. I’ve updated the branches.
          Hide
          Jan Lehnardt added a comment -

          The main CouchDB repo lives here: https://git-wip-us.apache.org/repos/asf?p=couchdb.git;a=summary

          Branches are not mirrored to GitHub at this point (sorry), but I pushed it to my fork:

          https://github.com/janl/couchdb/tree/1368-fix-multipart-header-parts.

          The commit in question:

          https://github.com/janl/couchdb/commit/18971de71c93c3a00e408b3d4eb67be8c695150c

          Here’s a request dump:

          > curl -Nv $COUCH/test/asd?attachments=true -H "Accept: multipart/related,/;"

          • About to connect() to 127.0.0.1 port 5984 (#0)
          • Trying 127.0.0.1...
          • connected
          • Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)
            > GET /test/asd?attachments=true HTTP/1.1
            > User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8r zlib/1.2.5
            > Host: 127.0.0.1:5984
            > Accept: multipart/related,/;
            >
            < HTTP/1.1 200 OK
            < Server: CouchDB/1.3.0a-b9af7ea-git (Erlang OTP/R15B02)
            < ETag: "9-4310e4b1fcab6822344790d37fb5ddea"
            < Date: Wed, 14 Nov 2012 18:04:57 GMT
            < Content-Type: multipart/related; boundary="a38b2d614bb2a8d70e31050a0e2e11a5"
            < Content-Length: 493
            <
            --a38b2d614bb2a8d70e31050a0e2e11a5
            content-type: application/json

          {"_id":"asd","_rev":"9-4310e4b1fcab6822344790d37fb5ddea","foo":"var","_attachments":{"test.txt":

          {"content_type":"text/plain","revpos":8,"digest":"md5-7xbQv30HNBSgLpMDsQTH7A==","length":12,"follows":true,"encoding":"gzip","encoded_length":30}

          }}
          --a38b2d614bb2a8d70e31050a0e2e11a5
          Content-ID: test.txt
          Content-Type: text/plain
          Content-Length: 30
          Content-Transfer-Encoding: gzip

          K??WHJ,?*.?5?

          --a38b2

          The closing boundary is off, I seem to have a bug in the main request’s Content-Length calculation, but this is the direction this is going.

          Show
          Jan Lehnardt added a comment - The main CouchDB repo lives here: https://git-wip-us.apache.org/repos/asf?p=couchdb.git;a=summary Branches are not mirrored to GitHub at this point (sorry), but I pushed it to my fork: https://github.com/janl/couchdb/tree/1368-fix-multipart-header-parts . The commit in question: https://github.com/janl/couchdb/commit/18971de71c93c3a00e408b3d4eb67be8c695150c Here’s a request dump: > curl -Nv $COUCH/test/asd?attachments=true -H "Accept: multipart/related, / ;" About to connect() to 127.0.0.1 port 5984 (#0) Trying 127.0.0.1... connected Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0) > GET /test/asd?attachments=true HTTP/1.1 > User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8r zlib/1.2.5 > Host: 127.0.0.1:5984 > Accept: multipart/related, / ; > < HTTP/1.1 200 OK < Server: CouchDB/1.3.0a-b9af7ea-git (Erlang OTP/R15B02) < ETag: "9-4310e4b1fcab6822344790d37fb5ddea" < Date: Wed, 14 Nov 2012 18:04:57 GMT < Content-Type: multipart/related; boundary="a38b2d614bb2a8d70e31050a0e2e11a5" < Content-Length: 493 < --a38b2d614bb2a8d70e31050a0e2e11a5 content-type: application/json {"_id":"asd","_rev":"9-4310e4b1fcab6822344790d37fb5ddea","foo":"var","_attachments":{"test.txt": {"content_type":"text/plain","revpos":8,"digest":"md5-7xbQv30HNBSgLpMDsQTH7A==","length":12,"follows":true,"encoding":"gzip","encoded_length":30} }} --a38b2d614bb2a8d70e31050a0e2e11a5 Content-ID: test.txt Content-Type: text/plain Content-Length: 30 Content-Transfer-Encoding: gzip K??WHJ,?*.?5? --a38b2 The closing boundary is off, I seem to have a bug in the main request’s Content-Length calculation, but this is the direction this is going.
          Hide
          Jens Alfke added a comment -

          Where is the branch? I don't see it in the github UI at https://github.com/apache/couchdb .

          Also, could you post a sample of what the MIME headers look like for an attachment part?

          Show
          Jens Alfke added a comment - Where is the branch? I don't see it in the github UI at https://github.com/apache/couchdb . Also, could you post a sample of what the MIME headers look like for an attachment part?
          Hide
          Jan Lehnardt added a comment -

          Fixed in the branch: 1368-fix-multipart-header-parts

          I’d love a review.

          Show
          Jan Lehnardt added a comment - Fixed in the branch: 1368-fix-multipart-header-parts I’d love a review.

            People

            • Assignee:
              Unassigned
              Reporter:
              Jens Alfke
            • Votes:
              2 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development