  CouchDB / COUCHDB-3168

Replicator doesn't handle writing documents to a target db with a small max_document_size well

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      If a target db has a smaller max document size configured, replication crashes
      when it tries to write documents that exceed it.

      It might make sense for the replication not to crash and instead treat document
      size as an implicit replication filter, then display doc write failures in the
      stats / task info / completion record of normal replications.
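
      For illustration, this is roughly how such failures could be read from the task
      info of a running replication. A minimal sketch, assuming the local admin
      credentials and port used in the comments below, and using the existing
      doc_write_failures stat exposed by the replicator:

      # Sketch: read doc_write_failures from replication entries in _active_tasks.
      # Host, port, and credentials are assumptions; adjust for your setup.
      import base64
      import json
      import urllib.request

      auth = "Basic " + base64.b64encode(b"adm:pass").decode()
      req = urllib.request.Request("http://localhost:15984/_active_tasks",
                                   headers={"Authorization": auth})
      with urllib.request.urlopen(req) as resp:
          tasks = json.load(resp)

      for task in tasks:
          if task.get("type") == "replication":
              print(task.get("replication_id"), "doc_write_failures:",
                    task.get("doc_write_failures"))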


          Activity

          vatamane Nick Vatamaniuc added a comment -

          Initially this seemed like a one-line change:

          https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_api_wrap.erl#L451

          However, a too-large document seems to crash the whole _bulk_docs request with:

          {"error":"too_large","reason":"the request entity is too large"}

          This means we don't know which docs from the list succeeded and which didn't.

          I tried this with:

          curl -X DELETE http://adm:pass@localhost:15984/x; curl -X PUT http://adm:pass@localhost:15984/x && curl -d @large_docs.json -H 'Content-Type: application/json' -X POST http://adm:pass@localhost:15984/x/_bulk_docs

          where large_docs.json looked something like

          {
              "docs" : [
                  {"_id" : "doc1"},
                  {"_id" : "doc2", "large":"x...."}
              ]
          }
          

          and the max document size was set to something smaller than the size of the "large" value in the docs
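
          The same reproduction as a Python sketch (host, credentials, and the idea that
          max_document_size on the target is set below the size of the "large" field are
          assumptions carried over from the curl example above):

          # Sketch: recreate db "x", then POST one small and one oversized doc to
          # _bulk_docs. With a small max_document_size on the target, the whole
          # request is expected to fail with 413 {"error": "too_large", ...}.
          import base64
          import json
          import urllib.error
          import urllib.request

          BASE = "http://localhost:15984"
          AUTH = "Basic " + base64.b64encode(b"adm:pass").decode()

          def request(method, path, body=None):
              data = json.dumps(body).encode() if body is not None else None
              req = urllib.request.Request(
                  BASE + path, data=data, method=method,
                  headers={"Authorization": AUTH, "Content-Type": "application/json"})
              try:
                  with urllib.request.urlopen(req) as resp:
                      return resp.status, json.load(resp)
              except urllib.error.HTTPError as err:
                  return err.code, json.load(err)

          request("DELETE", "/x")   # ignore 404 if the db doesn't exist yet
          request("PUT", "/x")
          docs = [{"_id": "doc1"}, {"_id": "doc2", "large": "x" * (2 * 1024 * 1024)}]
          print(request("POST", "/x/_bulk_docs", {"docs": docs}))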

          vatamane Nick Vatamaniuc added a comment - edited

          413s are emitted per request, generated from here:

          https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd.erl#L607-L611

          So "max_document_size" is not strictly true as it is max_request_size really. Can still have documents smaller than that size just have many of them in a _bulk_docs request.

          There is a related ticket and associated PR:

          https://issues.apache.org/jira/browse/COUCHDB-2992

          githubbot ASF GitHub Bot added a comment -

          GitHub user nickva opened a pull request:

          https://github.com/apache/couchdb-couch-replicator/pull/49

          Fix replicator handling of max_document_size when posting to _bulk_docs

          Currently the `max_document_size` setting is a misnomer; it actually configures
          the maximum request body size. For single-document requests it is a good enough
          approximation. However, _bulk_docs updates could fail the total request size
          check even if individual documents stay below the maximum limit.

          Before this fix, during replication a `_bulk_docs` request would crash, which
          eventually leads to an infinite cycle of crashes and restarts (with a
          potentially large state being dumped to the logs), without the replication job
          making progress.

          The fix is to do a binary split on the batch size until either all documents
          fit under the max_document_size limit, or some documents fail to replicate.

          If documents fail to replicate, they bump the `doc_write_failures` count.
          Effectively `max_document_size` acts as an implicit replication filter in this
          case.

          Jira: COUCHDB-3168
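
          A minimal sketch of that splitting strategy (the real change is in the Erlang
          replicator; post_bulk_docs and TooLarge below are hypothetical stand-ins for
          the target's _bulk_docs call and its 413 response):

          # Sketch: on a 413 from _bulk_docs, split the batch in half and retry each
          # half; a single doc that still gets 413 is skipped and counted as a
          # doc_write_failure so the replication job keeps making progress.
          def flush_batch(target, docs):
              """Return the number of documents that could not be written."""
              if not docs:
                  return 0
              try:
                  post_bulk_docs(target, docs)   # hypothetical: POST docs to _bulk_docs
                  return 0
              except TooLarge:                   # hypothetical: raised on a 413 response
                  if len(docs) == 1:
                      return 1                   # too large on its own: skip and count it
                  mid = len(docs) // 2
                  return flush_batch(target, docs[:mid]) + flush_batch(target, docs[mid:])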

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/cloudant/couchdb-couch-replicator couchdb-3168

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/couchdb-couch-replicator/pull/49.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #49


          commit a9cd0b191524428ece0ebd0a1e18c88bb2afcbaa
          Author: Nick Vatamaniuc <vatamane@apache.org>
          Date: 2016-10-03T19:30:23Z

          Fix replicator handling of max_document_size when posting to _bulk_docs

          Currently the `max_document_size` setting is a misnomer; it actually configures
          the maximum request body size. For single-document requests it is a good enough
          approximation. However, _bulk_docs updates could fail the total request size
          check even if individual documents stay below the maximum limit.

          Before this fix, during replication a `_bulk_docs` request would crash, which
          eventually leads to an infinite cycle of crashes and restarts (with a
          potentially large state being dumped to the logs), without the replication job
          making progress.

          The fix is to do a binary split on the batch size until either all documents
          fit under the max_document_size limit, or some documents fail to replicate.

          If documents fail to replicate, they bump the `doc_write_failures` count.
          Effectively `max_document_size` acts as an implicit replication filter in this
          case.

          Jira: COUCHDB-3168


          jira-bot ASF subversion and git services added a comment -

          Commit 2f23b57cd705c87570d98340a4aad1bc611cd4f0 in couchdb-couch-replicator's branch refs/heads/master from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch-replicator.git;h=2f23b57 ]

          Fix replicator handling of max_document_size when posting to _bulk_docs

          Currently the `max_document_size` setting is a misnomer; it actually configures
          the maximum request body size. For single-document requests it is a good enough
          approximation. However, _bulk_docs updates could fail the total request size
          check even if individual documents stay below the maximum limit.

          Before this fix, during replication a `_bulk_docs` request would crash, which
          eventually leads to an infinite cycle of crashes and restarts (with a
          potentially large state being dumped to the logs), without the replication job
          making progress.

          The fix is to do a binary split on the batch size until either all documents
          fit under the max_document_size limit, or some documents fail to replicate.

          If documents fail to replicate, they bump the `doc_write_failures` count.
          Effectively `max_document_size` acts as an implicit replication filter in this
          case.

          Jira: COUCHDB-3168

          jira-bot ASF subversion and git services added a comment -

          Commit e5747dbaa2fb10760eb2cd3e289a01b51694c7cd in couchdb-couch-replicator's branch refs/heads/master from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch-replicator.git;h=e5747db ]

          Fix handling of 413 responses for single document PUT requests

          When the replicator finds a document which has an attachment larger than 64K,
          or has more than 8 attachments, it switches to non-batching mode and sends
          each document separately using a PUT request with a multipart/related
          Content-Type.

          Explicitly handle the case when the response to the PUT request is a 413. Skip
          the document and bump the `doc_write_failures` count, just like in the case of
          a 413 response to a _bulk_docs POST request.

          Jira: COUCHDB-3168
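
          A minimal sketch of that handling (put_multipart_doc is a hypothetical helper
          returning the HTTP status of the single-document multipart PUT; the real change
          is in the Erlang replicator):

          # Sketch: a 413 on a single-document PUT skips the doc and bumps
          # doc_write_failures instead of crashing the replication worker.
          def replicate_one_doc(target, doc, stats):
              status = put_multipart_doc(target, doc)   # hypothetical helper
              if status == 413:
                  stats["doc_write_failures"] += 1      # skipped; job keeps going
              elif status in (201, 202):
                  stats["docs_written"] += 1
              else:
                  raise RuntimeError("unexpected response %d for %s" % (status, doc["_id"]))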

          jira-bot ASF subversion and git services added a comment -

          Commit 93c4ceaf97f46e0dd0fcc1deffe966263eda67d3 in couchdb-couch-replicator's branch refs/heads/master from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch-replicator.git;h=93c4cea ]

          Add tests which check small values of max_document_size setting on the target

          A low max_document_size setting on the target interacts with the replicator;
          this commit adds a few tests to check that interaction.

          There are 3 test scenarios:

          • A basic test checks the case where individual document sizes are each smaller
            than max_document_size, yet when batched together by the replicator they
            exceed the maximum size. In that case the replicator should split document
            batches into halves, down to individual documents, so that the replication
            succeeds.
          • The one_large_one_small test checks that a single large document is skipped,
            so that it doesn't end up on the target and doesn't crash the replication job
            (the small document should still reach the target); a sketch of this scenario
            is shown below.
          • The third test is currently disabled because of COUCHDB-3174. Once that
            issue is fixed, it will test a corner case in the replicator where it
            switches from batching and POST-ing to _bulk_docs to issuing individual
            PUTs with a multipart/mixed Content-Type. Those PUT requests can also return
            a 413 error code, so this tests that explicitly.

          Jira: COUCHDB-3168
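
          A sketch of the one_large_one_small scenario mentioned above (the actual tests
          are Erlang EUnit tests; the helper names, sizes, and limit value here are
          illustrative assumptions only):

          # Sketch: with a small max_document_size on the target, replication should
          # skip the large doc, count one doc_write_failure, and still copy the small doc.
          def test_one_large_one_small():
              source, target = create_db("src"), create_db("tgt")     # hypothetical helpers
              set_max_document_size(target, 10_000)                   # small limit on target
              add_doc(source, "small", body_size=100)                 # under the limit
              add_doc(source, "large", body_size=100_000)             # over the limit
              result = replicate(source, target)
              assert result["doc_write_failures"] == 1
              assert doc_exists(target, "small")
              assert not doc_exists(target, "large")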

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/couchdb-couch-replicator/pull/49

          jira-bot ASF subversion and git services added a comment -

          Commit 92fa3b11ebb3d109eb711a02972adbbf0468c2a1 in couchdb's branch refs/heads/master from Nick Vatamaniuc
          [ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=92fa3b1 ]

          Replicator bump. Add 413 response handling for replicator.

          Jira: COUCHDB-3168


            People

            • Assignee: Unassigned
            • Reporter: vatamane Nick Vatamaniuc
            • Votes: 0
            • Watchers: 3
