Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: HTTP Interface
    • Labels: None

      Description

      CouchDB replication is too slow.

      And what makes it so slow is that it's just so unnecessarily chatty. During replication, you have to do a separate GET for each individual document, in order to get the full _revisions object for that document (using the revs and open_revs parameters – refer to the TouchDB writeup or Benoit's writeup if you need a refresher).

      So for example, let's say you've got a database full of 10,000 documents, and you replicate using a batch size of 500 (batch sizes are configurable in PouchDB). The conversation for a single batch basically looks like this:

      - REPLICATOR: gimme 500 changes since seq X (1 GET request)
        - SOURCE: okay
      - REPLICATOR: gimme the _revs_diff for these 500 docs/_revs (1 POST request)
        - SOURCE: okay
      - repeat 500 times:
        - REPLICATOR: gimme the _revisions for doc n with _revs [...] (1 GET request)
          - SOURCE: okay
      - REPLICATOR: here's a _bulk_docs with 500 documents (1 POST request)
        - TARGET: okay
      

      See the problem here? That 500-loop, where we have to do a GET for each one of 500 documents, is a lot of unnecessary back-and-forth, considering that the replicator already knows what it needs before the loop starts. You can parallelize, but if you assume a browser (e.g. for PouchDB), most browsers only allow ~8 simultaneous requests. Plus, there's latency and HTTP headers to consider. So overall, it's not cool.
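
      To put numbers on it: at a batch size of 500, those 10,000 documents take 20 batches of 1 + 1 + 500 + 1 requests each, i.e. 10,060 requests in total, more than 10,000 of which are the per-document GETs.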

      So why do we even need to do the separate requests? Shouldn't _all_docs be good enough? Turns out it's not, because we need this special _revisions object.

      For example, consider a document 'foo' with 10 revisions. You may compact the database, in which case revisions 1-x through 9-x are no longer retrievable. However, if you query using revs and open_revs, those rev IDs are still available:

      $ curl 'http://nolan.iriscouch.com/test/foo?revs=true&open_revs=all'
      
      {
        "_id": "foo",
        "_rev": "10-c78e199ad5e996b240c9d6482907088e",
        "_revisions": {
          "start": 10,
          "ids": [
            "c78e199ad5e996b240c9d6482907088e",
            "f560283f1968a05046f0c38e468006bb",
            "0091198554171c632c27c8342ddec5af",
            "e0a023e2ea59db73f812ad773ea08b17",
            "65d7f8b8206a244035edd9f252f206ad",
            "069d1432a003c58bdd23f01ff80b718f",
            "d21f26bb604b7fe9eba03ce4562cf37b",
            "31d380f99a6e54875855e1c24469622d",
            "3b4791360024426eadafe31542a2c34b",
            "967a00dff5e02add41819138abb3284d"
          ]
        }
      }
      

      And in the replication algorithm, this full _revisions object is required at the point when you copy the document from one database to another, which is accomplished with a POST to _bulk_docs using new_edits=false. If you don't have the full _revisions object, CouchDB accepts the new revision, but considers it to be a conflict. (The exception is with generation-1 documents, since they have no history, so as it says in the TouchDB writeup, you can safely just use _all_docs as an optimization for such documents.)
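
      For reference, that copy step looks roughly like this (a minimal sketch against a local CouchDB, with the _revisions list shortened to two entries):

      curl -X POST 'http://localhost:5984/target/_bulk_docs' \
           -H 'Content-Type: application/json' \
           -d '{"new_edits": false,
                "docs": [{"_id": "foo",
                          "_rev": "10-c78e199ad5e996b240c9d6482907088e",
                          "_revisions": {"start": 10, "ids": ["c78e199ad5e996b240c9d6482907088e",
                                                              "f560283f1968a05046f0c38e468006bb"]}}]}'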

      And unfortunately, this _revisions object is only available from the GET /:dbid/:docid endpoint. Trust me; I've tried the other APIs. You can't get it anywhere else.

      This is a huge problem, especially in PouchDB where we often have to deal with CORS, meaning the number of HTTP requests is doubled. So for those 500 GETs, it's an extra 500 OPTIONs, which is just unacceptable.

      Replication does not have to be slow. While we were experimenting with ways of fetching documents in bulk, we tried a technique that just relied on using _changes with include_docs=true (#2472). This pushed conflicts into the target database, but on the upside, you can sync ~95k documents from npm's skimdb repository to the browser in less than 20 minutes! (See npm-browser.com for a demo.)

      What an amazing story we could tell about the beauty of CouchDB replication, if only this trick actually worked!

      My proposal is a simple one: just add the revs and open_revs options to _all_docs. Presumably this would be aligned with keys, so similar to how keys takes an array of docIds, open_revs would take an array of array of revisions. revs would just be a boolean.
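
      Under that proposal, a request might look something like the following (purely hypothetical syntax, mirroring how keys works today; nothing like this exists yet):

      curl -g 'http://nolan.iriscouch.com/test/_all_docs?keys=["foo"]&revs=true&open_revs=[["10-c78e199ad5e996b240c9d6482907088e"]]'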

      This only gets hairy in the case of deleted documents. In this example, bar is deleted but foo is not:

      curl -g 'http://nolan.iriscouch.com/test/_all_docs?keys=["bar","foo"]&include_docs=true'
      
      {"total_rows":1,"offset":0,"rows":[
      {"id":"bar","key":"bar","value":{"rev":"2-eec205a9d413992850a6e32678485900","deleted":true},"doc":null},
      {"id":"foo","key":"foo","value":{"rev":"10-c78e199ad5e996b240c9d6482907088e"},"doc":{"_id":"foo","_rev":"10-c78e199ad5e996b240c9d6482907088e"}}
      ]}
      

      The cleanest approach would be to attach the _revisions object to the doc, but if you use keys, then the deleted documents are returned with doc: null, even if you specify include_docs=true. One workaround would be to simply add a revisions object to the value.
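
      That workaround might produce a row shaped something like this (hypothetical, extending the _all_docs row format above, with the second rev ID elided):

      {"id":"bar","key":"bar","value":{"rev":"2-eec205a9d413992850a6e32678485900","deleted":true,"revisions":{"start":2,"ids":["eec205a9d413992850a6e32678485900","..."]}},"doc":null}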

      If all of this would be too difficult to implement under the hood in CouchDB, I'd also be happy to get the _revisions back in _changes, _revs_diff, or even in a separate endpoint. I don't care, as long as there is some bulk API where I can get multiple _revisions for multiple documents at once.

      On the PouchDB end of things, we would really like to push forward on this. I'm happy to implement a Node.js proxy to stand in front of CouchDB/Cloudant/CSG and add this new API, plus add it directly to PouchDB Server. I can invent whatever API I want, but the main thing is that I would like this API to be something that all the major players can agree upon (Apache, Cloudant, Couchbase) so that eventually the proxy is no longer necessary.

      Thanks for reading the WoT. Looking forward to a faster CouchDB replication protocol, since it's the thing that ties us all together and makes this crazy experiment worthwhile.

      Background: this and this.

        Activity

        rnewson Robert Newson added a comment - edited

        Great writeup!

        My first thought is to enhance the POST form of /dbname/_all_docs. Currently it expects

        {"keys": []}

        where the keys are doc _ids. This is because _all_docs apes the view API.

        Here's my suggestion:

        {"docs": [ {"id": "foo","open_revs": ["1-foo", "2-bar"]}, {"id":"bar" ... } ]}
        

        The response will return the named document with all the specified open_revs in the order of the "docs" array. Each row will be a separate chunk, the server will not buffer the full response.
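
        For illustration, the response for that request might be shaped something like this (a sketch, not a spec; the exact row format is open):

        {"results": [
          {"id": "foo", "docs": [
            {"ok": {"_id": "foo", "_rev": "1-foo", "_revisions": {"start": 1, "ids": ["foo"]}}},
            {"ok": {"_id": "foo", "_rev": "2-bar", "_revisions": {"start": 2, "ids": ["bar", "foo"]}}}
          ]},
          {"id": "bar", "docs": []}
        ]}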

        Deciding on an API is the hard part; I don't think the plumbing will be all that tricky.

        nolanlawson Nolan Lawson added a comment -

        Thanks! That looks like a great solution, and the fact that the response can be chunked is definitely a win.

        The only thing that's kinda unfortunate is that it seems the current version of CouchDB will accept a POST to _all_docs with anything in it other than keys, and it will just respond as if you hadn't included any options at all. What would be ideal is if this new API threw an error in older versions, so that we could use that on the client side for feature detection.

        It looks like if we include a non-array in keys, it will throw a

        {"error":"bad_request","reason":"`keys` member must be a array."}
        

        Can we use that to our advantage, or do you think it's just too ugly to put non-keys into keys?

        rnewson Robert Newson added a comment -

        We'll need an array so that we can ensure that docs are returned in the same order as the request. While CouchDB preserves the order of keys in an object when marshalling to and from JSON, other libraries don't (and they are obviously not required to).

        We could arrange that "docs" is tested first, so you could get feature detection by posting:

        {"docs": [ whatever ], "keys": {}}
        

        Versions of CouchDB that don't look for "docs" will crash with that 400. Cheesy but acceptable?
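
        In practice the probe could be a single request (sketch, assuming a database named db):

        curl -X POST 'http://localhost:5984/db/_all_docs' \
             -H 'Content-Type: application/json' \
             -d '{"docs": [], "keys": {}}'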

        janl Jan Lehnardt added a comment -

        (bikeshed) or we make it a new endpoint /_bulk_revs (or something) that we can build up as we like.

        nolanlawson Nolan Lawson added a comment - edited

        Robert Newson It's cheesy, but it's definitely doable.

        I'm kind of partial to Jan's idea though. One benefit of inventing a new endpoint is that it could also do the job of _revs_diff, i.e. for all the missing revs, it would also just give us the documents/attachments/_revisions corresponding to those revs. Right now it's kind of silly that we post to _revs_diff to get a list of revs, and then we turn right back around and send those same revs to another endpoint to get back the corresponding documents.
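
        For example, such an endpoint might accept the same body that _revs_diff takes today and answer with the missing revisions' documents directly (hypothetical shape, reusing the _bulk_revs name floated above):

        curl -X POST 'http://localhost:5984/db/_bulk_revs' \
             -H 'Content-Type: application/json' \
             -d '{"foo": ["10-c78e199ad5e996b240c9d6482907088e"]}'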

        rnewson Robert Newson added a comment -

        Agreed, a new endpoint and clean 404 response for detection.

        dholth Daniel Holth added a comment -

        Proposed nginx implementation: https://gist.github.com/dholth/62ccd920be0a80a51443

        The code runs inside nginx with ngx_lua and requires lua-cjson. POST the json

        {"docs":["id1", "id2", .. "idn"]}

        to /db/dbname/_new_endpoint; receive an array of one json GET response per id. Should be suitable for streaming parsing of the response as well.

        kxepal Alexander Shorin added a comment -

        Daniel Holth you missed the latest=true parameter, and open_revs doesn't accept all but rather a list of revisions found missing by the previous _revs_diff call. Also, this solution doesn't respect attachments, which you'd be happier handling via a multipart response instead of JSON (to avoid base64 encoding and the resulting big JSON objects).

        dholth Daniel Holth added a comment - edited

        Would you suggest modeling the endpoint after _revs_diff, sending all the JSON documents, and then sending multipart attachments? (Or just sending each multipart GET response as a larger multipart response instead of using JSON?) I notice the JSON document GET does send the attachment stubs.

        kxepal Alexander Shorin added a comment -

        > I notice the JSON document GET does send the attachment stubs.

        For an "Accept: application/json" request against a doc with the open_revs parameter, attachments are only sent if you explicitly ask for them (via the additional query param attachments=true). For "Accept: multipart/*", they are always returned.

        I would suggest:
        1) Let /db/_bulk_docs accept multipart requests
        2) Let the new endpoint (/db/_bulk_revs) return a multipart response, just like /db/docid?open_revs=[...] does

        This turns replication into an easy walk: the client just has to transparently stream the response from _bulk_revs as the request body for _bulk_docs, without buffering anything in memory, perhaps only tracking the latest received doc from _bulk_revs so it can easily resume replication on connection failure. The server, on losing the connection with the client, may flush the received data to the db and wait for the client to recover. This approach could also be easily upgraded to modern streaming protocols like WebSockets.

        Currently, the replication client has to maintain a local buffer of received docs and flush them when the stack limit is reached.
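
        A client-side sketch of that pipeline, assuming such a _bulk_revs endpoint existed (hypothetical; revs.json is an assumed file holding the revs to fetch, and curl streams the first response straight into the second request without buffering the whole body):

        curl -sN -X POST "$SOURCE/db/_bulk_revs" -H 'Content-Type: application/json' -d @revs.json \
          | curl -X POST "$TARGET/db/_bulk_docs" -H 'Content-Type: multipart/related' --data-binary @-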

        dholth Daniel Holth added a comment - edited

        By the way, it's not that difficult to process JSON responses in a streaming way, handling each element of an array as it is received. http://oboejs.com/ is a JavaScript implementation of this.

        I don't make heavy use of attachments, so it doesn't affect me personally if replicating them requires additional GETs.

        Does this part of the replication API benefit from more sophisticated streaming APIs? Regular HTTP would seem to be sufficient for the bulk case that we need to speed up.

        kxepal Alexander Shorin added a comment -

        > Does this part of the replication API benefit from more sophisticated streaming APIs? Regular HTTP would seem to be sufficient for the bulk case that we need to speed up.

        That would still be masking the problem instead of solving it. Currently, if a document has any attachment larger than a few dozen kilobytes, CouchDB will not send that document through the _bulk_docs endpoint, but will send it alone. So, if you try to replicate something like the npm registry, all documents will be replicated one by one. The benefit from bulk rev fetching would not be so great, and it would definitely harm performance, since you'd need much more RAM to handle all the docs with their attachments.

        dholth Daniel Holth added a comment -

        IIUC npm doesn't use CouchDB attachments anymore. https://skimdb.npmjs.com/registry .

        There is no need to store or buffer the entire response of many documents or attachments. It can be streamed on both ends. This is independent of whether JSON or multipart/mime is being used.

        kxepal Alexander Shorin added a comment -

        Daniel Holth I know that npm has changed their design, but others may still use CouchDB attachments actively. We cannot just ignore them.

        > There is no need to store or buffer the entire response of many documents or attachments. It can be streamed on both ends. This is independent of whether JSON or multipart/mime is being used.

        You have to, because JSON is not a streaming format. You may adopt conventions like a line-based protocol (as the changes feed does) or one chunk per item (as views do), but all of these are workarounds for JSON. Multipart is a purely streaming format - no implementation conventions required. Also, you cannot stream JSON -> multipart or multipart -> JSON, since they have different requirements for the attachment stub fields ("follows": true).

        rnewson Robert Newson added a comment -

        I hope we're not derailed here. Obviously the resolution of this ticket will be an Erlang code change and not an external nginx script.

        dholth Daniel Holth added a comment - edited

        It should be clear that the lua script is just a prototype or shim.

        Next idea: _bulk_get. The _bulk_get API accepts all the parameters of GET /{db}/{docid}, both as GET parameters and as members of an array of objects in the request body, saving space for common parameters:

        # Python:
        import json
        import requests

        requests.post(
            'http://localhost/db/pouch/_bulk_get?meta=true&open_revs="all"&revs_info=true',
            data=json.dumps({'docs': [
                {'id': 'foo'},
                {'id': 'bar'},
                {'id': 'quux', 'meta': False}
            ]}))

        For each object in the request body's 'docs' member, GET the 'id' in that member. Copy all the GET parameters passed to _bulk_get to the subrequest. Additional parameters in each item in the 'docs' array override the shared GET parameters. So in the example, foo and bar are fetched with meta=true while quux is fetched with meta=false.

        Return a JSON array of the [internal] GET requests performed.
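
        For the request above, the response might look something like this (a hypothetical sketch with placeholder revision IDs, and with meta/revs_info fields omitted for brevity; each element is what the corresponding GET with open_revs would have returned):

        [
          [{"ok": {"_id": "foo", "_rev": "1-abc"}}],
          [{"ok": {"_id": "bar", "_rev": "1-def"}}],
          [{"ok": {"_id": "quux", "_rev": "1-123"}}]
        ]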

        https://gist.github.com/dholth/62ccd920be0a80a51443

        By being built out of existing CouchDB APIs, _bulk_get would solve exactly the "individual GET requests are slow" problem without introducing other new features.

        Much of this thread is about whether this kind of API can or should have a multipart/mime form for attachments... they are not important to me.

        cwmma Calvin Metcalf added a comment -

        Getting back on track: is there a reason we couldn't (or shouldn't) add an option to _changes that gives you everything you need to do replication (maybe sans full attachments, or with full attachments optional)? If we are trying to make replication speedier, that would be the biggest win, even if you could only do it when replicating into an empty database, as that would help with the extremely common initial database population situation.

        nolanlawson Nolan Lawson added a comment -

        For the short term, I'm on Calvin's side. I think we can solve this by just turning the CouchDB replication algorithm into a simple stream. I've started an initial version here: pouchdb-replication-stream. If it works, it could be very exciting.

        For the long term, I'd still like to see a new API endpoint that simply fixes the immediate problem of GET with open_revs/revs being too slow. Ideally it should be a solution that can be easily integrated into existing replication algorithms – not just PouchDB's, but also TouchDB, Cloudant Mobile, CSG, Cloudant, etc.

        dholth Daniel Holth added a comment -

        I've been able to do some additional work on this and am able to release under the Apache 2.0 license. It is at https://github.com/dholth/pouchdb/compare/fast-enough

        It contains the previously mentioned Lua shim adding a _bulk_get API to CouchDB, the necessary nginx reverse-proxy config, and initial support for the feature in PouchDB. POST an array of GET parameters as the JSON request body to _bulk_get; many GET subrequests are made, and the results are concatenated and returned as a JSON array.

        I only chose CouchDB because I thought it could correctly replicate to PouchDB in a reasonable amount of time. For a database of 1,639 documents, none of them generation-1, PouchDB's current replication algorithm makes 1,718 requests to the server. After the change, it takes 98 requests to do the same thing. Now replication is fast enough to be useful.

        I might have produced a patch to CouchDB itself but I do not know Erlang and I already need to run CouchDB behind nginx for authentication reasons.

        janl Jan Lehnardt added a comment -

        Heya Daniel, cool work thanks!

        While we can’t use the lua implementation in CouchDB obviously, it is a great prototype for any work we can do on the Erlang side.

        FWIW, this would make a great first Erlang/CouchDB project, if you feel so inclined — Or anyone, for that matter!

        dholth Daniel Holth added a comment -

        Here is the WORKING Erlang prototype. https://github.com/dholth/couchdb/compare/1.6.x?expand=1

        As you can see, the JSON options parser simply had to match binaries instead of lists (unlike the HTTP query-string options parser) to work properly. This contains exactly what I need to make my PouchDB replication batch_size times faster, combined with a simple plugin on the PouchDB side: https://github.com/dholth/pouchdb-bulk-get . It may not work as expected unless all the options PouchDB sends are present in the request, but that is acceptable for my use case.

        githubbot ASF GitHub Bot added a comment -

        GitHub user dholth opened a pull request:

        https://github.com/apache/couchdb-couch/pull/18

        rebased _bulk_get patch

        This is a rebase for https://issues.apache.org/jira/browse/COUCHDB-2310 against post-1.6.x CouchDB.

        The idea is that when it is finished every GET option should eventually be supported in the _bulk_get API, making CouchDB easier to learn than a hypothetical _bulk_get API supporting only a subtly different subset of the GET API.

        Is it acceptable to refactor the GET handling?

        Is anyone willing to help get this polished and accepted into CouchDB?

        I wrote this patch because without it CouchDB replication to PouchDB is too slow to be useful. After this patch, formerly impractical replications that would take ~10,000 requests now take a pleasantly acceptable ~100 requests.

        https://github.com/dholth/pouchdb-bulk-get is a PouchDB plugin for the other end.

        Thanks.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/dholth/couchdb-couch bulk-get

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/couchdb-couch/pull/18.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #18


        commit 516dc25a3be9d0c7458c5a2a826930821366b7e5
        Author: Daniel Holth <dholth@fastmail.fm>
        Date: 2014-12-03T02:28:03Z

        rebased _bulk_get patch


        githubbot ASF GitHub Bot added a comment -

        Github user nolanlawson commented on the pull request:

        https://github.com/apache/couchdb-couch/pull/18#issuecomment-67534359

        @benoitc This is news to me; I don't think anyone even mentioned it in COUCHDB-2310.

        If there's an existing standard that CBLite and rcouch are using, then I'd definitely prefer to implement that one for PouchDB.

        > It is actually compatible with couchdb 1.6

        Not seeing it in my own Couch 1.6.0, maybe I'm doing something wrong?

        ```bash
        curl -X POST 'localhost:5984/test/_bulk_get?revs=true' \
             -H 'Content-Type: application/json' -d '{"docs": []}'
        ```

        I get "Referer header required," which seems to be the same error you get if you try to post to a non-existent underscore-prefixed path (e.g. `/test/_foobar`).

        githubbot ASF GitHub Bot added a comment -

        Github user nolanlawson commented on the pull request:

        https://github.com/apache/couchdb-couch/pull/18#issuecomment-67548296

        @rnewson OK, but do we all agree that @benoitc's spec is the one to go with? It seems like it meets all the requirements we discussed in COUCHDB-2310.

        Unfortunately @dholth that would mean a bit of a rewrite, since [your spec](https://issues.apache.org/jira/browse/COUCHDB-2310?focusedCommentId=14122248&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14122248) is similar but not exactly the same. What do you think?

        rnewson Robert Newson added a comment -

        I can't say that I agree, no. _bulk_get is not a good name, it doesn't tell you what you're getting. We earlier proposed /_bulk_revs which at least hints at what you're getting back (aka a whole bunch of document revisions).

        It's a shame we couldn't see a way to extend the existing bulk get API (POST /_all_docs); having two seems awkward in comparison. I appreciate that we raised and discussed some compatibility issues earlier.

        rnewson Robert Newson added a comment -

        Finally, the intent to make everything accessible in bulk using POSTs seems to ruin our RESTful nature. Is there another way to pursue performance enhancements without going that far? I personally hate all the bulk endpoints (each added pretty much ad hoc for much the same reason motivating this ticket).

        kxepal Alexander Shorin added a comment -

        I agree with Robert Newson about the name and bulk evilness. Having this feature on /_all_docs would be much better.

        > Is there another way to pursue performance enhancements without going that far?

        There is a tricky one: /db/_changes?include_docs=true&style=all_docs&attachments=true, which emits documents with all their leaf/conflicted revisions. But here we have other problems:

        • we always have to read all the leaves, every time
        • no multipart support

        So reducing the number of requests will cost us a lot more traffic to receive, plus memory usage. Implementing multipart response support isn't a big problem, I think - that mimetype is made for streams. For the other problem, we could add some smart style that emits all the leaves only once and afterwards (for continuous feeds) emits only new/updated ones. As a result, the replicator would no longer need to deal with /_revs_diff and the document resources at all: it would just listen to the changes feed and fetch all the data from there.
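
        For reference, that request in full (sketch; the response line below is hand-written and abbreviated):

        curl 'http://localhost:5984/db/_changes?include_docs=true&style=all_docs&attachments=true'

        {"results":[{"seq":12,"id":"foo","changes":[{"rev":"10-c78e..."},{"rev":"9-e0a0..."}],"doc":{"_id":"foo","_rev":"10-c78e..."}}],"last_seq":12}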

        nolanlawson Nolan Lawson added a comment - edited

        > Having this feature on /_all_docs would be much better.

        I tend to disagree with you and Robert Newson; it would be much easier to feature-test if it were a separate endpoint, i.e. the client just tries to POST and checks for an error.

        > There is a tricky one: /db/_changes?include_docs=true&style=all_docs&attachments=true

        I'm pretty sure this "tricky" scenario only works for gen-1 docs. Hence my writeup above about how we tried to be too clever in PouchDB and ultimately had to roll it back. Basically you need the _revisions object, or else sad things will happen, and only GET /doc gives you that.

        At this point I would be +1 on moving forward with Benoit's thing for CouchDB 1.x at least. Jan Lehnardt, would you be on board with that?

        > I personally hate all the bulk endpoints

        I hate the names too. But if rcouch and CBLite have already implemented it, then it kinda feels like a fait accompli. And to Benoit's credit, the _bulk_get API seems really simple and easy to implement; it would be hard for any of us to mess it up.

        janl Jan Lehnardt added a comment -

        How about this for an option:

        1. We include the rcouch code in 1.7.0 as _bulk_get.
        2. We may make a chttpd version in 2.0.
        3. We mark it/them as deprecated from the start.
        4. We design a desirable replication stream for 2.x and forward together with the PouchDB and TouchDB folks. Roughly, this would be a multipart stream that mirrors /_changes + include_docs + open_revs (plus handwaving, bear with me). Something like /db/_replication_stream or so.

        We can’t get around POST APIs for potentially large key-requests as per real world HTTP constraints. Maybe HTTP2 with its multiplexing helps? I don’t know. We should also definitely support the GET version of these requests if the client knows that the amount of data it has to upload to get the right response is within any practical limits for the given setup.

        I’d say though, for the discussion of this ticket, that is out of scope.

        rnewson Robert Newson added a comment - edited

        A couple of comments (and this is getting off-topic):

        1) the proprietary extensions in couchdb-related projects can be an excellent guide for couchdb itself but we're not beholden to them. In fact, I suggest we're obliged to consider the "for the ages" aspect when considering incorporating them.
        2) Agree that enhancing _all_docs is tricky given the poor request handling of the past (specifically, not returning a 400 Bad Request when given unexpected input).
        3) Adding a new feature in 1.7 that we don't necessarily intend to keep in 2.0 is a terrible idea, even if marked experimental.
        4) _bulk_get remains a poor name. _bulk_revs is better, even if the code is the same as _bulk_get.
        5) whether a 1.7 release happens is controversial. I think it should not happen, it's a significant effort and slows 2.0 release even further.

        In summary, I suggest we add _bulk_revs with the rcouch code, assuming it passes muster (formatting, tests, etc.) by couchdb standards. It should be added to couchdb-couch master and couchdb-chttpd, with a backport to a 1.x branch of top-level couchdb if (and only if) someone is prepared to make 1.7 happen (my 5 implies that I will not exert personal effort to make that happen, but I'm not going to stop others if they wish to spend their time, unless they would otherwise have exerted effort to make the important 2.0 release happen sooner).

        summary of summary: new feature work occurs on master, backported if appropriate.

        rnewson Robert Newson added a comment -

        As an addendum, we could support _bulk_get as a (deprecated) alias for _bulk_revs and remove it in the version after. And I agree that the API of _bulk_get looks good, though I note that the rendering is somewhat broken.

        rnewson Robert Newson added a comment -

        And noting that we don't have a good way to indicate deprecated features except in documentation (which people read only if they encounter a problem). Another reason why I'm down on 1.7 (which, last I heard, was going to somehow help people transition to 2.0 by deprecating features, though no good mechanism was devised for that to my knowledge).

        dholth Daniel Holth added a comment -

        Thank you for being interested in _bulk_get.

        The RESTfulness of the existing GET API is not more useful than having an efficient API, and JSON parameters are consistent and nicer to emit than query strings.

        I run my patched CouchDB behind a SPDY or HTTP/2 proxy; SPDY actually does allow the browser to make many, many GET requests at a time (perhaps 10,000 in a couple of seconds) and speeds up PouchDB replication. But that doesn't help everyone, and so many tiny GET requests would almost certainly be less efficient than a proper bulk API.

        Please preserve the JSON support of the new API as present in rcouch. It is easier and in some applications totally appropriate to use a streaming (or ordinary) JSON parser than multipart/mime. CouchDB emits both in a streaming fashion anyway.

        benoitc Benoit Chesneau added a comment -

        Sorry, I didn't see that the JIRA was updated separately from GitHub.

        What I could do is open a branch adding _bulk_get to the 1.6.1 and 2.x branches, since I have both now. It could be done next week (I am traveling a lot this week). As for the name, _bulk_get or _bulk_revs, I don't really care. The reasoning behind the choice of that name was that it allowed us to work peacefully with Couchbase Lite, which was the initial goal.

        Since it's easy to change the entry point name (at least in CouchDB 1.x), it doesn't matter anyway. It can be _bulk_revs or _bulk_get or both, as you wish. Just let me know.

        janl Jan Lehnardt added a comment -

        Benoit Chesneau Just to make sure I read this correctly: You have a 2.x compatible _bulk_get implementation? That’d be awesome!

        I’m sure we can sort out the rest.

        benoitc Benoit Chesneau added a comment -

        Jan Lehnardt correct

        nolanlawson Nolan Lawson added a comment -

        For PouchDB Server, we can expose both _bulk_revs and _bulk_get as synonymous endpoints, no prob. Issue filed.

        On the PouchDB client side, we can just check both endpoints in order to feature-detect.
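
        Detection can be a cheap probe per endpoint (sketch; a 404 means that endpoint is absent, and we fall back to the other name, then to per-document GETs):

        curl -i -X POST 'http://localhost:5984/db/_bulk_revs' \
             -H 'Content-Type: application/json' -d '{"docs": []}'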

        janl Jan Lehnardt added a comment -

        Benoit Chesneau any news?

        benoitc Benoit Chesneau added a comment -

        Sorry, I have been horribly busy this past week, but I will make sure to make the patch available by Friday.

        janl Jan Lehnardt added a comment -

        Cool, thanks, no worries — Also feel free to post incomplete stuff, happy to pick it up.

        janl Jan Lehnardt added a comment -

        nudge

        janl Jan Lehnardt added a comment -

        OK, trying to sum up this discussion. I’m aiming for the least amount of work to get something tangible going.

        1. Let us only consider 2.0 and beyond.
        2. Let us only focus on an intermediate way to make PouchDB, Couchbase Mobile etc. replication faster. A fully streaming API is out of scope for this ticket (although we should work on it, once this lands).
        3. I propose to use the existing `_bulk_get` spec from Couchbase Sync Server / Mobile / rcouch (Benoit Chesneau if you do have that CouchDB 2.0-compatible patch for _bulk_get, it’d be nice to see now, even if not 100% ready).
        4. As Robert Newson notes, we add this as `_bulk_revs` with an immediately deprecated alias `_bulk_get`. The two benefits here are easy feature detection (_bulk_revs/_get = 404 on older CouchDB versions) and immediate compatibility with existing versions of the other replicating stores.
        5. PouchDB will have to be updated to use the `_bulk_get` endpoint. Daniel Holth’s work, as far as I can tell, would only need minor adjustments. Would PouchDB accept such a patch into core (Nolan Lawson)?

        Does this work for everyone?

        nolanlawson Nolan Lawson added a comment -

        Works for me.

        kxepal Alexander Shorin added a comment -

        Jan Lehnardt Is there a reason to add `_bulk_get` to CouchDB just in order to immediately deprecate it in favour of `_bulk_revs`? If we have all agreed on `_bulk_revs`, then the CouchDB 2.0 release will provide it and only it. And since there is quite some time before that release happens, PouchDB / Couchbase / other folks can deprecate `_bulk_get` on their side and provide a CouchDB-compatible `_bulk_revs` replacement.

        janl Jan Lehnardt added a comment -

        Alexander Shorin Like I said, we will make life better for people who are on existing versions that support _bulk_get. And it doesn’t cost us anything; it’s just an alias.

        dholth Daniel Holth added a comment -

        That would be great. Does the Couchbase _bulk_get support JSON? I think it is preferable to avoid multipart/MIME parsing in JavaScript.

        kxepal Alexander Shorin added a comment -

        Jan Lehnardt OK, I have nothing against that.

        janl Jan Lehnardt added a comment -

        Daniel Holth According to https://github.com/apache/couchdb-couch/pull/18#issuecomment-67550634, yes.

        nolanlawson Nolan Lawson added a comment -

        Daniel Holth I'd happily accept a pull request on PouchDB that only does JSON. It's still faster than the alternative.

        githubbot ASF GitHub Bot added a comment -

        GitHub user kxepal opened a pull request:

        https://github.com/apache/couchdb-chttpd/pull/33

        Implement /db/_bulk_get endpoint

        Based on RCouch implementation by @benoitc.

        COUCHDB-2310

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/kxepal/couchdb-chttpd 2310-bulk_get

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/couchdb-chttpd/pull/33.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #33


        commit ced8a8d77fa100ea83f1a8f87bccb6ea17799af9
        Author: Alexander Shorin <kxepal@apache.org>
        Date: 2015-04-22T18:30:47Z

        Implement /db/_bulk_get endpoint

        Based on RCouch implementation by @benoitc.

        COUCHDB-2310


        jira-bot ASF subversion and git services added a comment -

        Commit 933ba2e3a2a77637b4ad275cf505b7d8a3ef0777 in couchdb-chttpd's branch refs/heads/master from Alexander Shorin
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb-chttpd.git;h=933ba2e ]

        Implement /db/_bulk_get endpoint

        COUCHDB-2310

        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/couchdb-chttpd/pull/33

        kxepal Alexander Shorin added a comment -

        Fixed in a first iteration that supports the JSON API only. We still need to implement the multipart API in order to be compatible with Couchbase, and to teach couch_replicator the /db/_bulk_get trick to improve replication performance.
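
        For reference, a sketch of the JSON-only exchange (the database name, document ID, and revision strings are placeholders, and the response shape follows the rcouch implementation this is based on, so details may differ):

        $ # hypothetical request against a local node
        $ curl -X POST 'http://localhost:5984/mydb/_bulk_get?revs=true' \
            -H 'Content-Type: application/json' \
            -d '{"docs": [{"id": "foo", "rev": "3-abc123"}]}'

        {
          "results": [
            {
              "id": "foo",
              "docs": [
                {
                  "ok": {
                    "_id": "foo",
                    "_rev": "3-abc123",
                    "_revisions": {"start": 3, "ids": ["abc123", "def456", "aaa111"]}
                  }
                }
              ]
            }
          ]
        }

        Each entry under "docs" is either an "ok" carrying the document body (with _revisions when revs=true) or an "error" for a missing revision, which is exactly what the replicator needs before its _bulk_docs write with new_edits=false.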

        githubbot ASF GitHub Bot added a comment -

        Github user dholth closed the pull request at:

        https://github.com/apache/couchdb-couch/pull/18

        jira-bot ASF subversion and git services added a comment -

        Commit ab3c51ebb1fde2a773df8afd1b3f46755cfda472 in couchdb's branch refs/heads/1.x.x from Alexander Shorin
        [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=ab3c51e ]

        Implement /db/_bulk_get endpoint

        COUCHDB-2310


          People

          • Assignee: kxepal Alexander Shorin
          • Reporter: nolanlawson Nolan Lawson
          • Votes: 5
          • Watchers: 12
