CouchDB / COUCHDB-441

Finally implement pre-write-doc-edit handlers.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10
    • Fix Version/s: 0.10
    • Component/s: HTTP Interface
    • Labels:
      None

      Description

      It would be useful for auditing to have the identity of the user who inserted a new revision and the timestamp of the operation to be inserted in the document in the same way that the new revision number is.

      Doing this at the application level is not adequate since it would be readily spoofable and would bypass the authentication handler.

      There is a comment in couch_db:update_docs about generating new revision ids, but I couldn't quite comprehend what specific code was responsible for inserting the id into the document.

      1. COUCHDB-441.patch
        10 kB
        Paul Joseph Davis
      2. COUCHDB-441.3.patch
        21 kB
        Jason Davies
      3. COUCHDB-441.2.patch
        17 kB
        Jason Davies

        Activity

        Paul Joseph Davis added a comment -

        Inserting timestamps automagically would be bad because it would limit a whole swath of use cases. Probably the same for user id.

        The feature you're wanting is the end point that allows a JavaScript function to mutate incoming docs before they're written to the DB. If there isn't a ticket for that yet please create one so that it stares at us like a lost puppy and it'll be gotten to.

        Curt Arnold added a comment -

        Reopened the bug with a new title that focuses more on the use-case than the approach.

        I guess I could see situations where you may not want to have that info inserted into the documents, so it would need to be configurable.

        Could be something similar to validate_doc_update (would likely need to occur before validate_doc_update) but would have the ability to modify the document, would need to have access to the user_ctx and maybe something equivalent for the server context.
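A minimal sketch of the kind of handler described above, purely for illustration. The name `stampAudit`, the `modified_by` / `modified_at` field names, and the `(doc, userCtx, now)` signature are all assumptions and not an existing CouchDB API. The idea it shows is that the server supplies both the identity and the clock, so whatever the client sent in those fields is overwritten.

```javascript
// Hypothetical pre-write handler (not a real CouchDB hook).
// doc:     the incoming document as sent by the client
// userCtx: the authenticated user context, e.g. {name: "curt", roles: []}
// now:     a Date taken from the server clock
function stampAudit(doc, userCtx, now) {
  // Shallow copy so the caller's object is left untouched.
  var stamped = Object.assign({}, doc);
  // Overwrite anything the client sent: these fields must not be spoofable.
  stamped.modified_by = userCtx.name;
  stamped.modified_at = now.toISOString();
  return stamped;
}

// The client tries to claim a different author; the handler ignores it.
var incoming = {_id: "invoice-1", total: 10, modified_by: "someone-else"};
var saved = stampAudit(incoming, {name: "curt", roles: []}, new Date(0));
console.log(saved.modified_by, saved.modified_at);
```

Because the stamp would be applied after authentication, an application-level client could not forge it, which is the concern raised in the description.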

        Paul Joseph Davis added a comment -

        Renamed before I forget what the other title means.

        Robert Burke added a comment - - edited

        We could use either a generic hook to insert timestamps ourselves, or some built-in handling where possibly the string $now$ in the JSON (possibly with a format specifier to get ISO 8601 dates or other date formats) gets replaced with the current UTC time.

        We have many timestamps that we need to record other than just created / last modified (when the user completed a certain workflow step, etc). We can keep a few db machines in close enough synch with a time server. We cannot rely on our client machines to have the correct time.

        Robert Burke added a comment -

        Being able to insert info in docs when committed to the db implies that using PUT/POST to create/update a document needs to return more in the response than what it currently does, for example:

        {"ok":true, "id":"123BAC", "rev":"946B7D1C"}

        It probably needs to return the entire JSON of the new or updated doc. Otherwise, gotta call the db twice.

        Curt Arnold added a comment -

        ?include_doc=true could be borrowed from the view syntax so that you got the committed doc back as the "doc" member of the reply regardless of whether there was any pre-write-doc-edit handler. If you aren't modifying or you don't need the doc, you could omit the param or set it to false.

        I'd discourage special replacement sequences, since you then have to figure out how to escape them when you really want a literal "$now$" in the body.
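The escaping problem can be seen with a naive version of the placeholder idea. This is only a sketch: `replaceNow` is a hypothetical helper, and nothing like it is known to exist in CouchDB. It walks a JSON value and swaps every string equal to "$now$" for a timestamp.

```javascript
// Hypothetical server-side substitution of the "$now$" placeholder.
function replaceNow(value, nowIso) {
  if (value === "$now$") {
    return nowIso;
  }
  if (Array.isArray(value)) {
    return value.map(function (item) { return replaceNow(item, nowIso); });
  }
  if (value !== null && typeof value === "object") {
    var out = {};
    Object.keys(value).forEach(function (key) {
      out[key] = replaceNow(value[key], nowIso);
    });
    return out;
  }
  return value; // numbers, booleans, null and other strings pass through
}

// completed_at is meant to be replaced; help_text is meant to stay literal.
var doc = {completed_at: "$now$", help_text: "$now$"};
var result = replaceNow(doc, "2009-08-01T00:00:00Z");
console.log(result.help_text); // the literal was replaced as well
```

Without an escape syntax, the field that was supposed to keep the literal text cannot be told apart from the one that wanted the timestamp.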

        Paul Joseph Davis added a comment -

        Yep. This is a patch for _update handlers.

        To use them, create a _design doc that looks like this:

        {
          "_id": "_design/foo",
          "updates": {
            "mult": "function(oldDoc, newDoc, req, userCtx) {oldDoc.value = oldDoc.value * 2;}"
          }
        }

        And then PUT a document to this URL:

        http://127.0.0.1:5984/db_name/_design/foo/_update/mult/${DOCID}

        or POST a document to

        http://127.0.0.1:5984/db_name/_design/foo/_update/mult

        You can also pull the code from:

        git://github.com/davisp/couchdb.git

        If you're of the git persuasion.

        Also, I realized while implementing this that it more or less enables an often sought after ability with a small caveat. +1 beer to the person that spots that. +6 beers to the person that sees the caveat.

        Yay updates!
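To make the flow concrete, here is a local sketch that runs the "mult" function from the design doc above without a server. `applyUpdate` is a hypothetical stand-in for what the _update endpoint does with the function source. The thread does not say which document the patch actually writes back, so treating the mutated oldDoc as the result is an assumption.

```javascript
// The design doc from the comment above: the function is stored as a string.
var designDoc = {
  _id: "_design/foo",
  updates: {
    mult: "function(oldDoc, newDoc, req, userCtx) {oldDoc.value = oldDoc.value * 2;}"
  }
};

// Hypothetical stand-in for the server side of the _update endpoint.
function applyUpdate(ddoc, name, oldDoc, newDoc, req, userCtx) {
  // Compile the stored source into a callable function.
  var fn = eval("(" + ddoc.updates[name] + ")");
  // Work on a copy so the stored revision is not changed in place.
  var working = JSON.parse(JSON.stringify(oldDoc));
  fn(working, newDoc, req, userCtx);
  // Assumption: the mutated old document is what would be saved.
  return working;
}

var stored = {_id: "counter", value: 21};
var updated = applyUpdate(designDoc, "mult", stored, {}, {}, {name: null, roles: []});
console.log(updated.value); // 42
```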

        Jason Davies added a comment -

        Nice work Paul! One thing I noticed about your patch is that _update expects a JSON body in the request. Can we remove this requirement and make it so the function signature is simply (doc, req, userCtx)? In my oauth branch I've modified couch_httpd_external.erl to always populate req.userCtx so the function signature will be even shorter when this gets merged.

        Then we can do fun things like handle XML bodies in the request.

        Jason Davies added a comment -

        The other thing I forgot to mention is that I would like the function to return an arbitrary response body too, so the function would be something like:

        function(doc, req) {
          // do some processing on doc
          return {doc: doc, body: body, headers: headers};
        }
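A sketch of a handler written against the proposed return shape, for illustration only. The `bump` function, the `count` field, and `req.docId` are assumptions made up for this example. The `{doc, body, headers}` shape is the proposal from this comment and is not necessarily what was finally committed.

```javascript
// Hypothetical update handler using the proposed (doc, req) signature.
// doc is null when no document exists yet; req.docId is an assumed field
// carrying the id taken from the request URL.
function bump(doc, req) {
  if (!doc) {
    doc = {_id: req.docId, count: 0};
  }
  doc.count += 1;
  return {
    doc: doc,                                 // what should be written
    body: "count is now " + doc.count,        // arbitrary response body
    headers: {"Content-Type": "text/plain"}   // arbitrary response headers
  };
}

var first = bump(null, {docId: "hits"});
var second = bump(first.doc, {docId: "hits"});
console.log(second.body); // count is now 2
```

Since the handler builds the response itself, the request body no longer has to be JSON, which is what would allow XML or form-encoded input.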

        Benoit Chesneau added a comment -

        Why another URL handler instead of reusing _show? Having all verbs in one place would be interesting. I don't have time to test it right now, but I will tomorrow morning.

        Christopher Lenz added a comment -

        I think a post-save update function should be exactly analogous to the validate_doc_func feature already in CouchDB: add a function to the design doc, and have CouchDB invoke it for any document update.

        In fact, this functionality could be part of the validation routine if the doc wasn't made read-only (and the function returned a new document), saving some overhead.

        Am I missing something?

        Jan Lehnardt added a comment -

        @christopher Validation runs at replication time; update modifications do not.

        Paul Joseph Davis added a comment -

        My original thoughts were that this would be a pretty thin wrapper that would mutate a single document when putting. Adding the ability to work with arbitrary data and return arbitrary responses changes the game somewhat.

        I'd disagree pretty strongly with using _show because the basic intent is quite different.

        The other thing I just realized is that there's also no way to use the current scheme with _bulk_docs.

        Things to think about.

        Paul Joseph Davis added a comment -

        @christopher

        The other thing about validate_doc_update is that ordering isn't important. Any function can veto the update. But for mutation operations if we allow any number of functions then your function has to accept any possible configuration of the order of applying those functions. And it has to work with possibly unknown other code in the same db. Basically the possibilities started hurting my brain so I just went with the url approach.
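The ordering concern is easy to reproduce with two tiny mutation functions. This is an illustrative sketch only: `double`, `addTen`, and `applyInOrder` are made-up names and are not part of the patch.

```javascript
// Two hypothetical mutation handlers that each return a modified copy.
function double(doc) {
  return Object.assign({}, doc, {value: doc.value * 2});
}
function addTen(doc) {
  return Object.assign({}, doc, {value: doc.value + 10});
}

// Apply a list of handlers one after another.
function applyInOrder(doc, handlers) {
  return handlers.reduce(function (current, handler) {
    return handler(current);
  }, doc);
}

var a = applyInOrder({value: 1}, [double, addTen]);
var b = applyInOrder({value: 1}, [addTen, double]);
console.log(a.value, b.value); // 12 22
```

A set of validators that each only accept or veto gives the same verdict in any order, while the saved document here depends on which handler ran first.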

        Curt Arnold added a comment -

        I wasn't expecting a new API, but a designable or configurable feature for the existing _bulk_docs and PUT request handlers. If you were using this to ensure that all documents had the appropriate username, timestamp, requesting ip address of last modification, you'd need to disable _bulk_docs and the PUT docid and rewrite all apps to use this new API.

        A doc-edit-handler in the existing API would probably still be called on replication, but could distinguish between a replication action and a normal action.

        As for multiple doc-edit-handlers, I think you could accept multiple handlers, but explicitly declare that there is no promised ordering. If a designer adds doc-edit-handlers that conflict or have an order-dependency, then they have nobody to blame except themselves.

        Paul Joseph Davis added a comment -

        Curt,

        Three things that come to mind:

        1. What would the API look like if it weren't an API endpoint? I just ran with what was easy to implement, but between yours, Jason's and Christopher's comments I'm wondering if maybe there's another approach that is more general.

        2. I'm fairly against calling mutation handlers on replication as it seems like it could very easily lead to an inability to reach steady state. If we force clients to choose to push docs to a handler that is OOB from replication then we alleviate such concerns.

        3. I'm also fairly against the arbitrary ordering. A big understanding in the community is that we're presenting users the ability to replicate _design docs as an entire application. Forcing applications to write against transient undefinable situations seems like a recipe for disaster. The question isn't whether I as a designer declare an edit function; it's that I have to account for any possible configuration of order of any possible input. I could see it being very easy for two independent designers to declare mutually exclusive conditions, rendering a DB un-editable from app code.

        Paul

        Curt Arnold added a comment -

        1. Other than possibly adding a ?include_doc(s) query parameter that would indicate whether the modified doc needs to be returned, I would not see the HTTP API changing. On the back-end, I'd be open to anything from a hard-coded entry, configured Erlang handler or design doc similar to validate_doc.

        2. I'm thinking there is already a different code path for replication since I'd think that replication would preserve the existing revision id. However, on my initial scan of couch_db:update_docs, I could not identify the code that was responsible for generating the new revision id. If done with a design doc approach, either the function would not be called during replication, the design doc would define whether it was called during replication, or the function would be called but it could detect whether it was being called during replication. I haven't thought of a reason that you'd want to invoke during replication, so I'd be fine with any of them.

        3. Instead of having the edit function directly modifying the document, it could return a document that is merged into the existing document. Any top-level key that is present in the return value from a doc-edit function would be inserted in the document, replacing any existing value for that key in the document. However, if multiple edit functions add the same key value, the PUT would fail with a conflict error.
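The merge rule in point 3 can be sketched as follows. `mergeEdits` is a hypothetical helper written only to illustrate the proposal: each edit function returns a partial document, top-level keys replace existing values, and two edit functions setting the same key is reported as a conflict.

```javascript
// doc:     the document as submitted
// patches: one partial document per edit function, in any order
function mergeEdits(doc, patches) {
  var setBy = {};                       // keys already set by some edit function
  var merged = Object.assign({}, doc);  // shallow copy; only top-level keys matter
  patches.forEach(function (patch) {
    Object.keys(patch).forEach(function (key) {
      if (Object.prototype.hasOwnProperty.call(setBy, key)) {
        // Two edit functions touched the same key: fail instead of guessing.
        throw new Error("conflict: more than one edit function set '" + key + "'");
      }
      setBy[key] = true;
      merged[key] = patch[key];         // replaces any value sent by the client
    });
  });
  return merged;
}

var merged = mergeEdits(
  {_id: "order-7", modified_by: "spoofed"},
  [{modified_by: "curt"}, {modified_at: "2009-08-01T00:00:00Z"}]
);
console.log(merged.modified_by); // curt
```

Because no patch is allowed to touch a key that another patch already set, the result does not depend on the order in which the edit functions ran, which sidesteps the ordering objection at the cost of rejecting overlapping handlers.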

        Jason Davies added a comment -

        Updated patch that uses req.userCtx now that OAuth/cookie auth has landed.

        Jason Davies added a comment -

        Forgot to include new files share/server/update.js and share/www/script/test/updates.js. Included in this patch.

        Jan Lehnardt added a comment -

        Fixed in trunk and 0.10 earlier this year.


          People

          • Assignee:
            Unassigned
            Reporter:
            Curt Arnold
          • Votes:
            2
            Watchers:
            0
