CouchDB
COUCHDB-449

Turn off delayed commits by default

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.9, 0.9.1
    • Fix Version/s: 0.11
    • Component/s: Database Core
    • Labels:
      None

      Description

      Delayed commits make CouchDB significantly faster. They also open a one-second window for data loss. In 0.9 and trunk, delayed commits are enabled by default and can be overridden with HTTP headers and an explicit API call to flush the write buffer. I suggest turning off delayed commits by default and using the same overrides to enable them per request. A per-database option is possible, too.

      One concern is developer workflow speed. The setting affects the test suite performance significantly. I'd opt to change couch.js to set the appropriate header to enable delayed commits for tests.

      CouchDB should guarantee data safety first and speed second, with sensible overrides.
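      The trade-off described above can be sketched as a toy model. This is pure illustration, not CouchDB internals: `Store`, `put`, `flush`, and `crash` are invented names, and the per-request `full_commit` flag stands in for the `X-Couch-Full-Commit`-style override.

```python
# Hypothetical toy model of the delayed-commit data-loss window.
# Nothing here is CouchDB code; all names are invented for illustration.

class Store:
    def __init__(self, delayed_commits=True):
        self.delayed_commits = delayed_commits
        self.memory = []   # acknowledged but not yet durable
        self.disk = []     # durable (survives a crash)

    def put(self, doc, full_commit=False):
        """Acknowledge a write; durability depends on the mode/override."""
        self.memory.append(doc)
        # A per-request override (think X-Couch-Full-Commit) beats the default.
        if full_commit or not self.delayed_commits:
            self.flush()
        return {"ok": True}

    def flush(self):
        """What the ~1 s delayed-commit timer eventually does."""
        self.disk.extend(self.memory)
        self.memory.clear()

    def crash(self):
        """Pull the plug: anything not flushed is gone."""
        lost = list(self.memory)
        self.memory.clear()
        return lost

fast = Store(delayed_commits=True)
fast.put({"_id": "a"})
assert fast.crash() == [{"_id": "a"}]   # lost: write fell inside the window

safe = Store(delayed_commits=False)
safe.put({"_id": "b"})
assert safe.crash() == []               # nothing lost: committed per request
```

      Flipping the default, as proposed, moves every database from the first behaviour to the second unless a request opts back in.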

      Attachments

      1. slow.rb
        0.2 kB
        Chris Anderson
      2. delayed_commits_v1.patch
        8 kB
        Adam Kocoloski

        Activity

        Jason Davies added a comment -

        +1. Can we make this a config setting? delayed_commits = false by default but can be turned on for a node for speed junkies.

        Jan Lehnardt added a comment -

        Good idea, Jason. Or a new section:

        [delayed_commits]
        dbname = true
        dbname2 = false
        ...

        so you can have a "safe" db for your app and a "fast" db for, say, logging.

        Brian Candler added a comment -

        Or perhaps you could set a different periodic flush interval for each
        database, with 0 equivalent to no delayed commit.

        For me, the question is specifically: what guarantees does CouchDB give to
        clients about data safety, and when - for example, at the point where
        you get an HTTP response?

        There are at least three different scenarios that I'm aware of at the
        moment.
        1. client supplies 'batch=ok' URL parameter
        2. client supplies no special parameters
        3. client supplies 'X-Couch-Full-Commit: true' header

        From the client's perspective, I can see no difference between (1) and (2).
        After receiving an HTTP response, the data is likely to make it to disk at
        some time in the future, but it could be lost if the plug is pulled in the
        next few seconds.

        In case (3), the document is guaranteed to be on disk after the HTTP
        response is returned [as long as drive internal write cache is disabled].
        This is equivalent to "QOS level 1" in the MQTT protocol:
        http://publib.boulder.ibm.com/infocenter/wmbhelp/v6r0m0/index.jsp?topic=/com.ibm.etools.mft.doc/ac10850_.htm

        However, it also forces writes of everything received up to this point, so
        it's very inefficient if you are doing lots of writes with this header on.

        Sometimes, you don't require data to be written to disk immediately, but you
        do want to be notified when it has been written to disk in order to take
        some subsequent action (such as acknowledging the successful save to a
        downstream consumer).

        I would like to propose an alternative approach similar to TCP sequence
        numbers. We already have a sequence number which counts documents added to
        the database (update_seq). I suggest we keep a separate watermark which is
        the sequence number when the database was last flushed to disk (say
        flush_seq).

        Now:

        • when you PUT a document, send the update_seq as part of the response
          (let's call it doc_seq)
        • update_seq may continue to increment as more documents are updated
        • at some point in the future, when data is flushed to disk, set
          flush_seq := update_seq
        • if the client is interested to know when its document has been flushed
          to disk, it can poll mydb to check for flush_seq >= doc_seq
        • it could be an option in the HTTP request to delay the response until
          flush_seq >= doc_seq

        That means you would get the benefit of knowing that the document had been
        committed to disk, without the cost of having to commit it. Rather, you wait
        until someone else wants to force a full commit, or the periodic full commit
        takes place.

        Then the only per-database tunable you need is the periodic commit interval.
        Set it to 5 seconds for logging databases; 0.2 for RADIUS accounting (where
        you want to generate a response within 200ms); and 0 if you want every
        single document to be committed as soon as it arrives.
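        The watermark proposal above could be sketched like this. It is a hypothetical API: neither flush_seq nor doc_seq exists in CouchDB as written here; only update_seq does.

```python
# Sketch of the proposed update_seq / flush_seq watermark.
# Hypothetical API for illustration; not existing CouchDB behaviour.

class Db:
    def __init__(self):
        self.update_seq = 0   # counts document updates (exists today)
        self.flush_seq = 0    # update_seq at the last commit to disk (proposed)

    def put(self, doc):
        self.update_seq += 1
        # Proposed: return the sequence assigned to this write.
        return {"ok": True, "doc_seq": self.update_seq}

    def periodic_flush(self):
        # At commit time, set flush_seq := update_seq.
        self.flush_seq = self.update_seq

    def is_durable(self, doc_seq):
        # Client polls: has my write reached disk yet?
        return self.flush_seq >= doc_seq

db = Db()
r1 = db.put({"_id": "x"})
r2 = db.put({"_id": "y"})
assert not db.is_durable(r1["doc_seq"])    # acknowledged, not yet durable
db.periodic_flush()
assert db.is_durable(r1["doc_seq"]) and db.is_durable(r2["doc_seq"])
```

        The point of the scheme is that durability notification becomes a cheap comparison of two integers, with the actual fsync cost amortised over the flush interval.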

        Thoughts?

        Something like this is doable at present, but requires a buffering proxy.
        For example, you can receive RADIUS accounting updates into a buffer, then
        every 200ms do a POST to _bulk_docs with X-Couch-Full-Commit: true and
        return success to all the clients.

        Since CouchDB has to buffer these documents in the VFS cache anyway, it
        would be convenient (and more efficient) to let it handle the periodic
        flushing too.
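        The buffering-proxy workaround might look roughly like this. It is a sketch with invented names; in a real proxy, commit_batch would be a POST to _bulk_docs with X-Couch-Full-Commit: true, and tick would be driven by a ~200 ms timer.

```python
# Rough sketch of the buffering-proxy idea described above.
# Hypothetical names; not a real CouchDB client.

class BufferingProxy:
    def __init__(self, commit_batch):
        self.commit_batch = commit_batch  # e.g. a _bulk_docs full commit
        self.pending = []                 # (doc, ack_callback) pairs

    def receive(self, doc, ack):
        # Buffer the update; the client waits for its ack.
        self.pending.append((doc, ack))

    def tick(self):
        # Called every ~200 ms by a timer in a real implementation.
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        self.commit_batch([doc for doc, _ in batch])
        for _, ack in batch:   # only now tell the clients "success"
            ack()

acked = []
proxy = BufferingProxy(commit_batch=lambda docs: None)
proxy.receive({"_id": "1"}, lambda: acked.append("1"))
proxy.receive({"_id": "2"}, lambda: acked.append("2"))
assert acked == []          # nothing acknowledged before the flush
proxy.tick()
assert acked == ["1", "2"]  # one durable commit acknowledges the whole batch
```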

        Regards,

        Brian.

        Jan Lehnardt added a comment -

        Brian, thanks for your thoughts.

        The "just write and let me know when things have been committed" can be done with the _changes feed already. No need for a separate sequence id.

        Brian Candler added a comment -

        Just to be clear: _changes is supposed to update only after a commit has
        taken place, not after a write?

        If so, I cannot demonstrate it. If I write a document and then immediately
        read _changes, it always appears. See the transcript below.

        Furthermore, the same is true if I run

        $ curl http://127.0.0.1:5984/test/_changes?feed=continuous

        in another window. As soon as I add a document in the first window, it
        appears in the _changes feed.

        My very rough scan of the source suggests that a delayed commit should take
        place after 1 second:

        Delay and (Db#db.waiting_delayed_commit == nil) ->
            Db#db{waiting_delayed_commit =
                erlang:send_after(1000, self(), delayed_commit)};

        So if that's right, and what you say is true, then I would expect not to see
        the document in _changes for this long.

        OTOH, with batch=ok the commit is delayed indefinitely. I have raised this
        as a separate ticket (COUCHDB-454).

        All tested with HEAD (git commit aebdb31001126dab6b579b8cc2e605ef7ec499c6)
        and 12b5 under Jaunty.

        Regards,

        Brian.


        $ curl -X DELETE http://127.0.0.1:5984/test

        {"ok":true}

        $ curl -X PUT http://127.0.0.1:5984/test

        {"ok":true}

        $ curl http://127.0.0.1:5984/test/_changes

        {"results":[ ], "last_seq":0}

        $ curl -X POST -d'{}' http://127.0.0.1:5984/test; curl http://127.0.0.1:5984/test/_changes

        {"ok":true,"id":"70708dcbc24444977b759365f9731f27","rev":"1-967a00dff5e02add41819138abb3284d"}

        {"results":[
        {"seq":1,"id":"70708dcbc24444977b759365f9731f27","changes":[

        {"rev":"1-967a00dff5e02add41819138abb3284d"}

        ]}
        ],
        "last_seq":1}

        $ curl -X POST -d'{}' http://127.0.0.1:5984/test; curl http://127.0.0.1:5984/test/_changes

        {"ok":true,"id":"1d4596c1cb715c0da9f99980fea0a3a2","rev":"1-967a00dff5e02add41819138abb3284d"}

        {"results":[
        {"seq":1,"id":"70708dcbc24444977b759365f9731f27","changes":[

        {"rev":"1-967a00dff5e02add41819138abb3284d"}

        ]},
        {"seq":2,"id":"1d4596c1cb715c0da9f99980fea0a3a2","changes":[

        {"rev":"1-967a00dff5e02add41819138abb3284d"}

        ]}
        ],
        "last_seq":2}

        $ curl -X POST -d'{}' http://127.0.0.1:5984/test; curl http://127.0.0.1:5984/test/_changes

        {"ok":true,"id":"a2feeaaca391446bb7a0f24c359ff79e","rev":"1-967a00dff5e02add41819138abb3284d"}

        {"results":[
        {"seq":1,"id":"70708dcbc24444977b759365f9731f27","changes":[

        {"rev":"1-967a00dff5e02add41819138abb3284d"}

        ]},
        {"seq":2,"id":"1d4596c1cb715c0da9f99980fea0a3a2","changes":[

        {"rev":"1-967a00dff5e02add41819138abb3284d"}

        ]},
        {"seq":3,"id":"a2feeaaca391446bb7a0f24c359ff79e","changes":[

        {"rev":"1-967a00dff5e02add41819138abb3284d"}

        ]}
        ],
        "last_seq":3}

        $ curl -X POST -d'{}' http://127.0.0.1:5984/test; curl -X POST -d'{}' http://127.0.0.1:5984/test; curl -X POST -d'{}' http://127.0.0.1:5984/test; curl http://127.0.0.1:5984/test/_changes

        {"ok":true,"id":"a2262a5904690aec5c64bb61f44903ed","rev":"1-967a00dff5e02add41819138abb3284d"} {"ok":true,"id":"26fdac7e139531e0f4352a089d4db7f4","rev":"1-967a00dff5e02add41819138abb3284d"} {"ok":true,"id":"f6bb36540484788becd54391dbc6189b","rev":"1-967a00dff5e02add41819138abb3284d"}

        {"results":[
        {"seq":1,"id":"70708dcbc24444977b759365f9731f27","changes":[

        {"rev":"1-967a00dff5e02add41819138abb3284d"}

        ]},
        {"seq":2,"id":"1d4596c1cb715c0da9f99980fea0a3a2","changes":[

        {"rev":"1-967a00dff5e02add41819138abb3284d"}

        ]},
        {"seq":3,"id":"a2feeaaca391446bb7a0f24c359ff79e","changes":[

        {"rev":"1-967a00dff5e02add41819138abb3284d"}

        ]},
        {"seq":4,"id":"a2262a5904690aec5c64bb61f44903ed","changes":[

        {"rev":"1-967a00dff5e02add41819138abb3284d"}

        ]},
        {"seq":5,"id":"26fdac7e139531e0f4352a089d4db7f4","changes":[

        {"rev":"1-967a00dff5e02add41819138abb3284d"}

        ]},
        {"seq":6,"id":"f6bb36540484788becd54391dbc6189b","changes":[

        {"rev":"1-967a00dff5e02add41819138abb3284d"}

        ]}
        ],
        "last_seq":6}

        Adam Kocoloski added a comment -

        +1 on turning off delayed commits by default
        +1 for enabling them on a per-DB basis
        +0 for making the threshold configurable

        We should add a DB-level configuration facility at some point. It'd be nice to be able to edit this setting (and others, like continuous replication) without server-level admin privileges.

        Adam Kocoloski added a comment -

        Here's a patch to make delayed_commits a server-wide config option. The setting looks like

        [couchdb]
        delayed_commits = true

        and defaults to false. If finer-grained control is required, users can override the default by setting the X-Couch-Full-Commit header to true or false.

        Jan mentioned enabling delayed_commits for the test suite. I didn't do this.

        Adam Kocoloski added a comment -

        we can always add new issues if folks require finer-grained config options like per-DB defaults or custom delays.

        Jan Lehnardt added a comment -

        For completeness: I turned on delayed commits for the test suite in r804727.

        Chris Anderson added a comment -

        As we're currently discussing this on IRC, I should reopen the ticket.

        In a simple (naive serial writes) benchmark, I was able to get ~230 docs/sec with delayed_commits and about 5 docs/sec with full commit.

        I'm a little concerned that a 50x slowdown is too high a price for correctness, as it puts us in the realm of unusable.

        Chris Anderson added a comment -

        here's my benchmark script

        Adam Kocoloski added a comment -

        0.10.0 is out the door; adjusting FixFor on all remaining unresolved issues to 0.11 by default.

        Jan Lehnardt added a comment -

        I call lazy consensus that we keep delayed_commits on by default. Please reopen if you disagree.


          People

          • Assignee:
            Adam Kocoloski
            Reporter:
            Jan Lehnardt
          • Votes:
            0
          • Watchers:
            0
