CouchDB
COUCHDB-1084

Remove unnecessary btree lookup inside couch_db_updater

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2
    • Fix Version/s: None
    • Component/s: Database Core
    • Labels:
      None
    • Skill Level:
      Committers Level (Medium to Hard)

      Description

      The CouchDB update process has an unnecessary btree lookup, where it reads the values in bulks, checks for conflicts, writes the docs to disk, updates the values appropriately and writes them to the btree out in a second step. It's possible to avoid this second step, and instead do all the checking, doc writing and value transformation in a single btree lookup, thereby reducing the number of btree traversals and disk IO.
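      The shape of the change can be modeled roughly as follows. This is a Python sketch with hypothetical names (ToyTree, update_two_pass, update_one_pass); the real code is the Erlang couch_db_updater/couch_btree on a copy-on-write B+tree, so only the number of traversals is illustrated here, not the actual implementation.

```python
# Conceptual model of the two update strategies. A dict stands in for
# the on-disk btree; `traversals` counts full lookup/update passes.

class ToyTree:
    def __init__(self):
        self.data = {}
        self.traversals = 0  # number of full tree passes

    def lookup(self, keys):
        self.traversals += 1
        return [self.data.get(k) for k in keys]

    def add_remove(self, inserts):
        self.traversals += 1
        self.data.update(inserts)

    def modify(self, keys, fun):
        """Single pass: visit each key once, letting `fun` see the old
        value and produce the new one (conflict check, doc write, and
        value transformation all happen inside the traversal)."""
        self.traversals += 1
        for k in keys:
            self.data[k] = fun(k, self.data.get(k))

def update_two_pass(tree, docs):
    # Pass 1: read old values to check for conflicts.
    old = tree.lookup(sorted(docs))
    merged = {k: (docs[k], o) for k, o in zip(sorted(docs), old)}
    # Pass 2: write the transformed values back.
    tree.add_remove(merged)

def update_one_pass(tree, docs):
    # Same result, but check and transform during the one traversal.
    tree.modify(sorted(docs), lambda k, old: (docs[k], old))

a, b = ToyTree(), ToyTree()
update_two_pass(a, {"doc1": 1, "doc2": 2})
update_one_pass(b, {"doc1": 1, "doc2": 2})
print(a.traversals, b.traversals)  # 2 1
print(a.data == b.data)            # True
```

      Both paths end with identical tree contents; the one-pass version simply halves the number of traversals, which is where the claimed disk IO saving comes from.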

      Attachments

      1. remove_btree_lookup.patch
        22 kB
        Damien Katz
      2. davisp-1084.patch
        16 kB
        Paul Joseph Davis

        Activity

        Damien Katz added a comment -

        Applies to couchdb trunk revision 1078680

        Damien Katz added a comment -

        The attached patch might have stability issues, but it should give an idea of the performance impact of the change. I'd like to see some benchmarks to see if it actually helps.

        Adam Kocoloski added a comment -

        Hey, nothing says Damien like a 400 line Subversion patch file. Welcome back!

        I think the whitespace change to couch_view_updater can be safely removed from the patch. I also reviewed the db_updater half of the patch with an eye towards making it smaller. Here's what I've got so far:

        https://github.com/kocolosk/couchdb/compare/1084-remove-unnecessary-lookup

        I'd like to better understand how the VM manages anonymous functions and whether there will be any issues creating one for each document update like that.

        Good stuff, though. I definitely get the gist of the patch and I like it.

        Paul Joseph Davis added a comment -

        First, a tsk tsk. There's a noticeable number of changes that aren't actually related to the patch, which makes it difficult to review. I don't want to hamper my image as a crotchety oldtimer, so I can't not point that out.

        The general idea seems pretty good. Passing a function down to the btree modification seems like it could have a notable impact on updates by avoiding the double traversal in cases where it's a row-oriented update. Though the specific implementation of passing a bunch of

        {Key, ModifyFun}

        pairs makes me a bit squeamish. I'd like to see an investigation of passing a single ModifyFun with a signature of something like fun(Key, OldVal, NewVal, Acc) that returns the value to be inserted. Also, if I read the patch correctly, it effectively prevents the atom nil from being used as a key, which seems odd.

        Either way, I'm definitely +1 on exploring the idea here. It'd be good to start doing some initial performance testing to see what we might expect for improvements.
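        The alternative API Paul suggests might look roughly like the following. This is a hypothetical Python sketch of the idea, not the actual Erlang couch_btree API; the function and variable names are invented for illustration.

```python
# One shared ModifyFun instead of per-key {Key, ModifyFun} pairs: the
# callback receives the key, the old value found in the tree, the new
# value being offered, and an accumulator, and returns the value to
# insert plus the updated accumulator.

def modify(tree, updates, modify_fun, acc):
    """Apply one shared callback during a single traversal.
    `tree` is a plain dict standing in for the btree."""
    for key, new_val in sorted(updates.items()):
        old_val = tree.get(key)  # None plays the role of 'not_found',
                                 # avoiding the reserved-nil-key problem
        ins, acc = modify_fun(key, old_val, new_val, acc)
        tree[key] = ins
    return acc

# Example callback: keep the new value, count how many keys were fresh.
def keep_new(key, old, new, n_fresh):
    return new, n_fresh + (old is None)

tree = {"a": 1}
fresh = modify(tree, {"a": 10, "b": 20}, keep_new, 0)
print(tree)   # {'a': 10, 'b': 20}
print(fresh)  # 1
```

        With a single shared callback there is no need to allocate one fun per document, and the not-found sentinel stays an argument convention rather than a reserved key value.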

        Damien Katz added a comment -

        Thanks for the feedback on the code style; we definitely want to clean it up before committing. Right now I'm more interested in the performance impact and how fruitful removing the btree lookup is. I'm hoping this patch will improve performance for all writes, both inserts and updates, but I don't have time to set up benchmarks right now.

        Jan Lehnardt added a comment -

        Relaximation r/w test on my MacBook with an SSD; test and DBs on the same machine:

        http://graphs.mikeal.couchone.com/#/graph/698bf36b6c64dbd19aa2bef63400268f

        About 10-20% more req/s and significantly lower latency.

        Filipe Manana added a comment -

        E P I C !!

        Jan Lehnardt added a comment -

        Relaximation r/w test on a Mac Mini with a spinning disk; relaximation and CouchDB on different machines on switched GigE Ethernet.

        http://graphs.mikeal.couchone.com/#/graph/698bf36b6c64dbd19aa2bef63400478c

        A little less improvement, but definitely not worse. Damien says that with database files larger than available RAM, the impact should be bigger. If anyone can set up a test run like that, that'd be cool.

        Adam Kocoloski added a comment -

        Doesn't that graph show a 10-15% drop in mean reads per second with the patch? No idea whether that's statistically significant.

        Randall Leeds added a comment -

        The difference between the SSD and spinning media here is noteworthy:
        SSD:
        ~100k more writes but only ~20k fewer reads total.

        Spinning:
        ~6k more writes but ~78k fewer reads.

        The best way I can think to explain this is that the SSD test was possibly CPU bound while the spinning media test was I/O bound. The latter is almost certainly true. In that case, I think we're seeing the seek penalty already, even with a small data set. Pushing more writes down to the OS more quickly means our disk heads are hovering at the end of the file more, perhaps; thus the negative impact on readers. However, it's certainly not that reads got slower, since the patch hasn't changed the read path. It's merely an artifact of the testing environment and relaximation's as-fast-as-possible benchmark style.

        On the other hand, with an SSD avoiding the seek and increasing the I/O bandwidth, we may be seeing just how much those saved CPU cycles can push writes through faster, saturating the disk more heavily and getting very nice write throughput.

        More testing maybe, but does this align with what you see?

        Jan Lehnardt added a comment - edited

        Definitely more testing.

        Randall, if you are comparing numbers from my tests, note that the first ran for 300s and the other for 200s (IIRC).

        I think your description is apt, though. Damien says that on SSDs fsyncs are faster and cheaper, so we see the benefits of the patch earlier. On spinning media, waiting for fsync dominates most of the time, so the improvements in the now relatively smaller btree update code path are not as significant (for the small data set, until we actually save seeks in the large data set).

        Damien Katz added a comment -

        This isn't the first time I've seen weird, unexpected results from relaximation. I'm really thinking the benchmarks need some work to be more usable and useful. I can't explain how the improvements on the write path would cause this read impact. Perhaps more comprehensive tests would give a clearer picture of what's going on, or maybe there is a bug in the tests themselves.

        I've also had a hard time interpreting the graphs; I think they need some smoothing or something to make it easier to visualize the differences.

        Randall Leeds added a comment -

        Relaximation uses a fixed number of writers and readers and fires off a new request whenever one finishes.
        That means a decrease in write latency means more writes overall, and if any resource was saturated before, it likely means a commensurate impact elsewhere, since there's no free lunch.

        Jan, if you can find the time, I think running the writes-only test would be interesting. I would expect pure gains in all areas.
        From a code perspective we already know for sure we haven't made reads worse in general.
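        Randall's closed-loop point can be illustrated with a toy model. This is a hypothetical Python sketch, not relaximation's actual code: a fixed set of writers and readers each fire their next request the instant the previous one completes, all requests share one serial "disk", and writers additionally spend some CPU time preparing each update (the part the patch reduces).

```python
import heapq

def simulate(cpu_cost, horizon=10000.0, writers=4, readers=4,
             write_io=1.0, read_io=1.0):
    """Toy closed-loop benchmark. Writers spend cpu_cost (in parallel)
    preparing each update, then queue a write on one shared serial
    disk; readers loop on disk reads only. The disk serves requests in
    the order they become ready. Returns (writes, reads) completed."""
    # heap entries: (time ready for the disk, client id, kind)
    heap = [(cpu_cost, i, "w") for i in range(writers)]
    heap += [(0.0, writers + i, "r") for i in range(readers)]
    heapq.heapify(heap)
    disk_free, done = 0.0, {"w": 0, "r": 0}
    while heap:
        ready, cid, kind = heapq.heappop(heap)
        io = write_io if kind == "w" else read_io
        finish = max(ready, disk_free) + io
        if finish > horizon:
            continue  # this client can't complete another request
        disk_free = finish
        done[kind] += 1
        # a writer returns to CPU work before its next disk request
        next_ready = finish + (cpu_cost if kind == "w" else 0.0)
        heapq.heappush(heap, (next_ready, cid, kind))
    return done["w"], done["r"]

# Expensive write path vs. a patched, cheaper one:
print(simulate(cpu_cost=8.0))
print(simulate(cpu_cost=1.0))
```

        Because the disk is saturated in both runs, total completed operations are roughly fixed; cutting the writers' CPU cost lets them queue disk writes more often, so the write count rises and the read count falls, even though nothing about the read path itself changed. That matches the pattern in Jan's graphs.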

        Paul Joseph Davis added a comment -

        I'm finally getting around to landing the patch for the refactored couch_btree. That patch would seriously break this implementation. Anyone have thoughts on what should take priority?

        Randall Leeds added a comment -

        Paul: could you post a branch? I'd be thrilled to take a look at integrating this into your refactor, if it's still applicable.

        Paul Joseph Davis added a comment -

        Updated the patch to apply after the patch on COUCHDB-1124 is applied (the refactor couch_btree.erl patch).

        This causes cookie_auth.js and replication.js to fail in Futon, but I didn't bother checking the original version to see if it's just my setup or the patch itself. I'll take a closer look tomorrow.


          People

          • Assignee:
            Damien Katz
            Reporter:
            Damien Katz
          • Votes:
            2
            Watchers:
            1
