CouchDB
  1. CouchDB
  2. COUCHDB-1009

Make couch_stream buffer configurable

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Database Core
    • Labels:
      None
    • Environment:

      trunk

      Description

      The couch_stream buffer is hardcoded to 4Kb.

      This value should be configurable. Larger values can improve write and specially read performance (if we write larger chunks to disk, we have higher chances of reading more contiguous disk blocks afterwards).

      I also think it's a good idea to change the default value from 4Kb to something higher (64Kb for e.g.).

      Patch attached

      1. COUCHDB-1009.patch
        4 kB
        Filipe Manana
      2. COUCHDB-1009-2.patch
        5 kB
        Filipe Manana
      3. COUCHDB-1009-rebased.patch
        5 kB
        Jan Lehnardt

        Activity

        Hide
        Benoit Chesneau added a comment -

        sound good for me.

        Show
        Benoit Chesneau added a comment - sound good for me.
        Hide
        Filipe Manana added a comment -

        Just by increasing the buffer size from 4Kb to 64Kb, for the same attachment upload, I got a reduction from 1545051 us to 1206335 us.

        It's not that much, but I believe this stresses less the file driver. Writing larger chunks is usually better than writing many small chunks (reduces number of context switches, better chances of having contiguous blocks allocated by the filesystem). Plus, after file:allocate/2 lands in OTP, it will be better to use it with even larger sizes (512Kb, 1Mb for example).

        For e.g., running the test function at https://github.com/fdmanana/couchdb/commit/6ee38e330aa5bff51d58470cba1096f01aa0dfbd#L2R39

        Erlang R14B01 (erts-5.8.2) [source] [smp:2:2] [rq:2] [async-threads:4] [hipe] [kernel-poll:true]

        Eshell V5.8.2 (abort with ^G)
        1> Apache CouchDB 1.2.0a13d6dd1-git (LogLevel=info) is starting.
        Apache CouchDB has started. Time to relax.
        [info] [<0.37.0>] Apache CouchDB has started on http://127.0.0.1:5984/

        1> couch_file:test(4096, 16).
        multi writes of 16 binaries, each of size 4096 bytes, took 870us
        batch write of 16 binaries, each of size 4096 bytes, took 377us
        ok
        2> couch_file:test(4096, 16).
        multi writes of 16 binaries, each of size 4096 bytes, took 1271us
        batch write of 16 binaries, each of size 4096 bytes, took 366us

        Show
        Filipe Manana added a comment - Just by increasing the buffer size from 4Kb to 64Kb, for the same attachment upload, I got a reduction from 1545051 us to 1206335 us. It's not that much, but I believe this stresses less the file driver. Writing larger chunks is usually better than writing many small chunks (reduces number of context switches, better chances of having contiguous blocks allocated by the filesystem). Plus, after file:allocate/2 lands in OTP, it will be better to use it with even larger sizes (512Kb, 1Mb for example). For e.g., running the test function at https://github.com/fdmanana/couchdb/commit/6ee38e330aa5bff51d58470cba1096f01aa0dfbd#L2R39 Erlang R14B01 (erts-5.8.2) [source] [smp:2:2] [rq:2] [async-threads:4] [hipe] [kernel-poll:true] Eshell V5.8.2 (abort with ^G) 1> Apache CouchDB 1.2.0a13d6dd1-git (LogLevel=info) is starting. Apache CouchDB has started. Time to relax. [info] [<0.37.0>] Apache CouchDB has started on http://127.0.0.1:5984/ 1> couch_ file:test(4096 , 16). multi writes of 16 binaries, each of size 4096 bytes, took 870us batch write of 16 binaries, each of size 4096 bytes, took 377us ok 2> couch_ file:test(4096 , 16). multi writes of 16 binaries, each of size 4096 bytes, took 1271us batch write of 16 binaries, each of size 4096 bytes, took 366us
        Hide
        Robert Newson added a comment -

        "it will be better to use it with even larger sizes (512Kb, 1Mb for example). "

        Except that this will block concurrent writers from making progress (I need you to transmit 1mb before I can transmit my 1mb), etc.

        Increasing from 4k seems a no-brainer to me though when I experimented with larger buffer sizes a while ago (a year?) I found that things were little improved, matching your numbers.

        I would be interested in knowing if it helps with compaction, though, since reading and writing larger attachment blocks should really help.

        Show
        Robert Newson added a comment - "it will be better to use it with even larger sizes (512Kb, 1Mb for example). " Except that this will block concurrent writers from making progress (I need you to transmit 1mb before I can transmit my 1mb), etc. Increasing from 4k seems a no-brainer to me though when I experimented with larger buffer sizes a while ago (a year?) I found that things were little improved, matching your numbers. I would be interested in knowing if it helps with compaction, though, since reading and writing larger attachment blocks should really help.
        Hide
        Filipe Manana added a comment -

        That's absolutely right Robert. But it might depend on the number of concurrent writers one expects, the size of the attachments and documents added/updated, etc. And that's another reason I think why it should be configurable and not hardcoded.

        Show
        Filipe Manana added a comment - That's absolutely right Robert. But it might depend on the number of concurrent writers one expects, the size of the attachments and documents added/updated, etc. And that's another reason I think why it should be configurable and not hardcoded.
        Hide
        Robert Newson added a comment -

        Agree it should be configurable, all I'm saying is that this setting has inherent limits. The doc for the setting should point out the down side and we can let people choose for themselves.

        Show
        Robert Newson added a comment - Agree it should be configurable, all I'm saying is that this setting has inherent limits. The doc for the setting should point out the down side and we can let people choose for themselves.
        Hide
        Adam Kocoloski added a comment -

        I would certainly not want to block the FD process for a network data transfer, but I think that's easy to avoid.

        Show
        Adam Kocoloski added a comment - I would certainly not want to block the FD process for a network data transfer, but I think that's easy to avoid.
        Hide
        Filipe Manana added a comment -

        Adam, do you think 64Kb is too much for most scenarios?
        I don't mind at all to leave this as 4Kb .

        Show
        Filipe Manana added a comment - Adam, do you think 64Kb is too much for most scenarios? I don't mind at all to leave this as 4Kb .
        Hide
        Robert Newson added a comment -

        for attachment heavy scenarios, perhaps resolving COUCHDB-769 would obviate any need to get this size 'just right'?

        a 28% improvement for a 16x increase in chunk size strikes me as weak, btw. When I tried this experiment before, I emitted the actual sizes of the chunks that were getting written (64kib simply being the maximum size). Have you verified that you actually write 64kib chunks rather than merely accept them?

        In any case, changing the default should be a separate commit to the commit that makes this setting configurable.

        Show
        Robert Newson added a comment - for attachment heavy scenarios, perhaps resolving COUCHDB-769 would obviate any need to get this size 'just right'? a 28% improvement for a 16x increase in chunk size strikes me as weak, btw. When I tried this experiment before, I emitted the actual sizes of the chunks that were getting written (64kib simply being the maximum size). Have you verified that you actually write 64kib chunks rather than merely accept them? In any case, changing the default should be a separate commit to the commit that makes this setting configurable.
        Hide
        Filipe Manana added a comment -

        COUCHDB-769 is a totally different matter I think, not to mention that it's a very big change, has many implications that the patch doesn't address and as far as I can tell, there aren't short or medium term plans to add it to CouchDB.

        And yes it writes 64Kb (or more) chunks to the file. couch_stream buffers the received data until it reaches or exceeds the buffer size.

        I have no objections about leaving it as 4Kb.

        Show
        Filipe Manana added a comment - COUCHDB-769 is a totally different matter I think, not to mention that it's a very big change, has many implications that the patch doesn't address and as far as I can tell, there aren't short or medium term plans to add it to CouchDB. And yes it writes 64Kb (or more) chunks to the file. couch_stream buffers the received data until it reaches or exceeds the buffer size. I have no objections about leaving it as 4Kb.
        Hide
        Filipe Manana added a comment -

        Robert,

        Do you agree with the .ini comment on this second patch?

        diff --git a/etc/couchdb/default.ini.tpl.in b/etc/couchdb/default.ini.tpl.in
        index 4c4d905..e1c1b4d 100644
        — a/etc/couchdb/default.ini.tpl.in
        +++ b/etc/couchdb/default.ini.tpl.in
        @@ -11,6 +11,10 @@ os_process_timeout = 5000 ; 5 seconds. for view and external servers.
        max_dbs_open = 100
        delayed_commits = true ; set this to false to ensure an fsync before 201 Created is returned
        uri_file = %localstaterundir%/couch.uri
        +; Higher values may give better read performance due to less read operations
        +; and/or more OS page cache hits, but they can also increase overall response
        +; time for writes when there are many write requests in parallel.
        +attachment_stream_buffer_size = 4096

        Show
        Filipe Manana added a comment - Robert, Do you agree with the .ini comment on this second patch? diff --git a/etc/couchdb/default.ini.tpl.in b/etc/couchdb/default.ini.tpl.in index 4c4d905..e1c1b4d 100644 — a/etc/couchdb/default.ini.tpl.in +++ b/etc/couchdb/default.ini.tpl.in @@ -11,6 +11,10 @@ os_process_timeout = 5000 ; 5 seconds. for view and external servers. max_dbs_open = 100 delayed_commits = true ; set this to false to ensure an fsync before 201 Created is returned uri_file = %localstaterundir%/couch.uri +; Higher values may give better read performance due to less read operations +; and/or more OS page cache hits, but they can also increase overall response +; time for writes when there are many write requests in parallel. +attachment_stream_buffer_size = 4096
        Hide
        Robert Newson added a comment -

        Good comment, yeah.

        My testing didn't show any compelling reason to increase this value, though, but at least this way it's not hardcoded.

        Show
        Robert Newson added a comment - Good comment, yeah. My testing didn't show any compelling reason to increase this value, though, but at least this way it's not hardcoded.
        Hide
        Jan Lehnardt added a comment -

        Rebased against master.

        @FIlipe, don't we need to listen to config changes and restart or reconfigure the gen_server or its state with the new config value?

        Show
        Jan Lehnardt added a comment - Rebased against master. @FIlipe, don't we need to listen to config changes and restart or reconfigure the gen_server or its state with the new config value?
        Hide
        Filipe Manana added a comment -

        Thanks for reminding about this Jan.

        Regarding your question, no it's not needed. For each attachment we write, we instantiate a couch_stream gen_server, which is then closed after writing the attachment. Therefore once the .ini parameter is updated, it only affects subsequent attachment writes. I think this is perfectly fine.

        Show
        Filipe Manana added a comment - Thanks for reminding about this Jan. Regarding your question, no it's not needed. For each attachment we write, we instantiate a couch_stream gen_server, which is then closed after writing the attachment. Therefore once the .ini parameter is updated, it only affects subsequent attachment writes. I think this is perfectly fine.
        Hide
        Filipe Manana added a comment -

        Applied to master.

        Show
        Filipe Manana added a comment - Applied to master.

          People

          • Assignee:
            Filipe Manana
            Reporter:
            Filipe Manana
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development