CouchDB
COUCHDB-1132

Track used space of database and view index files

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2
    • Component/s: Database Core
    • Labels:
      None

      Description

      Currently users have no reliable way to know if a database or view index compaction is needed.

      Adam, Robert Dionne, and I have been working on a feature to compute and expose the current data size (in bytes) of databases and view indexes. The computed value is exposed as a single field in the database info and view index info URIs.

      Comparing this new value with the disk_size value (the total space in bytes used by the database or view index file) would allow users to decide whether or not it's worth triggering a compaction.
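
      As a rough illustration (not part of either branch), a client could poll the info URIs and trigger compaction once the fraction of non-live data crosses some threshold. A minimal sketch in Python, assuming only the fields described in this ticket; the requests library, the wasted_fraction name, and the 0.7 threshold are arbitrary choices for the example:

      import requests  # third-party HTTP client, used here only for brevity

      def wasted_fraction(info_url):
          """Fraction of the file that is not live data, per disk_size and data_size."""
          info = requests.get(info_url).json()
          # View index info nests the sizes under "view_index"; db info is top level.
          info = info.get("view_index", info)
          return 1.0 - float(info["data_size"]) / info["disk_size"]

      # Hypothetical usage, matching the example URIs shown below:
      if wasted_fraction("http://localhost:5984/btree_db") > 0.7:
          requests.post("http://localhost:5984/btree_db/_compact",
                        headers={"Content-Type": "application/json"})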

      Adam and Robert's work can be found at:

      https://github.com/cloudant/bigcouch/compare/7d1adfa...a9410e6

      Mine can be found at:

      https://github.com/fdmanana/couchdb/compare/file_space

      After chatting with Adam on IRC, the main difference seems to be that their work accounts only for user data (document bodies + attachments), while mine also accounts for the btree values (including all meta information, keys, rev trees, etc.) and the data added by couch_file (4-byte length prefixes, MD5s, block boundary markers).

      An example:

      $ curl http://localhost:5984/btree_db/_design/test/_info
      {"name":"test","view_index":{"signature":"aba9f066ed7f042f63d245ce0c7d870e","language":"javascript","disk_size":274556,"data_size":270455,"updater_running":false,"compact_running":false,"waiting_commit":false,"waiting_clients":0,"update_seq":1004,"purge_seq":0}}

      $ curl http://localhost:5984/btree_db

      {"db_name":"btree_db","doc_count":1004,"doc_del_count":0,"update_seq":1004,"purge_seq":0,"compact_running":false,"disk_size":6197361,"data_size":6186460,"instance_start_time":"1303231080936421","disk_format_version":5,"committed_update_seq":1004}

      This example was executed just after compacting the test database and view index. The new field "data_size" has a value very close to the final file size.
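
      Working through the numbers above: for the view index, 270455 / 274556 ≈ 98.5%, and for the database, 6186460 / 6197361 ≈ 99.8%, i.e. right after compaction almost none of the file is reclaimable space.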

      The only things that my branch doesn't include in the data_size computation for databases are the size of the last header, the size of the _security object, and the purged revs list - in practice these are so small and insignificant that adding extra code to account for them doesn't seem worth it.

      I'm sure we can merge the best from both branches.

      Adam, Robert, thoughts?

        Activity

        Filipe Manana added a comment -

        Applied to trunk.
        Thanks everyone, especially Adam and Robert Dionne.

        Filipe Manana added a comment -

        Ok, sounds good enough to me. And I actually just did a small patch for it:

        http://friendpaste.com/3ACjKssyNXMhFju9irTdJg

        If there are objections, I'll create a ticket for it to avoid keeping this one blocked.

        Paul Joseph Davis added a comment -

        @Adam,

        I agree with your comment about using memory usage as the threshold so that people have a better understanding of what it is they're setting.

        Paul Joseph Davis added a comment -

        @Jan

        That won't be doable until we make the b+tree balance itself during writes. But once/if we get around to that, your request would happen more or less automatically with a few tweaks to the compactor.

        Jan Lehnardt added a comment -

        @Adam I meant that it'd be nice if a user could look at disk_size and data_size and do the math on what a compaction could do. The more accurate the better, but I'm happy to settle for "too complicated".

        I like your take on configuration options.

        Adam Kocoloski added a comment -

        @janl Did you mean data_size = post_compaction_file_size? What you wrote doesn't make sense to me. And yes, I think it would be too complicated to try to do that.

        @fdmanana The view compactor uses a static batch size of 10000. The work queues are only involved during indexing. I put a patch somewhere to place a configurable minimum bound on the size of the batch written to disk during indexing, which does help reduce the file size.

        Regarding the config entry, I've started to think that every new config entry we add represents a problem we couldn't solve for the end user. If we need to have an entry, maybe we should use units that make more sense for the user, e.g. a threshold in bytes for the compactor process above which it flushes to disk. I'd be particularly in favor of such a threshold for the view compactor, since the map values are loaded into memory simultaneously (as opposed to the document bodies, which are written to the new file one at a time regardless of batch size). Different view compactions can use wildly different amounts of memory depending on the average value size.
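
        To make that concrete, here is a minimal sketch (plain Python with made-up names, not CouchDB code) of flushing on a byte threshold rather than on a fixed item count:

        def flush_by_bytes(kv_iter, write_batch, threshold_bytes=10 * 1024 * 1024):
            """Buffer key/value pairs and flush whenever the estimated buffered
            size exceeds threshold_bytes, instead of after every N items."""
            batch, batch_bytes = [], 0
            for key, value in kv_iter:
                batch.append((key, value))
                batch_bytes += len(key) + len(value)  # rough in-memory size estimate
                if batch_bytes >= threshold_bytes:
                    write_batch(batch)  # flush to the new file
                    batch, batch_bytes = [], 0
            if batch:
                write_batch(batch)      # flush whatever remains at the end

        With units like these, an operator tuning the entry is reasoning about memory use directly instead of guessing how large N items might be.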

        Jan Lehnardt added a comment -

        I'm all for making the compactor smarter

        Great work Filipe!

        I wish we could accurately make this equation work: file_size - data_size = post_compaction_file_size, but it seems overly complicated to try. It would "just" be a nice API behaviour that isn't required for any of this. So yeah.

        Filipe Manana added a comment - edited

        I ran a few tests with much larger databases; here are the results.

        • 75 MB database, 12 531 documents

        Before compaction:

        $ curl http://localhost:5985/testdb1

        {"db_name":"testdb1","doc_count":12531,"doc_del_count":0,"update_seq":12531,"purge_seq":0, "compact_running":false,"disk_size":77545585,"data_size":35483560, "instance_start_time":"1303288992990482","disk_format_version":5,"committed_update_seq":12531}

        After compaction:

        $ curl http://localhost:5985/testdb1

        {"db_name":"testdb1","doc_count":12531,"doc_del_count":0,"update_seq":12531,"purge_seq":0, "compact_running":false,"disk_size":41271409,"data_size":35453857,"instance_start_time":"1303288992990482", "disk_format_version":5,"committed_update_seq":12531}

        data size is about 86% of the file size

        • 1.8 GB database, 262 531 documents

        Before compaction:

        $ curl http://localhost:5985/testdb1

        {"db_name":"testdb1","doc_count":262531,"doc_del_count":0,"update_seq":262531,"purge_seq":0, "compact_running":false,"disk_size":1962610801,"data_size":744835248,"instance_start_time":"1303289719133306", "disk_format_version":5,"committed_update_seq":262531}

        After compaction:

        $ curl http://localhost:5985/testdb1

        {"db_name":"testdb1","doc_count":262531,"doc_del_count":0,"update_seq":262531,"purge_seq":0, "compact_running":false,"disk_size":1139642481,"data_size":744292081,"instance_start_time":"1303289719133306", "disk_format_version":5,"committed_update_seq":262531}

        data size is about 65% of the file size

        After changing compaction checkpoint frequency from 10 000 to 10 000 000 000
        and compacting the database again:

        $ curl http://localhost:5985/testdb1

        {"db_name":"testdb1","doc_count":262531,"doc_del_count":0,"update_seq":262531,"purge_seq":0, "compact_running":false,"disk_size":1139601521,"data_size":744292168,"instance_start_time":"1303296830183399", "disk_format_version":5,"committed_update_seq":262531}

        data size is still about 65% of the file size

        After changing compaction batch size from 1 000 to 100 000

        $ curl http://localhost:5985/testdb1

        {"db_name":"testdb1","doc_count":262531,"doc_del_count":0,"update_seq":262531,"purge_seq":0, "compact_running":false,"disk_size":776962161,"data_size":744307523,"instance_start_time":"1303297206958149", "disk_format_version":5,"committed_update_seq":262531}

        data size is now about 96% of the file size

        • 16 GB database, 3 341 491 documents

        (No data size before compaction since it was a database created with trunk CouchDB)

        After compaction:

        $ curl http://localhost:5985/large_1_20

        {"db_name":"large_1_20","doc_count":3341491,"doc_del_count":0,"update_seq":3341491,"purge_seq":0, "compact_running":false,"disk_size":16318431354,"data_size":15069943338,"instance_start_time":"1303296570043058", "disk_format_version":5,"committed_update_seq":3341491}

        data size is about 92% of the file size - this compaction was done with the default checkpoint frequency and batch size

        This makes me think we should make the compaction checkpoint frequency and batch size configurable in the .ini (especially the batch size), since this can significantly reduce the final file size as well as make the compaction a bit faster.
        Anyone -1 on doing this?

        For view indexes, the batch size is controlled by the size of the work queues, but I believe Adam and/or Paul were thinking about making this configurable.

        Filipe Manana added a comment -

        Thanks Adam.

        Not an issue for the replicator in trunk; it only uses binaries.

        Adam Kocoloski added a comment -

        One other side comment - adding new fields to the top-level of db.info() breaks replication because of COUCHDB-1004. I believe that will be fixed in time for 1.0.3 and 1.1.0.

        Adam Kocoloski added a comment -

        Thanks Filipe. I've only gotten a chance to look briefly at your work, but it seems very cleanly structured and well-organized. It looks like your implementation is going to be a bit more efficient because it doesn't require an additional term_to_binary call on document updates. It also has the nice property that data_size ~= disk_size after compaction.

        You're right that the work Bob and I did only included "user data" in the size computation. It intentionally excludes all of the indexes, MD5s, etc. that are needed for proper operation of CouchDB but cannot be controlled by the user.

        We did set the size of old documents and KV pairs to zero rather than reporting a null data_size pre-compaction.

        Filipe Manana added a comment -

        Just forgot to mention one detail.

        For existing database and view index files, the data_size field is reported with a JSON null value.
        After compaction, the file is upgraded and the value will then be a number. One alternative I had was to consider btree nodes or revs with a missing size (that is, with an old term format) as having a size of 0, but in the end this would expose values so small and far from reality that they could confuse users. Not sure which way is most appropriate, however.
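
        For a client consuming this field, the null therefore just means "unknown until the next compaction". A tiny hedged example (hypothetical values, building on the sketch in the description, which would otherwise fail on a null data_size):

        info = {"disk_size": 77545585, "data_size": None}  # e.g. parsed info for an old, not-yet-compacted file

        if info.get("data_size") is None:
            # File predates this feature; the size is unknown until a
            # compaction upgrades it to the new term format.
            print("data_size not available yet; compact once to upgrade the file")
        else:
            print("wasted fraction:", 1.0 - info["data_size"] / info["disk_size"])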


          People

          • Assignee:
            Unassigned
            Reporter:
            Filipe Manana
          • Votes:
            1
            Watchers:
            1
