Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Database Core
    • Labels:
      None

      Description

      The current chunkify has problems when reduce functions create large values: it will produce chunks (i.e., kp nodes) that contain a single key. In some pathological cases this can create long chains of nodes that never branch.

      The old chunkify would also try to create nodes with a roughly even number of bytes in each chunk. Given that we don't re-use chunks, it makes more sense to pack each chunk as close to the threshold as possible so that we create fewer branches in our tree.
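
      For illustration, a minimal sketch of the packing strategy described above. This is not the actual couch_btree implementation; the chunkify/2 arity and the use of byte_size(term_to_binary(KV)) as the size estimate are assumptions:

        %% Sketch only: pack each chunk as close to Threshold as possible.
        chunkify(KVs, Threshold) ->
            chunkify(KVs, Threshold, [], 0, []).

        chunkify([], _Threshold, [], 0, Chunks) ->
            lists:reverse(Chunks);
        chunkify([], _Threshold, Acc, _AccSize, Chunks) ->
            lists:reverse([lists:reverse(Acc) | Chunks]);
        chunkify([KV | Rest], Threshold, Acc, AccSize, Chunks) ->
            Size = byte_size(term_to_binary(KV)),
            case AccSize + Size > Threshold of
                true when Acc =/= [] ->
                    %% The current chunk is as full as it can get; close it
                    %% and start a new one with this KV.
                    chunkify(Rest, Threshold, [KV], Size,
                             [lists:reverse(Acc) | Chunks]);
                _ ->
                    %% Keep packing the current chunk toward the threshold.
                    chunkify(Rest, Threshold, [KV | Acc], AccSize + Size, Chunks)
            end.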

        Activity

        jira-bot ASF subversion and git services added a comment -

        Commit 8556adbb98e79a09ec254967ee6acf3bef8d1fb6 in couchdb-couch's branch refs/heads/COUCHDB-3298-improve-couch-btree-chunkify from Paul Joseph Davis
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch.git;h=8556adb ]

        Make couch_btree:chunkify/1 prefer fewer chunks

        This changes couch_btree:chunkify/1 to produce fewer, larger chunks
        rather than creating chunks of even-ish size.

        COUCHDB-3298

        jira-bot ASF subversion and git services added a comment -

        Commit ff9fb7112ee5250af01e1b38c8cfa9caed152ae7 in couchdb-couch's branch refs/heads/COUCHDB-3298-improve-couch-btree-chunkify from Paul Joseph Davis
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch.git;h=ff9fb71 ]

        Ensure multi-item chunks in couch_btree:chunkify/1

        If the last element of a chunk has a huge reduction, it was possible to
        return a btree node that had a single key. This prevents that edge case
        by forcing the lone key into the previous chunk. Without this we can end
        up with a case where a path in the tree extends for many levels with
        only a single key in each node.

        COUCHDB-3298
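
        Roughly, the fix boils down to a final pass over the chunk list: if the last chunk came out with a single key (for example because a huge reduction pushed it past the threshold on its own), fold that key into the previous chunk. A hedged sketch, not the actual patch; ensure_multi_item/1 is an illustrative name:

          %% Never return a trailing single-key chunk; merge it into the
          %% previous chunk instead (sketch only).
          ensure_multi_item(Chunks) ->
              case lists:reverse(Chunks) of
                  [[_LoneKV] = Last, Prev | RestRev] ->
                      lists:reverse([Prev ++ Last | RestRev]);
                  _ ->
                      Chunks
              end.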

        jira-bot ASF subversion and git services added a comment -

        Commit 8556adbb98e79a09ec254967ee6acf3bef8d1fb6 in couchdb-couch's branch refs/heads/master from Paul Joseph Davis
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch.git;h=8556adb ]

        Make couch_btree:chunkify/1 prefer fewer chunks

        This changes couch_btree:chunkify/1 to produce fewer, larger chunks
        rather than creating chunks of even-ish size.

        COUCHDB-3298

        jira-bot ASF subversion and git services added a comment -

        Commit ff9fb7112ee5250af01e1b38c8cfa9caed152ae7 in couchdb-couch's branch refs/heads/master from Paul Joseph Davis
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch.git;h=ff9fb71 ]

        Ensure multi-item chunks in couch_btree:chunkify/1

        If the last element of a chunk has a huge reduction, it was possible to
        return a btree node that had a single key. This prevents that edge case
        by forcing the lone key into the previous chunk. Without this we can end
        up with a case where a path in the tree extends for many levels with
        only a single key in each node.

        COUCHDB-3298

        jira-bot ASF subversion and git services added a comment -

        Commit 0080f15e9bb287eeeaca8479c5a3a5d35306bbcf in couchdb-couch's branch refs/heads/master from Paul Joseph Davis
        [ https://git-wip-us.apache.org/repos/asf?p=couchdb-couch.git;h=0080f15 ]

        Merge branch 'COUCHDB-3298-improve-couch-btree-chunkify'

        paul.joseph.davis Paul Joseph Davis added a comment -

        Merged.

        jira-bot ASF subversion and git services added a comment -

        Commit e3b5f40d9130a6d347b02f31a74fd5300b41bc6e in couchdb's branch refs/heads/COUCHDB-3298-optimize-writing-kv-nodes from Paul Joseph Davis
        [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=e3b5f40 ]

        Optimize writing KV node append writes

        As it turns out, the original change in COUCHDB-3298 ends up hurting
        disk usage when a view emits large amounts of data (i.e., more than
        half of the btree chunk size). The cause is that instead of writing
        single-element nodes it would prefer to write kv nodes with three
        elements. While we might normally prefer this in memory, with our
        append-only storage it turns out to generate significantly more
        garbage on disk.

        We can show this with a few trivial examples. Imagine we write KVs a
        through f. The following two patterns show the nodes as we write each
        new kv.

        Before 3298:

        []
        [a]
        [a, b]
        [a, b]', [c]
        [a, b]', [c, d]
        [a, b]', [c, d]', [e]
        [a, b]', [c, d]', [e, f]

        After 3298:

        []
        [a]
        [a, b]
        [a, b, c]
        [a, b]', [c, d]
        [a, b]', [c, d, e]
        [a, b]', [c, d]', [e, f]

        The thing to realize here is which of these nodes end up as garbage. In
        the first example we end up with [a], [a, b], [c], [c, d], and [e] nodes
        that have been orphaned, whereas in the second case we end up with
        [a], [a, b], [a, b, c], [c, d], [c, d, e] as nodes that have been
        orphaned. A quick aside, the reason that [a, b] and [c, d] are orphaned
        is due to how a btree update works. For instance, when adding c, we read
        [a, b] into memory, append c, and then during our node write we call
        chunkify which gives us back [a, b], [c] which leads us to writing [a,
        b] a second time.

        This patch changes the write function to recognize when we're merely
        appending KVs, which saves us the extra write and the garbage it
        generates. Its node patterns look like this:

        []
        [a]
        [a, b]
        [a, b], [c]
        [a, b], [c, d]
        [a, b], [c, d], [e]
        [a, b], [c, d], [e, f]

        This means we only end up generating [a], [c], and [e] as garbage (with
        respect to kv nodes; kp nodes retain their historical behavior).
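
        A hedged sketch of the append-detection idea described above (illustrative names, not the actual couch_btree code; it reuses the chunkify/2 sketch shown under the issue description): when the old node's KV list is an unchanged prefix of the new list, keep the node that is already on disk and only chunk and write the KVs that were appended after it.

          %% Sketch only. OldPtr/OldKVs come from the node as read from disk;
          %% NewKVs is the full updated, sorted KV list.
          write_kv_node(OldPtr, OldKVs, NewKVs, Threshold) ->
              case lists:prefix(OldKVs, NewKVs) of
                  true ->
                      %% Pure append: keep the existing node untouched and
                      %% only write the appended tail as new chunk(s).
                      Appended = lists:nthtail(length(OldKVs), NewKVs),
                      [{reuse, OldPtr} |
                       [{write, Chunk} || Chunk <- chunkify(Appended, Threshold)]];
                  false ->
                      %% Not a pure append: rewrite the node(s) as before.
                      [{write, Chunk} || Chunk <- chunkify(NewKVs, Threshold)]
              end.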

        jira-bot ASF subversion and git services added a comment -

        Commit acaf23b12d5067b913a858d2a166a35b33188f96 in couchdb's branch refs/heads/COUCHDB-3298-optimize-writing-kv-nodes from Paul Joseph Davis
        [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=acaf23b ]

        Optimize writing KV node append writes

        As it turns out, the original change in COUCHDB-3298 ends up hurting
        disk usage when a view emits large amounts of data (i.e., more than
        half of the btree chunk size). The cause is that instead of writing
        single-element nodes it would prefer to write kv nodes with three
        elements. While we might normally prefer this in memory, with our
        append-only storage it turns out to generate significantly more
        garbage on disk.

        We can show this with a few trivial examples. Imagine we write KVs a
        through f. The following two patterns show the nodes as we write each
        new kv.

        Before 3298:

        []
        [a]
        [a, b]
        [a, b]', [c]
        [a, b]', [c, d]
        [a, b]', [c, d]', [e]
        [a, b]', [c, d]', [e, f]

        After 3298:

        []
        [a]
        [a, b]
        [a, b, c]
        [a, b]', [c, d]
        [a, b]', [c, d, e]
        [a, b]', [c, d]', [e, f]

        The thing to realize here is which of these nodes end up as garbage. In
        the first example we end up with [a], [a, b], [c], [c, d], and [e] nodes
        that have been orphaned, whereas in the second case we end up with
        [a], [a, b], [a, b, c], [c, d], [c, d, e] as nodes that have been
        orphaned. A quick aside, the reason that [a, b] and [c, d] are orphaned
        is due to how a btree update works. For instance, when adding c, we read
        [a, b] into memory, append c, and then during our node write we call
        chunkify which gives us back [a, b], [c] which leads us to writing [a,
        b] a second time.

        This patch changes the write function to recognize when we're merely
        appending KVs, which saves us the extra write and the garbage it
        generates. Its node patterns look like this:

        []
        [a]
        [a, b]
        [a, b], [c]
        [a, b], [c, d]
        [a, b], [c, d], [e]
        [a, b], [c, d], [e, f]

        This means we only end up generating [a], [c], and [e] as garbage (with
        respect to kv nodes; kp nodes retain their historical behavior).

        jira-bot ASF subversion and git services added a comment -

        Commit 6b2ed293dee8e077f418f50bc0d0f24aba69c760 in couchdb's branch refs/heads/COUCHDB-3298-optimize-writing-kv-nodes from Paul Joseph Davis
        [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=6b2ed29 ]

        Optimize writing KV node append writes

        As it turns out, the original change in COUCHDB-3298 ends up hurting
        disk usage when a view emits large amounts of data (i.e., more than
        half of the btree chunk size). The cause is that instead of writing
        single-element nodes it would prefer to write kv nodes with three
        elements. While we might normally prefer this in memory, with our
        append-only storage it turns out to generate significantly more
        garbage on disk.

        We can show this with a few trivial examples. Imagine we write KVs a
        through f. The following two patterns show the nodes as we write each
        new kv.

        Before 3298:

        []
        [a]
        [a, b]
        [a, b]', [c]
        [a, b]', [c, d]
        [a, b]', [c, d]', [e]
        [a, b]', [c, d]', [e, f]

        After 3298:

        []
        [a]
        [a, b]
        [a, b, c]
        [a, b]', [c, d]
        [a, b]', [c, d, e]
        [a, b]', [c, d]', [e, f]

        The thing to realize here is which of these nodes end up as garbage. In
        the first example we end up with [a], [a, b], [c], [c, d], and [e] nodes
        that have been orphaned, whereas in the second case we end up with
        [a], [a, b], [a, b, c], [c, d], [c, d, e] as nodes that have been
        orphaned. A quick aside, the reason that [a, b] and [c, d] are orphaned
        is due to how a btree update works. For instance, when adding c, we read
        [a, b] into memory, append c, and then during our node write we call
        chunkify which gives us back [a, b], [c] which leads us to writing [a,
        b] a second time.

        This patch changes the write function to recognize when we're merely
        appending KVs, which saves us the extra write and the garbage it
        generates. Its node patterns look like this:

        []
        [a]
        [a, b]
        [a, b], [c]
        [a, b], [c, d]
        [a, b], [c, d], [e]
        [a, b], [c, d], [e, f]

        This means we only end up generating [a], [c], and [e] as garbage (with
        respect to kv nodes; kp nodes retain their historical behavior).

        jira-bot ASF subversion and git services added a comment -

        Commit de622c6c2c86d693df8a3c68351b803db5e04ff6 in couchdb's branch refs/heads/COUCHDB-3298-optimize-writing-kv-nodes from Paul Joseph Davis
        [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=de622c6 ]

        Optimize writing KV node append writes

        As it turns out, the original change in COUCHDB-3298 ends up hurting
        disk usage when a view emits large amounts of data (i.e., more than
        half of the btree chunk size). The cause is that instead of writing
        single-element nodes it would prefer to write kv nodes with three
        elements. While we might normally prefer this in memory, with our
        append-only storage it turns out to generate significantly more
        garbage on disk.

        We can show this with a few trivial examples. Imagine we write KVs a
        through f. The following two patterns show the nodes as we write each
        new kv.

        Before 3298:

        []
        [a]
        [a, b]
        [a, b]', [c]
        [a, b]', [c, d]
        [a, b]', [c, d]', [e]
        [a, b]', [c, d]', [e, f]

        After 3298:

        []
        [a]
        [a, b]
        [a, b, c]
        [a, b]', [c, d]
        [a, b]', [c, d, e]
        [a, b]', [c, d]', [e, f]

        The thing to realize here is which of these nodes end up as garbage. In
        the first example we end up with [a], [a, b], [c], [c, d], and [e] nodes
        that have been orphaned, whereas in the second case we end up with
        [a], [a, b], [a, b, c], [c, d], [c, d, e] as nodes that have been
        orphaned. A quick aside, the reason that [a, b] and [c, d] are orphaned
        is due to how a btree update works. For instance, when adding c, we read
        [a, b] into memory, append c, and then during our node write we call
        chunkify which gives us back [a, b], [c] which leads us to writing [a,
        b] a second time.

        This patch changes the write function to recognize when we're merely
        appending KVs, which saves us the extra write and the garbage it
        generates. Its node patterns look like this:

        []
        [a]
        [a, b]
        [a, b], [c]
        [a, b], [c, d]
        [a, b], [c, d], [e]
        [a, b], [c, d], [e, f]

        This means we only end up generating [a], [c], and [e] as garbage (with
        respect to kv nodes; kp nodes retain their historical behavior).

        jira-bot ASF subversion and git services added a comment -

        Commit 4c5a4f77c885c6893fb7e0f2123c0d5aef9a1473 in couchdb's branch refs/heads/COUCHDB-3298-optimize-writing-btree-nodes from Paul Joseph Davis
        [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4c5a4f7 ]

        Optimize writing KV node append writes

        As it turns out, the original change in COUCHDB-3298 ends up hurting
        disk usage when a view emits large amounts of data (i.e., more than
        half of the btree chunk size). The cause is that instead of writing
        single-element nodes it would prefer to write kv nodes with three
        elements. While we might normally prefer this in memory, with our
        append-only storage it turns out to generate significantly more
        garbage on disk.

        We can show this with a few trivial examples. Imagine we write KVs a
        through f. The following two patterns show the nodes as we write each
        new kv.

        Before 3298:

        []
        [a]
        [a, b]
        [a, b]', [c]
        [a, b]', [c, d]
        [a, b]', [c, d]', [e]
        [a, b]', [c, d]', [e, f]

        After 3298:

        []
        [a]
        [a, b]
        [a, b, c]
        [a, b]', [c, d]
        [a, b]', [c, d, e]
        [a, b]', [c, d]', [e, f]

        The thing to realize here is which of these nodes end up as garbage. In
        the first example we end up with [a], [a, b], [c], [c, d], and [e] nodes
        that have been orphaned, whereas in the second case we end up with
        [a], [a, b], [a, b, c], [c, d], [c, d, e] as nodes that have been
        orphaned. A quick aside, the reason that [a, b] and [c, d] are orphaned
        is due to how a btree update works. For instance, when adding c, we read
        [a, b] into memory, append c, and then during our node write we call
        chunkify which gives us back [a, b], [c] which leads us to writing [a,
        b] a second time.

        The main benefit of this patch is to recognize when it's possible to reuse
        a node that already exists on disk. It achieves this by looking at the
        list of key/values when writing new nodes and comparing it to the old
        list of key/values for the node read from disk. By checking to see if
        the old list exists unchanged in the new list we can just reuse the old
        node. Node reuse is limited to when the old node is larger than 50% of
        the chunk threshold to maintain the B+Tree properties.

        The disk usage improvements this gives can also be quite dramatic. In
        the case above when we have ordered keys with large values (> 50% of the
        btree chunk size) we find upwards of 50% less disk usage. Random keys
        benefit as well, though to a lesser extent depending on disk size
        (as they will often be in the middle of an existing node which prevents
        our optimization).
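
        A hedged sketch of the reuse check described above (illustrative names and shapes, not the real couch_btree code): if the old node's KV list appears unchanged and contiguous inside the new list, and the old node is larger than half the chunk threshold, keep its on-disk pointer and only write the remaining KVs.

          %% Sketch only. {OldPtr, OldKVs} is the node as read from disk;
          %% NewKVs is the full updated KV list; Threshold is the chunk limit.
          maybe_reuse({OldPtr, OldKVs}, NewKVs, Threshold) ->
              OldSize = byte_size(term_to_binary(OldKVs)),
              BigEnough = OldSize > Threshold div 2,
              case BigEnough andalso unchanged_run(OldKVs, NewKVs) of
                  true ->
                      %% The old node survives intact as one chunk; only the
                      %% other KVs still need to be written as new nodes.
                      {reuse, OldPtr, NewKVs -- OldKVs};
                  false ->
                      rewrite
              end.

          %% True when OldKVs occurs as a contiguous, unchanged run in NewKVs.
          unchanged_run(OldKVs, NewKVs) when NewKVs =/= [] ->
              lists:prefix(OldKVs, NewKVs) orelse unchanged_run(OldKVs, tl(NewKVs));
          unchanged_run(_OldKVs, _NewKVs) ->
              false.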

        jira-bot ASF subversion and git services added a comment -

        Commit 69d4821a1dcf5331a0e8a9a9c645e9403f2b427c in couchdb's branch refs/heads/COUCHDB-3298-optimize-writing-btree-nodes from Paul Joseph Davis
        [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=69d4821 ]

        Optimize writing KV node append writes

        As it turns out, the original change in COUCHDB-3298 ends up hurting
        disk usage when a view emits large amounts of data (i.e., more than
        half of the btree chunk size). The cause is that instead of writing
        single-element nodes it would prefer to write kv nodes with three
        elements. While we might normally prefer this in memory, with our
        append-only storage it turns out to generate significantly more
        garbage on disk.

        We can show this with a few trivial examples. Imagine we write KVs a
        through f. The following two patterns show the nodes as we write each
        new kv.

        Before 3298:

        []
        [a]
        [a, b]
        [a, b]', [c]
        [a, b]', [c, d]
        [a, b]', [c, d]', [e]
        [a, b]', [c, d]', [e, f]

        After 3298:

        []
        [a]
        [a, b]
        [a, b, c]
        [a, b]', [c, d]
        [a, b]', [c, d, e]
        [a, b]', [c, d]', [e, f]

        The thing to realize here is which of these nodes end up as garbage. In
        the first example we end up with [a], [a, b], [c], [c, d], and [e] nodes
        that have been orphaned, whereas in the second case we end up with
        [a], [a, b], [a, b, c], [c, d], [c, d, e] as nodes that have been
        orphaned. A quick aside, the reason that [a, b] and [c, d] are orphaned
        is due to how a btree update works. For instance, when adding c, we read
        [a, b] into memory, append c, and then during our node write we call
        chunkify which gives us back [a, b], [c] which leads us to writing [a,
        b] a second time.

        The main benefit of this patch is to recognize when it's possible to reuse
        a node that already exists on disk. It achieves this by looking at the
        list of key/values when writing new nodes and comparing it to the old
        list of key/values for the node read from disk. By checking to see if
        the old list exists unchanged in the new list we can just reuse the old
        node. Node reuse is limited to when the old node is larger than 50% of
        the chunk threshold to maintain the B+Tree properties.

        The disk usage improvements this gives can also be quite dramatic. In
        the case above when we have ordered keys with large values (> 50% of the
        btree chunk size) we find upwards of 50% less disk usage. Random keys
        benefit as well, though to a lesser extent depending on disk size
        (as they will often be in the middle of an existing node which prevents
        our optimization).

        COUCHDB-3298

        jira-bot ASF subversion and git services added a comment -

        Commit fa38bd89c78c13a017e0625df252437ac5ebab74 in couchdb's branch refs/heads/COUCHDB-3298-optimize-writing-btree-nodes from Paul Joseph Davis
        [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=fa38bd8 ]

        Optimize writing btree nodes

        As it turns out, the original change in COUCHDB-3298 ends up hurting
        disk usage when a view emits large amounts of data (i.e., more than
        half of the btree chunk size). The cause is that instead of writing
        single-element nodes it would prefer to write kv nodes with three
        elements. While we might normally prefer this in memory, with our
        append-only storage it turns out to generate significantly more
        garbage on disk.

        We can show this with a few trivial examples. Imagine we write KVs a
        through f. The following two patterns show the nodes as we write each
        new kv.

        Before 3298:

        []
        [a]
        [a, b]
        [a, b]', [c]
        [a, b]', [c, d]
        [a, b]', [c, d]', [e]
        [a, b]', [c, d]', [e, f]

        After 3298:

        []
        [a]
        [a, b]
        [a, b, c]
        [a, b]', [c, d]
        [a, b]', [c, d, e]
        [a, b]', [c, d]', [e, f]

        The thing to realize here is which of these nodes end up as garbage. In
        the first example we end up with [a], [a, b], [c], [c, d], and [e] nodes
        that have been orphaned, whereas in the second case we end up with
        [a], [a, b], [a, b, c], [c, d], [c, d, e] as nodes that have been
        orphaned. A quick aside, the reason that [a, b] and [c, d] are orphaned
        is due to how a btree update works. For instance, when adding c, we read
        [a, b] into memory, append c, and then during our node write we call
        chunkify which gives us back [a, b], [c] which leads us to writing [a,
        b] a second time.

        The main benefit of this patch is to recognize when it's possible to reuse
        a node that already exists on disk. It achieves this by looking at the
        list of key/values when writing new nodes and comparing it to the old
        list of key/values for the node read from disk. By checking to see if
        the old list exists unchanged in the new list we can just reuse the old
        node. Node reuse is limited to when the old node is larger than 50% of
        the chunk threshold to maintain the B+Tree properties.

        The disk usage improvements this gives can also be quite dramatic. In
        the case above when we have ordered keys with large values (> 50% of the
        btree chunk size) we find upwards of 50% less disk usage. Random keys
        benefit as well, though to a lesser extent depending on disk size
        (as they will often be in the middle of an existing node which prevents
        our optimization).

        COUCHDB-3298

        jira-bot ASF subversion and git services added a comment -

        Commit 4296ec0604bad0db3d82eee690512f87a792a76b in couchdb's branch refs/heads/COUCHDB-3298-optimize-writing-btree-nodes from Paul Joseph Davis
        [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=4296ec0 ]

        Optimize writing btree nodes

        As it turns out, the original change in COUCHDB-3298 ends up hurting
        disk usage when a view emits large amounts of data (i.e., more than
        half of the btree chunk size). The cause is that instead of writing
        single-element nodes it would prefer to write kv nodes with three
        elements. While we might normally prefer this in memory, with our
        append-only storage it turns out to generate significantly more
        garbage on disk.

        We can show this with a few trivial examples. Imagine we write KVs a
        through f. The following two patterns show the nodes as we write each
        new kv.

        Before 3298:

        []
        [a]
        [a, b]
        [a, b]', [c]
        [a, b]', [c, d]
        [a, b]', [c, d]', [e]
        [a, b]', [c, d]', [e, f]

        After 3298:

        []
        [a]
        [a, b]
        [a, b, c]
        [a, b]', [c, d]
        [a, b]', [c, d, e]
        [a, b]', [c, d]', [e, f]

        The thing to realize here is which of these nodes end up as garbage. In
        the first example we end up with [a], [a, b], [c], [c, d], and [e] nodes
        that have been orphaned, whereas in the second case we end up with
        [a], [a, b], [a, b, c], [c, d], [c, d, e] as nodes that have been
        orphaned. A quick aside, the reason that [a, b] and [c, d] are orphaned
        is due to how a btree update works. For instance, when adding c, we read
        [a, b] into memory, append c, and then during our node write we call
        chunkify which gives us back [a, b], [c] which leads us to writing [a,
        b] a second time.

        The main benefit of this patch is to recognize when it's possible to reuse
        a node that already exists on disk. It achieves this by looking at the
        list of key/values when writing new nodes and comparing it to the old
        list of key/values for the node read from disk. By checking to see if
        the old list exists unchanged in the new list we can just reuse the old
        node. Node reuse is limited to when the old node is larger than 50% of
        the chunk threshold to maintain the B+Tree properties.

        The disk usage improvements this gives can also be quite dramatic. In
        the case above when we have ordered keys with large values (> 50% of the
        btree chunk size) we find upwards of 50% less disk usage. Random keys
        benefit as well, though to a lesser extent depending on disk size
        (as they will often be in the middle of an existing node which prevents
        our optimization).

        COUCHDB-3298

        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 07fa508228d6b32b190de06e1b00be08fe8db672 in couchdb's branch refs/heads/master from Paul Joseph Davis
        [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=07fa508 ]

        Optimize writing btree nodes

        As it turns out, the original change in COUCHDB-3298 ends up hurting
        disk usage when a view emits large amounts of data (i.e., more than
        half of the btree chunk size). The cause is that instead of writing
        single-element nodes it prefers to write kv nodes with three elements.
        While we might normally prefer this in memory, with our append-only
        storage it produces significantly more garbage on disk.

        We can show this with a few trivial examples. Imagine we write KVs a
        through f. The following two patterns show the nodes as each new kv is
        written.

        Before 3298:

        []
        [a]
        [a, b]
        [a, b]', [c]
        [a, b]', [c, d]
        [a, b]', [c, d]', [e]
        [a, b]', [c, d]', [e, f]

        After 3298:

        []
        [a]
        [a, b]
        [a, b, c]
        [a, b]', [c, d]
        [a, b]', [c, d, e]
        [a, b]', [c, d]', [e, f]

        The thing to realize here is which of these nodes end up as garbage.
        In the first example the orphaned nodes are [a], [a, b], [c], [c, d],
        and [e]. Whereas in the second case the orphaned nodes are [a], [a, b],
        [a, b, c], [c, d], and [c, d, e]. As a quick aside, the reason that
        [a, b] and [c, d] are orphaned is due to how a btree update works: when
        adding c, we read [a, b] into memory, append c, and then during the
        node write we call chunkify, which gives us back [a, b], [c] and leads
        us to writing [a, b] a second time.
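
        The pre-3298 garbage pattern above can be reproduced with a small toy
        simulation. The Erlang sketch below is illustrative only: it uses a
        fake chunkify that simply caps chunks at two keys and rewrites only
        the right-most node, which is enough to mirror the example, but it is
        not the real couch_btree logic.

            %% Sketch only: simulates the append-only write pattern discussed
            %% above and collects the nodes that end up orphaned on disk.
            -module(orphan_sketch).
            -export([simulate/1]).

            %% Returns {LiveNodes, OrphanedNodes} after inserting Keys one at
            %% a time into an initially empty tree.
            simulate(Keys) ->
                simulate(Keys, [], []).

            simulate([], Live, Orphans) ->
                {Live, lists:reverse(Orphans)};
            simulate([K | Rest], Live, Orphans) ->
                %% Read the right-most node into memory and append the new key.
                {Keep, Last} =
                    case Live of
                        [] -> {[], []};
                        _  -> {lists:droplast(Live), lists:last(Live)}
                    end,
                %% Re-chunk; every rewritten node orphans its old on-disk copy.
                NewChunks = chunkify(Last ++ [K]),
                NewOrphans =
                    case Last of
                        [] -> Orphans;
                        _  -> [Last | Orphans]
                    end,
                simulate(Rest, Keep ++ NewChunks, NewOrphans).

            %% Fake pre-3298 chunkify: at most two keys per chunk.
            chunkify(KVs) when length(KVs) =< 2 ->
                [KVs];
            chunkify(KVs) ->
                {Chunk, Rest} = lists:split(2, KVs),
                [Chunk | chunkify(Rest)].

        Running orphan_sketch:simulate([a, b, c, d, e, f]) yields the live
        nodes [a, b], [c, d], [e, f] and the orphans [a], [a, b], [c], [c, d],
        [e], matching the "Before 3298" garbage listed above.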

        The main benefit of this patch is to recognize when it is possible to
        reuse a node that already exists on disk. It does this by comparing the
        list of key/values being written for a new node against the old list of
        key/values read from disk for that node. If the old list appears
        unchanged within the new list, the old node can simply be reused. Node
        reuse is limited to cases where the old node is larger than 50% of the
        chunk threshold, in order to maintain the B+Tree properties.

        The disk usage improvements this gives can be quite dramatic. In the
        case above, with ordered keys and large values (> 50% of the btree
        chunk size), we see upwards of 50% less disk usage. Random keys benefit
        as well, though to a lesser extent depending on disk size (as they will
        often land in the middle of an existing node, which prevents our
        optimization).

        COUCHDB-3298

        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 07fa508228d6b32b190de06e1b00be08fe8db672 in couchdb's branch refs/heads/jenkins-pipeline from Paul Joseph Davis
        [ https://gitbox.apache.org/repos/asf?p=couchdb.git;h=07fa508 ]

        Optimize writing btree nodes

        As it turns out, the original change in COUCHDB-3298 ends up hurting
        disk usage when a view emits large amounts of data (i.e., more than
        half of the btree chunk size). The cause is that instead of writing
        single-element nodes it prefers to write kv nodes with three elements.
        While we might normally prefer this in memory, with our append-only
        storage it produces significantly more garbage on disk.

        We can show this with a few trivial examples. Imagine we write KVs a
        through f. The following two patterns show the nodes as each new kv is
        written.

        Before 3298:

        []
        [a]
        [a, b]
        [a, b]', [c]
        [a, b]', [c, d]
        [a, b]', [c, d]', [e]
        [a, b]', [c, d]', [e, f]

        After 3298:

        []
        [a]
        [a, b]
        [a, b, c]
        [a, b]', [c, d]
        [a, b]', [c, d, e]
        [a, b]', [c, d]', [e, f]

        The thing to realize here is which of these nodes end up as garbage.
        In the first example the orphaned nodes are [a], [a, b], [c], [c, d],
        and [e]. Whereas in the second case the orphaned nodes are [a], [a, b],
        [a, b, c], [c, d], and [c, d, e]. As a quick aside, the reason that
        [a, b] and [c, d] are orphaned is due to how a btree update works: when
        adding c, we read [a, b] into memory, append c, and then during the
        node write we call chunkify, which gives us back [a, b], [c] and leads
        us to writing [a, b] a second time.

        The main benefit of this patch is to recognize when it is possible to
        reuse a node that already exists on disk. It does this by comparing the
        list of key/values being written for a new node against the old list of
        key/values read from disk for that node. If the old list appears
        unchanged within the new list, the old node can simply be reused. Node
        reuse is limited to cases where the old node is larger than 50% of the
        chunk threshold, in order to maintain the B+Tree properties.

        The disk usage improvements this gives can be quite dramatic. In the
        case above, with ordered keys and large values (> 50% of the btree
        chunk size), we see upwards of 50% less disk usage. Random keys benefit
        as well, though to a lesser extent depending on disk size (as they will
        often land in the middle of an existing node, which prevents our
        optimization).

        COUCHDB-3298


          People

          • Assignee:
            Unassigned
          • Reporter:
            Paul Joseph Davis (paul.joseph.davis)
          • Votes:
            0
          • Watchers:
            2
