Uploaded image for project: 'Kudu'
  1. Kudu
  2. KUDU-2253

Deltafile on-disk size is 3x larger than expected for large-value workloads

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.7.0
    • Component/s: perf, tablet
    • Labels:
      None

      Description

      While looking into the performance of the integration test written for KUDU-2251 (https://gerrit.cloudera.org/#/c/8951/ revision 6), Todd and I found that the on-disk deltafiles written are about 3x larger than expected. The culprit is an optimization in the CFile value index which is turned off for delta files. The optimization truncates large keys after the first unique byte between sequential values. The deltafile values, in the case of this integration test, include the small DeltaKey, and the 8KiB updated value. As a result the BTree interior nodes are being completely filled by only ~4 values (32KiB cblock size by default). This makes the BTree far less effective, and means that the full updated data is written many times. We expect fixing this will improve performance for update-heavy workloads with large values (for example YCSB).

        Attachments

          Activity

            People

            • Assignee:
              danburkert Dan Burkert
              Reporter:
              danburkert Dan Burkert
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: