KUDU-2253: Deltafile on-disk size is 3x larger than expected for large-value workloads


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.7.0
    • Component/s: perf, tablet
    • Labels: None

    Description

      While looking into the performance of the integration test written for KUDU-2251 (https://gerrit.cloudera.org/#/c/8951/ revision 6), Todd and I found that the on-disk deltafiles it writes are about 3x larger than expected. The culprit is an optimization in the CFile value index that is turned off for delta files. The optimization truncates large index keys just after the first byte that differs between sequential values. In this integration test, each deltafile value consists of the small DeltaKey plus the 8KiB updated value. As a result, each BTree interior node is completely filled by only ~4 values (the default cblock size is 32KiB). This makes the BTree far less effective and means that the full updated data is written many times over. We expect that fixing this will improve performance for update-heavy workloads with large values (for example, YCSB).
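
      To make the effect concrete, here is a minimal Python sketch of the idea, not Kudu's actual C++ code. The function names, the 8-byte stand-in for the DeltaKey, and the decision to ignore per-entry overhead (such as block pointers) are all assumptions made for illustration. It shows how truncating an index key after the first differing byte shrinks an entry, and how many entries fit in a 32KiB index block with and without truncation.

```python
def truncate_index_key(prev_key: bytes, key: bytes) -> bytes:
    """Return the shortest prefix of `key` that still sorts after `prev_key`.

    Keeps every byte up to and including the first byte at which the two
    keys differ. Hypothetical helper, not a Kudu API.
    """
    assert prev_key < key, "keys must be strictly increasing"
    for i, (a, b) in enumerate(zip(prev_key, key)):
        if a != b:
            return key[: i + 1]
    # prev_key is a strict prefix of key: one extra byte is enough.
    return key[: len(prev_key) + 1]


def entries_per_index_block(block_size: int, entry_size: int) -> int:
    """Whole entries that fit in one block, ignoring per-entry overhead."""
    return block_size // entry_size


INDEX_BLOCK_SIZE = 32 * 1024   # 32KiB, as stated in the description
VALUE_SIZE = 8 * 1024          # 8KiB updated value, as stated in the description
ASSUMED_DELTA_KEY_SIZE = 8     # assumption: stand-in size for the "small DeltaKey"

# Two consecutive entries: a small key followed by the large updated value.
value = b"x" * VALUE_SIZE
prev_entry = (1).to_bytes(ASSUMED_DELTA_KEY_SIZE, "big") + value
next_entry = (2).to_bytes(ASSUMED_DELTA_KEY_SIZE, "big") + value

truncated = truncate_index_key(prev_entry, next_entry)

print("untruncated entry bytes:", len(next_entry))
print("truncated entry bytes:  ", len(truncated))
print("entries per block, untruncated:",
      entries_per_index_block(INDEX_BLOCK_SIZE, len(next_entry)))
print("entries per block, truncated:  ",
      entries_per_index_block(INDEX_BLOCK_SIZE, len(truncated)))
```

      Under these assumptions, an untruncated entry is slightly larger than 8KiB, so only 3 whole entries fit in a 32KiB block, in line with the "~4 values" figure above. With truncation, consecutive entries differ within the key bytes, so each index entry shrinks to a handful of bytes and thousands fit per block.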


          People

            Assignee: Dan Burkert (danburkert)
            Reporter: Dan Burkert (danburkert)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
