While looking into the performance of the integration test written for
KUDU-2251 (https://gerrit.cloudera.org/#/c/8951/ revision 6), Todd and I found that the on-disk delta files written are about 3x larger than expected. The culprit is an optimization in the CFile value index that is turned off for delta files. The optimization truncates each large index key just after the first byte that distinguishes it from the preceding value. In this integration test, each delta file value consists of the small DeltaKey followed by the 8KiB updated value. Without truncation, each BTree interior node is completely filled by only ~4 values (the default cblock size is 32KiB). This makes the BTree far less effective, and it means the full updated data is written many times. We expect that fixing this will improve performance for update-heavy workloads with large values (for example, YCSB).
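
To illustrate the effect, here is a rough Python model of the truncation and of how many index entries fit in one 32KiB block with and without it. This is only a sketch, not Kudu's CFile code: the function names, the 12-byte stand-in for the DeltaKey encoding, and the choice to ignore per-entry overhead (child pointers, block headers) are all assumptions made for illustration.

```python
import struct

BLOCK_SIZE = 32 * 1024  # default cblock size mentioned above
VALUE_SIZE = 8 * 1024   # updated value size used by the integration test


def shortest_separator(prev: bytes, cur: bytes) -> bytes:
    """Return the shortest prefix of cur that still sorts after prev.

    Assumes prev < cur in bytewise order.
    """
    for i in range(min(len(prev), len(cur))):
        if prev[i] != cur[i]:
            # Keep everything up to and including the first differing byte.
            return cur[: i + 1]
    # prev is a strict prefix of cur: one extra byte is enough.
    return cur[: len(prev) + 1]


def entries_per_block(key_len: int, block_size: int = BLOCK_SIZE) -> int:
    """Index entries that fit in one block, ignoring per-entry overhead."""
    return block_size // key_len


def make_entry(row: int, timestamp: int) -> bytes:
    # Hypothetical encoding: a 12-byte stand-in for the DeltaKey,
    # followed by the 8KiB updated value.
    return struct.pack(">QI", row, timestamp) + bytes(VALUE_SIZE)


prev_entry = make_entry(1, 5)
cur_entry = make_entry(2, 5)
separator = shortest_separator(prev_entry, cur_entry)

full_fanout = entries_per_block(len(cur_entry))
truncated_fanout = entries_per_block(len(separator))
print(f"untruncated: {full_fanout} entries per block")
print(f"truncated:   {truncated_fanout} entries per block")
```

In this model the untruncated fan-out is a handful of entries per block, in line with the ~4 values observed, while the truncated separator is only a few bytes long and allows thousands of entries per block.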