Kudu / KUDU-1439

Optimization for batch inserts into empty key ranges

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: perf, tablet
    • Labels:
      None

      Description

      Got this idea from a CockroachDB optimization:
      https://github.com/cockroachdb/cockroach/pull/6375

      The short version is that if we have a moderately large batch of inserts, we can do the following (pseudocode):

      • sort the inserts by primary key
      • instead of doing a bloom filter check for every row, call SeekAtOrAfter on the first primary key in the batch. This yields the smallest primary key at or after that position that might exist in the table (nextKey).
      • for each key in the sorted batch, if it is less than nextKey, we don't need to do an existence check for it. Keys at or after nextKey still go through the usual check.

      In the common case where clients are writing non-overlapping batches of rows (e.g. importing from Parquet), this should dramatically reduce the number of seeks and bloom filter checks, by a factor on the order of the batch size. Plus, it doesn't require much new code to be written, so it's worth a shot.
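
The steps above can be sketched roughly as follows. This is only an illustration in Python, not Kudu code: the sorted list `existing_keys` stands in for the tablet's primary key index, `bisect_left` stands in for SeekAtOrAfter, and the function name `plan_existence_checks` is hypothetical. It also assumes the batch itself contains no duplicate keys, which a real implementation would have to handle separately.

```python
from bisect import bisect_left


def plan_existence_checks(existing_keys, batch_keys):
    """Split a batch into (skipped, to_check) lists of primary keys.

    existing_keys: sorted list standing in for the keys already in the table.
    batch_keys: primary keys of the rows to insert, in any order.
    """
    # Step 1: sort the inserts by primary key.
    batch = sorted(batch_keys)
    if not batch:
        return [], []

    # Step 2: stand-in for SeekAtOrAfter(batch[0]): find the smallest
    # existing key that is >= the first key in the batch, if any.
    pos = bisect_left(existing_keys, batch[0])
    if pos == len(existing_keys):
        # Nothing exists at or after the batch start: skip every check.
        return batch, []
    next_key = existing_keys[pos]

    # Step 3: keys strictly below next_key cannot already exist, so they
    # need no existence check. Keys at or after next_key still do.
    split = bisect_left(batch, next_key)
    return batch[:split], batch[split:]


if __name__ == "__main__":
    existing = [10, 20, 30]
    # The whole batch falls in the empty range between 10 and 20.
    print(plan_existence_checks(existing, [14, 12, 11, 13]))
    # The batch straddles an existing key, so only a prefix is skipped.
    print(plan_existence_checks(existing, [25, 15, 20, 18]))
```

In the non-overlapping case the whole batch lands in the `skipped` list after a single seek, which is where the saving on the order of the batch size would come from. A fuller version could re-seek from the first key in `to_check` instead of falling back to per-row checks for the remainder.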

              People

              • Assignee: Todd Lipcon (tlipcon)
              • Reporter: Todd Lipcon (tlipcon)
              • Votes: 0
              • Watchers: 5
