[KUDU-1439] Optimization for batch inserts into empty key ranges - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: perf, tablet
Labels:
- performance

Description

Got this idea from a CockroachDB optimization:
https://github.com/cockroachdb/cockroach/pull/6375

The short version is that if we have a moderately large batch of inserts, we can do the following (pseudocode):

- sort the inserts by primary key
- instead of using bloom filter checks, use SeekAtOrAfter on the first primary key in the batch. This yields the smallest primary key at or after it that might exist in the table (nextKey).
- for each of the keys in the sorted batch, if it's less than nextKey, it cannot already exist in the table, so we don't need to do an existence check for it. Keys at or above nextKey still go through the normal check.
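
The steps above can be sketched roughly as follows. This is only an illustration in Python, not Kudu code: the table is modeled as a sorted list of existing primary keys, and `seek_at_or_after` and `split_batch` are hypothetical helpers standing in for the tablet's SeekAtOrAfter and the insert path. Duplicate keys within the batch itself are not handled here.

```python
from bisect import bisect_left


def seek_at_or_after(existing_keys, key):
    """Return the smallest existing key >= key, or None if there is none.

    existing_keys is assumed to be a sorted list standing in for the table.
    """
    i = bisect_left(existing_keys, key)
    return existing_keys[i] if i < len(existing_keys) else None


def split_batch(batch, existing_keys):
    """Split a batch into (skip_check, needs_check).

    skip_check holds keys that fall in the empty range before nextKey and
    therefore cannot already exist; needs_check holds the remaining keys,
    which still require the normal existence check.
    """
    ordered = sorted(batch)
    if not ordered:
        return [], []

    # One seek for the whole batch, on its smallest key.
    next_key = seek_at_or_after(existing_keys, ordered[0])
    if next_key is None:
        # Nothing at or after the first key: the whole batch is in an empty range.
        return ordered, []

    # Everything strictly below next_key is known not to exist.
    cut = bisect_left(ordered, next_key)
    return ordered[:cut], ordered[cut:]


if __name__ == "__main__":
    existing = [10, 20, 30]
    print(split_batch([13, 11, 12], existing))
    print(split_batch([5, 15, 25], existing))
```

In the first call the whole batch lands between existing keys 10 and 20, so a single seek replaces three per-row checks. In the second call only the first key is covered by the seek and the rest fall back to the per-row check.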

In the common case where clients are writing non-overlapping batches of rows (e.g. importing from Parquet), this should reduce the number of seeks and bloom filter checks dramatically (by a factor on the order of the batch size). Plus, it doesn't require much new code to be written, so it's worth a shot.

Issue Links

is related to

KUDU-1370 Implement bulk insert API

Open

relates to

KUDU-1220 Improve bulk loads from multiple sequential writers

Open

People

Assignee: Todd Lipcon

Reporter: Todd Lipcon

Votes: 0

Watchers: 6

Dates

Created: 05/May/16 22:11

Updated: 13/Jan/21 16:11