Author: Thomas Tauber-Marshall <firstname.lastname@example.org>
Date: Wed Apr 5 12:35:53 2017 -0700
IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu
are currently painful because we just send rows randomly,
which creates a lot of work for Kudu since it partitions
and sorts data before writing, causing writes to be slow
and leading to timeouts.
We can alleviate this by sending the rows to Kudu already
partitioned and sorted. This patch partitions and sorts
rows according to Kudu's partitioning scheme for INSERTs
and UPSERTs. A followup patch will handle UPDATE and DELETE.
It accomplishes this by inserting an exchange node and a sort
node into the plan before the operation. Both the exchange and
the sort are given a KuduPartitionExpr which takes a row and
calls into the Kudu client to return its partition number.
It also disallows INSERT hints for Kudu tables, since the
hints that we support (SHUFFLE, CLUSTER, SORTBY), so longer
- Updated planner tests.
- Ran the Kudu functional tests.
- Ran performance tests demonstrating that we can now handle much
larger inserts without having timeouts.
Reviewed-by: Thomas Tauber-Marshall <email@example.com>
Tested-by: Impala Public Jenkins