I broke this work into a few parts. The first part, which removes the IGNORE syntax and handles the duplicate key (INSERT) and key not found (UPDATE, DELETE) errors by default, went in:
Author: Matthew Jacobs <firstname.lastname@example.org>
Date: Tue Nov 1 17:52:21 2016 -0700
IMPALA-3710: Kudu DML should ignore conflicts by default
Removes the non-standard IGNORE syntax that was allowed for
DML into Kudu tables to indicate that certain errors should
be ignored, i.e. not fail the query and continue. However,
because there is no way to 'roll back' mutations that
occurred before an error occurs, tables are left in an
inconsistent state and it's difficult to know what rows were
successfully modified vs which rows were not. Instead, this
change makes it so that we always 'ignore' these conflicts,
i.e. a 'best effort'. In the future, when Kudu will provide
the mechanisms Impala needs to provide a notion of isolation
levels, then Impala will be able to provide options for more
After this change, the following errors are ignored:
- INSERT where the PK already exists
- UPDATE/DELETE where the PK doesn't exist
Another follow-up patch will change other violations to be
handled in this way as well, e.g. nulls inserted in
The number of rows inserted is reported to the coordinator,
which makes the aggregate available to the shell and via the
TODO: Return rows modified for INSERT via HS2 (IMPALA-1789).
TODO: Return rows modified for other CRUD (beeswax+hs2) (
TODO: Return error counts for specific warnings (IMPALA-4416).
Updated tests. Ran all functional tests. More tests will be
needed when other conflicts are handled in the same way.
Reviewed-by: Alex Behm <email@example.com>
Tested-by: Internal Jenkins
A second patch will add support for ignoring the following errors in the same way:
- NULLs in non-nullable columns, i.e. null constraint violations.
- Rows with PKs that are in an 'uncovered range'.
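
To illustrate the 'best effort' behavior from the first patch, here is a toy Python model. It is only a sketch, not Impala or Kudu code: the table is a plain dict keyed by PK, and the function names (best_effort_insert, best_effort_update, best_effort_delete) are made up for this example. Each function skips the conflicting rows instead of failing, and returns the number of rows modified along with the number of conflicts ignored. It does not model the NULL or uncovered-range cases from the second patch.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical two-column table: primary key -> value.
Table = Dict[int, str]


def best_effort_insert(table: Table, rows: Iterable[Tuple[int, str]]) -> Tuple[int, int]:
    """Insert rows, skipping any whose PK already exists."""
    modified = ignored = 0
    for pk, value in rows:
        if pk in table:
            ignored += 1  # duplicate key: ignored, keep going
            continue
        table[pk] = value
        modified += 1
    return modified, ignored


def best_effort_update(table: Table, rows: Iterable[Tuple[int, str]]) -> Tuple[int, int]:
    """Update rows, skipping any whose PK does not exist."""
    modified = ignored = 0
    for pk, value in rows:
        if pk not in table:
            ignored += 1  # key not found: ignored, keep going
            continue
        table[pk] = value
        modified += 1
    return modified, ignored


def best_effort_delete(table: Table, pks: Iterable[int]) -> Tuple[int, int]:
    """Delete rows by PK, skipping any PK that does not exist."""
    modified = ignored = 0
    for pk in pks:
        if pk not in table:
            ignored += 1  # key not found: ignored, keep going
            continue
        del table[pk]
        modified += 1
    return modified, ignored
```

In this model the caller only learns the two counts, not which rows were skipped, which mirrors the reporting limitation noted in the TODOs above.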