Author: Michael Brown <firstname.lastname@example.org>
Date: Tue Nov 1 17:54:14 2016 -0700
IMPALA-4338: test infra data migrator: include tables' primary keys in PostgreSQL
This patch adds the ability for the test infrastructure's
Impala-to-PostgreSQL data migration tool to recognize whether the Impala
source tables have primary keys, and if so, CREATE the tables in
PostgreSQL with the same primary keys. This is needed especially so
that the random query generator can perform CRUD operations and compare
Impala/Kudu tables against their equivalent PostgreSQL tables.
I modified the make_create_table_sql() implementation to check the
"universal" Python object model of the table's columns. We generate
CREATE TABLE statements with, or without, a PRIMARY KEY clause. For
Impala-side tables that this tool may create, we also ensure that we
only write such a clause when the table's format supports primary keys.
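As a rough illustration of the conditional PRIMARY KEY clause described above, here is a minimal sketch. It is not the code from this patch: the Column class, its fields, and the function signature are hypothetical stand-ins for the tool's "universal" column model, and only the function name comes from this message.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    # Hypothetical stand-in for the tool's "universal" column model.
    name: str
    type_sql: str
    is_primary_key: bool = False


def make_create_table_sql(table_name: str, columns: List[Column]) -> str:
    """Build a CREATE TABLE statement, adding PRIMARY KEY only when needed."""
    # One "name type" entry per column, in declaration order.
    parts = [f"{col.name} {col.type_sql}" for col in columns]
    # Collect primary key columns; the list is empty for tables without keys.
    pk_names = [col.name for col in columns if col.is_primary_key]
    if pk_names:
        parts.append("PRIMARY KEY (" + ", ".join(pk_names) + ")")
    return f"CREATE TABLE {table_name} (" + ", ".join(parts) + ")"
```

A table with no primary key columns yields a plain CREATE TABLE statement, so the same code path serves both cases. The sketch assumes table and column names are already safe to use as identifiers and does not quote them.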
When the random query generator runs, it needs to know that the tables
it's examining in both databases are equivalent. It does this by
examining the tables' names, column names, and column types. I have
added whether each column is a primary key to this equivalence check.
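The equivalence check described above might look roughly like the sketch below. ColumnInfo, TableInfo, and tables_are_equivalent are hypothetical names chosen for illustration, not the query generator's actual classes. It also assumes that column types have already been normalized to a common name across the two databases.

```python
from typing import List, NamedTuple


class ColumnInfo(NamedTuple):
    # Hypothetical per-column description shared by both databases.
    name: str
    type_name: str
    is_primary_key: bool


class TableInfo(NamedTuple):
    # Hypothetical per-table description.
    name: str
    columns: List[ColumnInfo]


def tables_are_equivalent(left: TableInfo, right: TableInfo) -> bool:
    """Compare table names, then each column's name, type, and key status."""
    if left.name != right.name or len(left.columns) != len(right.columns):
        return False
    # NamedTuple equality compares all three column fields at once.
    return all(a == b for a, b in zip(left.columns, right.columns))
```

With this shape, two tables that agree on names and types but disagree on which columns are primary keys are reported as not equivalent, which is the new negative case.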
- The patch includes some unit and system tests for the tool.
- Actually migrated a few small Kudu and HDFS tables from Impala into
  both PostgreSQL 9.3 and 9.5 and examined the tables in PostgreSQL to
  make sure they had primary keys (or not) as expected.
- Very short discrepancy_searcher.py --explain-only runs to test positive
  and negative cases of Impala/PostgreSQL equivalency.
Reviewed-by: Taras Bobrovytsky <email@example.com>
Tested-by: Internal Jenkins