Author: Michael Brown <email@example.com>
Date: Wed Dec 7 14:20:05 2016 -0800
IMPALA-4351,IMPALA-4353: [qgen] randomly generate INSERT statements
- Generate INSERT statements that are either INSERT ... VALUES or INSERT
... SELECT
- On both types of INSERTs, we either insert into all columns, or into
some column list. If the column list exists, all primary keys will be
present, and 0 or more additional columns will also be in the list.
The ordering of the column list is random.
- For INSERT ... SELECT, occasionally generate a WITH clause
- For INSERT ... VALUES, generate non-null constants for the primary
keys, but for the non-primary keys, randomly generate a value
expression
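The column-list rule above can be sketched roughly as follows. This is a
minimal illustration, not the code in this patch: the function name
choose_insert_columns, the 50% chance of omitting the column list, and
the plain-string column names are all assumptions made for the example.

```python
import random


def choose_insert_columns(primary_keys, other_columns, rng=random):
    """Return None (insert into all columns) or a randomly ordered column list.

    Hypothetical helper for illustration; not part of the qgen API.
    """
    # Assumed probability: half the time, omit the column list entirely.
    if rng.random() < 0.5:
        return None
    # Pick 0 or more of the non-primary-key columns.
    extra_count = rng.randint(0, len(other_columns))
    extras = rng.sample(other_columns, extra_count)
    # Every primary key is always present in the list.
    columns = list(primary_keys) + extras
    # The ordering of the column list is random.
    rng.shuffle(columns)
    return columns
```

Passing a seeded random.Random instance as rng makes a generated
statement reproducible, which helps when a random statement needs to be
replayed.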
The type system in the random statement/query generator isn't
sophisticated enough to determine the implicit type of a SELECT item or
a value expression. It knows the result will be some INT-based type, but
not whether it will be a SMALLINT or a BIGINT. To get around this, the
easiest approach seems to be to explicitly cast the SELECT items or
value expressions to the columns' so-called exact_type attribute.
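As a rough sketch of the casting workaround: the Column tuple and both
helper functions below are hypothetical stand-ins, and exact_type is
modeled as a plain SQL type name purely for illustration (the
generator's own representation may differ).

```python
from collections import namedtuple

# Hypothetical stand-in for a generator column; only the exact_type
# attribute name comes from the description above.
Column = namedtuple("Column", ["name", "exact_type"])


def cast_to_exact_type(expr_sql, column):
    """Wrap a SQL expression in an explicit CAST to the column's exact type."""
    return "CAST({0} AS {1})".format(expr_sql, column.exact_type)


def build_insert_values(table_name, columns, value_exprs):
    """Render an INSERT ... VALUES statement with every value explicitly cast."""
    casts = [cast_to_exact_type(e, c) for c, e in zip(columns, value_exprs)]
    return "INSERT INTO {0} ({1}) VALUES ({2})".format(
        table_name,
        ", ".join(c.name for c in columns),
        ", ".join(casts))
```

With the cast in place, an expression such as 2 + 3 no longer depends on
which INT-based type the engine infers for it.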
Much of the testing here involved running discrepancy_searcher.py
--explain-only on both tpch_kudu and a random HDFS table, using both the
default profile and DML-only profile. This was done to quickly find bugs
in the statement generation, as they tend to bubble up as analysis
errors. I expect to make other changes in follow-on patches as more
random statements uncover small test issues.
For actual use against Kudu data, you need to migrate data from Kudu
into PostgreSQL 5 (instructions in tests/comparison/POSTGRES.txt) and run
--postgresql-port 5433 \
--profile dmlonly \
--timeout 300 \
--db-name tpch_kudu \
Reviewed-by: Jim Apple <firstname.lastname@example.org>
Tested-by: Impala Public Jenkins