Details
- Task
- Status: Resolved
- Major
- Resolution: Won't Fix
- Impala 2.8.0
Description
When loading Impala test data, we "generally" partition tables in our data load process by adding PARTITION_COLUMNS and ALTER sections to the schema template files, e.g. from functional_schema_template.sql:
---- DATASET
functional
---- BASE_TABLE_NAME
alltypessmall
---- COLUMNS
id int
bool_col boolean
tinyint_col tinyint
smallint_col smallint
int_col int
bigint_col bigint
float_col float
double_col double
date_string_col string
string_col string
timestamp_col timestamp
---- PARTITION_COLUMNS
year int
month int
---- ALTER
ALTER TABLE {table_name} ADD IF NOT EXISTS PARTITION(year=2009, month=1);
ALTER TABLE {table_name} ADD IF NOT EXISTS PARTITION(year=2009, month=2);
ALTER TABLE {table_name} ADD IF NOT EXISTS PARTITION(year=2009, month=3);
ALTER TABLE {table_name} ADD IF NOT EXISTS PARTITION(year=2009, month=4);
However, some tables forgo this and put the PARTITION BY clause directly in the CREATE TABLE statement, with or without an ALTER section. This sidesteps logic in generate-schema-statements.py that branches specifically on whether the PARTITION_COLUMNS and/or ALTER sections have been defined.
We should investigate what effect the omission of these sections has on our data load process for those tables.
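As a starting point for that investigation, the sketch below scans template text and lists tables whose table-creation SQL partitions inline while the PARTITION_COLUMNS and/or ALTER sections are missing. It is not the parser used by generate-schema-statements.py. The function names, the assumption that such SQL lives in a `---- CREATE` section, the `====` separator handling, and the second sample table are all hypothetical and only illustrate the idea.

```python
import re

# A section header looks like "---- NAME" on its own line (as in the excerpt above).
SECTION_HEADER = re.compile(r"^---- (\w+)\s*$")


def parse_entries(template_text):
    """Split template text into one {section_name: body} dict per table.

    Assumption: every table entry starts with a DATASET section. Lines that
    are exactly "====" are skipped in case the file uses them as separators.
    """
    entries = []
    current = None
    section = None
    for line in template_text.splitlines():
        if line.strip() == "====":
            continue
        match = SECTION_HEADER.match(line)
        if match:
            section = match.group(1)
            if section == "DATASET" or current is None:
                current = {}
                entries.append(current)
            current[section] = []
        elif current is not None and section is not None:
            current[section].append(line)
    return [
        {name: "\n".join(body).strip() for name, body in entry.items()}
        for entry in entries
    ]


def find_inline_partitioned(entries):
    """Return (table_name, missing_sections) for tables that partition inside
    the (assumed) CREATE section but lack PARTITION_COLUMNS and/or ALTER."""
    flagged = []
    for entry in entries:
        create_sql = entry.get("CREATE", "")
        inline = re.search(r"\bPARTITION(ED)?\s+BY\b", create_sql, re.IGNORECASE)
        missing = [s for s in ("PARTITION_COLUMNS", "ALTER") if not entry.get(s)]
        if inline and missing:
            flagged.append((entry.get("BASE_TABLE_NAME", "?"), missing))
    return flagged


# Shortened copy of the excerpt above plus one made-up inline-partitioned table.
SAMPLE = """\
---- DATASET
functional
---- BASE_TABLE_NAME
alltypessmall
---- COLUMNS
id int
---- PARTITION_COLUMNS
year int
month int
---- ALTER
ALTER TABLE {table_name} ADD IF NOT EXISTS PARTITION(year=2009, month=1);
---- DATASET
functional
---- BASE_TABLE_NAME
hypothetical_inline_table
---- CREATE
CREATE TABLE {table_name} (id INT) PARTITIONED BY (year INT);
"""

if __name__ == "__main__":
    for table, missing in find_inline_partitioned(parse_entries(SAMPLE)):
        print(table, "is missing sections:", ", ".join(missing))
```

Running something like this over the real template files would give a concrete list of tables to check against the branching logic in generate-schema-statements.py.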
Issue Links
- relates to IMPALA-4005 generate_statements method in generate_schema_statements.py needs refactoring (Open)