When trying to run COMPUTE STATS on a table I created without any column definition (all columns come from the Avro schema and the partition keys), it fails with the following error message:
The documentation states:
Originally, Impala relied on users to run the Hive ANALYZE TABLE statement, but that method of gathering statistics proved unreliable and difficult to use. The Impala COMPUTE STATS statement is built from the ground up to improve the reliability and user-friendliness of this operation.
To me, having to re-create the table with column definitions in the Hive metastore is not very user-friendly. Since COMPUTE STATS was built from the ground up, can it not get the column list from the Avro schema and the partition keys, rather than relying on the Hive metastore for that?
Otherwise, I have to re-create the table every time the schema changes. If I do use that workaround, how do I efficiently "transfer" all partitions to the new table?
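For reference, here is a minimal sketch of what that workaround might look like. It assumes the table is external and that the partition directories follow the usual `key=value` layout. The table name, columns, partition keys, and paths are all hypothetical. The idea is to point a new table with explicit columns at the same location and let Hive rediscover the partitions instead of adding them one by one:

```sql
-- In impala-shell: new table with explicit column definitions over the same data.
-- events_v2, the columns, the partition keys, and both paths are placeholders.
CREATE EXTERNAL TABLE events_v2 (
  id BIGINT,
  payload STRING
)
PARTITIONED BY (year INT, month INT)
STORED AS AVRO
LOCATION '/data/events'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/events.avsc');

-- In Hive: scan the table location and register every key=value directory
-- as a partition in the metastore.
MSCK REPAIR TABLE events_v2;

-- Back in impala-shell: pick up the partitions Hive just added, then retry.
INVALIDATE METADATA events_v2;
COMPUTE STATS events_v2;
```

Because the table is external, the old definition can be dropped afterwards without touching the data files. I have not verified this against every Impala or Hive version.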
- As of Impala 1.4, CREATE TABLE derives the columns from the Avro schema.
- What is still required is updating these columns as the schema evolves (at least when ALTER TABLE is used to change the schema URL, and possibly also if the schema file on HDFS changes).
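As a sketch of the remaining manual step, assuming the schema gained one new field: the table name `events`, the field `new_field`, and the schema path are hypothetical, and the column type has to match whatever the new Avro schema declares.

```sql
-- Point the table at the evolved schema file (placeholder path).
ALTER TABLE events SET TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/events_v2.avsc');

-- Bring the metastore column list back in line with the new schema.
ALTER TABLE events ADD COLUMNS (new_field STRING);

-- Statistics can then be recomputed against the updated column list.
COMPUTE STATS events;
```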