Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Impala 1.2.4, Impala 1.3
Description
If an Avro table is created without any columns defined (e.g., the schema is defined only in the table's avro.schema TABLE/SERDEPROPERTIES), then COMPUTE STATS fails with a somewhat cryptic error:
InternalException: Column 'col_name' for which stats gathering is requested doesn't exist.
The root cause is a bug in the Hive Metastore, which should reject the creation of tables without any column definitions but currently accepts them.
However, this error is returned at the very end of the COMPUTE STATS operation, which may take a long time for large tables. We should see if it is possible to detect the invalid schema early, and fail COMPUTE STATS in analysis.
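A minimal sketch of the fail-fast idea, written in Python for brevity rather than as Impala's actual frontend code. The names `AnalysisError` and `analyze_compute_stats` are hypothetical, and the error text is only illustrative. The point is that the column check runs before any expensive scan is started:

```python
class AnalysisError(Exception):
    """Hypothetical stand-in for an analysis-time failure."""


def analyze_compute_stats(table_name, column_names):
    """Reject a table with no column definitions before any scan begins.

    column_names is assumed to be the column list as reported by the
    metastore for the table; it is empty in the failing case described above.
    """
    if not column_names:
        raise AnalysisError(
            f"Cannot compute stats for table '{table_name}': "
            "the table has no column definitions."
        )
    # Only reached for tables with at least one column; the expensive
    # stats queries would be issued after this point.
    return list(column_names)
```

With a check of this kind, the failure would surface immediately and with a message that names the table, instead of at the end of a long-running operation.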
We should also verify whether COMPUTE STATS works in the case where the table contains some columns, but they don't match the avro schema. For example:
create table (int_col int) ... WITH SERDEPROPERTIES ('avro.schema.literal'='{ "name": "a", "type": "record", "fields": [ {"name":"string_col", "type": ["null", "string"], "default": null} ]}')
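The mismatch case could be detected with a comparison along these lines. This is a sketch under assumptions, not what Impala does: the helper names are hypothetical, only field names are compared (not types), and the comparison is case-sensitive. The schema literal is the one from the example above.

```python
import json


def avro_field_names(schema_literal):
    """Return the top-level field names of an Avro record schema given as JSON."""
    schema = json.loads(schema_literal)
    return [field["name"] for field in schema.get("fields", [])]


def schema_mismatches(column_names, schema_literal):
    """Return (columns missing from the Avro schema, Avro fields missing from the table)."""
    avro_names = avro_field_names(schema_literal)
    missing_in_avro = [c for c in column_names if c not in avro_names]
    missing_in_table = [f for f in avro_names if f not in column_names]
    return missing_in_avro, missing_in_table


literal = (
    '{ "name": "a", "type": "record", "fields": [ '
    '{"name":"string_col", "type": ["null", "string"], "default": null} ]}'
)

# The table declares int_col, while the Avro schema declares string_col.
print(schema_mismatches(["int_col"], literal))  # (['int_col'], ['string_col'])
```

Two empty lists would mean the column names and the Avro field names agree. Any non-empty result is a candidate for an early analysis error.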