Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Impala 2.9.0
Description
While support for TIMESTAMP columns in Kudu tables has been committed (IMPALA-5137), that work does not cover TIMESTAMP column default values, which turn out to be tricky to handle in the catalog.
In addition to lacking the ability to specify the default values in DDL (both CREATE and ALTER columns), this also means tables with timestamp default values created outside of Impala (e.g. via the Kudu python client) cannot be loaded by Impala:
import kudu
from datetime import datetime
from pytz import utc
from kudu.client import Partitioning
from kudu.schema import INT64, SchemaBuilder, UNIXTIME_MICROS

client = kudu.connect('localhost')

# Non-nullable INT64 key column plus a timestamp column with a default value.
schema_builder = SchemaBuilder()
column_spec = schema_builder.add_column("id", INT64)
column_spec.nullable(False)
column_spec = schema_builder.add_column("ts", UNIXTIME_MICROS)
column_spec.default(datetime(1987, 5, 19, 0, 0, tzinfo=utc))
schema_builder.set_primary_keys(["id"])
schema = schema_builder.build()

# Create the table directly in Kudu, range-partitioned on the key.
client.create_table("tsdefault", schema,
                    partitioning=Partitioning().set_range_partition_columns(["id"]))
Then, in Impala:
[localhost:21000] > create external table tsdefault stored as kudu TBLPROPERTIES ( 'kudu.table_name' = 'tsdefault' );
Query: create external table tsdefault stored as kudu TBLPROPERTIES ( 'kudu.table_name' = 'tsdefault' )
Fetched 0 row(s) in 0.22s
[localhost:21000] > show create table tsdefault;
Query: show create table tsdefault
ERROR: AnalysisException: Failed to load metadata for table: default.tsdefault. Running 'invalidate metadata default.tsdefault' may resolve this problem.
CAUSED BY: NullPointerException: null
CAUSED BY: TableLoadingException: Failed to load metadata for table: default.tsdefault. Running 'invalidate metadata default.tsdefault' may resolve this problem.
CAUSED BY: NullPointerException: null
This is tricky in the catalog because the KuduColumn class loads the column metadata from Kudu and holds the default value as a LiteralExpr, whereas Kudu represents the timestamp as a bigint of Unix time in microseconds. Impala should convert that value to a TimestampValue, which is straightforward in the backend but not in the catalog. Unless the catalog were to call into BE code, the KuduColumn class would need to store the default value as a bigint, and all code that later uses the default value would need to know that its type differs from the column's.
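To make the representation mismatch concrete, here is a minimal Python sketch of the round trip between a timezone-aware datetime and the bigint "Unix time in microseconds" value that Kudu stores. The helper names are hypothetical and this is not Impala's or Kudu's actual conversion code. It only illustrates the arithmetic a consumer of the raw bigint default would have to perform, assuming UTC and integer microsecond precision.

```python
from datetime import datetime, timedelta, timezone

# Reference point for Unix time, expressed as an aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICROSECOND = timedelta(microseconds=1)


def datetime_to_unixtime_micros(value):
    """Hypothetical helper: aware datetime -> integer microseconds since the epoch."""
    if value.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Dividing two timedeltas with // yields an exact int, avoiding float rounding.
    return (value - EPOCH) // ONE_MICROSECOND


def unixtime_micros_to_datetime(micros):
    """Hypothetical helper: integer microseconds since the epoch -> aware UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


# The default value used in the reproduction above.
stored = datetime_to_unixtime_micros(datetime(1987, 5, 19, tzinfo=timezone.utc))
restored = unixtime_micros_to_datetime(stored)
```

Any code that receives only `stored` sees a plain integer, so it has to know out of band that the column is a timestamp before it can apply the second helper. That is the type mismatch described above.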