Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Catalog

      Description

      Support for TIMESTAMP columns in Kudu tables has been committed (IMPALA-5137), but it does not cover default values on TIMESTAMP columns. Handling them turns out to be tricky in the catalog.

      Besides making it impossible to specify such default values in DDL (in both CREATE and ALTER column definitions), this also means that tables created outside of Impala with timestamp default values (e.g. via the Kudu Python client) cannot be loaded by Impala. For example, create such a table with the Python client:

      import kudu
      from datetime import datetime
      from kudu.client import Partitioning
      from kudu.schema import INT64, SchemaBuilder, UNIXTIME_MICROS
      from pytz import utc

      # Connect to a Kudu master running on this host.
      client = kudu.connect('localhost')
      schema_builder = SchemaBuilder()

      # Non-nullable key column.
      column_spec = schema_builder.add_column("id", INT64)
      column_spec.nullable(False)

      # Timestamp column with a default value; Kudu stores it as UNIXTIME_MICROS.
      column_spec = schema_builder.add_column("ts", UNIXTIME_MICROS)
      column_spec.default(datetime(1987, 5, 19, 0, 0, tzinfo=utc))

      schema_builder.set_primary_keys(["id"])
      schema = schema_builder.build()

      client.create_table("tsdefault", schema,
          partitioning=Partitioning().set_range_partition_columns(["id"]))
      

      Then, in Impala, map the table and try to load it:

      [localhost:21000] > create external table tsdefault stored as kudu TBLPROPERTIES (
        'kudu.table_name' = 'tsdefault' );
      Query: create external table tsdefault stored as kudu TBLPROPERTIES (
        'kudu.table_name' = 'tsdefault' )
      
      Fetched 0 row(s) in 0.22s
      [localhost:21000] > show create table tsdefault;
      Query: show create table tsdefault
      ERROR: AnalysisException: Failed to load metadata for table: default.tsdefault. Running 'invalidate metadata default.tsdefault' may resolve this problem.
      CAUSED BY: NullPointerException: null
      CAUSED BY: TableLoadingException: Failed to load metadata for table: default.tsdefault. Running 'invalidate metadata default.tsdefault' may resolve this problem.
      CAUSED BY: NullPointerException: null
      

      This is tricky in the catalog because the KuduColumn class loads the column metadata from Kudu and holds the default value as a LiteralExpr, whereas Kudu represents a timestamp as a BIGINT count of microseconds since the Unix epoch (UNIXTIME_MICROS). Impala should convert that value to a TimestampValue, which is not hard to do in the backend but is not easy in the catalog. Unless the catalog calls into BE code, the KuduColumn class has to store the default value as a BIGINT, and all code that later uses the default value has to know that its type differs from the column's type.
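
      To make the type mismatch concrete, here is a minimal Python sketch of the conversion between a timestamp and the BIGINT microsecond value that Kudu stores for the default in the example above. The function names are hypothetical and this is plain Python, not Impala or Kudu code.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch in UTC; UNIXTIME_MICROS counts microseconds from this instant.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unixtime_micros(ts):
    """Convert a timezone-aware datetime to integer microseconds since the epoch."""
    delta = ts - EPOCH
    # Use integer arithmetic to avoid float rounding on large values.
    return (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds


def from_unixtime_micros(micros):
    """Convert integer microseconds since the epoch back to a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


# The default value from the example table, as a timestamp ...
default_ts = datetime(1987, 5, 19, 0, 0, tzinfo=timezone.utc)
# ... and as the BIGINT that the column metadata carries.
default_micros = to_unixtime_micros(default_ts)
print(default_micros)
print(from_unixtime_micros(default_micros).isoformat())
```

      The column is a TIMESTAMP, but the value a reader of the metadata gets back is the integer, so every consumer of the default has to apply the second function (or its equivalent) before treating it as a timestamp.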

        Activity

        mjacobs Matthew Jacobs added a comment -

        commit 2dcbefc652ac59d62e83f55a40d4833b364d50be
        Author: Matthew Jacobs <mj@cloudera.com>
        Date: Mon May 22 18:15:08 2017 -0700

        IMPALA-5338: Fix Kudu timestamp column default values

        While support for TIMESTAMP columns in Kudu tables has been
        committed (IMPALA-5137), it does not support TIMESTAMP
        column default values.

        This supports CREATE TABLE syntax to specify the default
        values, but more importantly this fixes the loading of Kudu
        tables that may have had default values set on
        UNIXTIME_MICROS columns, e.g. if the table was created via
        the python client. This involves fixing KuduColumn to hide
        the LiteralExpr representing the default value because it
        will be a BIGINT if the column type is TIMESTAMP. It is only
        needed to call toSql() and toStringValue(), so helper
        functions are added to KuduColumn to encapsulate special
        logic for TIMESTAMP.

        TODO: Add support and tests for ALTER setting the default
        value (when IMPALA-4622 is committed).

        Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2
        Reviewed-on: http://gerrit.cloudera.org:8080/6936
        Reviewed-by: Matthew Jacobs <mj@cloudera.com>
        Tested-by: Impala Public Jenkins
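
        The encapsulation described in the commit message can be illustrated with a small Python sketch. It is not the Impala implementation: the class name, method names, and the quoted-literal SQL rendering are assumptions made for illustration, and the real KuduColumn is Java code in the catalog.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class KuduColumnSketch:
    """Hypothetical stand-in for a catalog column that hides its raw default."""

    def __init__(self, name, col_type, default_value=None):
        self.name = name
        self.col_type = col_type
        # Raw stored default. For a TIMESTAMP column this is an integer
        # (microseconds since the epoch), not a timestamp, so it stays private.
        self._default_value = default_value

    def has_default_value(self):
        return self._default_value is not None

    def default_value_string(self):
        """Raw default as a string, or None if the column has no default."""
        if self._default_value is None:
            return None
        return str(self._default_value)

    def default_value_sql(self):
        """Default rendered for DDL output; the literal format is an assumption."""
        if self._default_value is None:
            return None
        if self.col_type == "TIMESTAMP":
            ts = _EPOCH + timedelta(microseconds=self._default_value)
            return "'" + ts.strftime("%Y-%m-%d %H:%M:%S.%f") + "'"
        return str(self._default_value)


ts_col = KuduColumnSketch("ts", "TIMESTAMP", 548380800000000)
id_col = KuduColumnSketch("id", "BIGINT")
print(ts_col.default_value_string())
print(ts_col.default_value_sql())
print(id_col.has_default_value())
```

        Callers never see the raw value directly, so the TIMESTAMP special case lives in one place instead of in every piece of code that prints or serializes a default.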


          People

          • Assignee: mjacobs Matthew Jacobs
          • Reporter: mjacobs Matthew Jacobs
          • Votes: 0
          • Watchers: 1
