IMPALA-867

Fail early (in analysis) when COMPUTE STATS is run against Avro table with no columns

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 1.2.4, Impala 1.3
    • Fix Version/s: Impala 1.3

    Description

      If an Avro table is created without any column definitions (e.g., the schema is defined only through the table's avro.schema.literal or avro.schema.url TBLPROPERTIES/SERDEPROPERTIES), then COMPUTE STATS fails with a cryptic error:

      InternalException: Column 'col_name' for which stats gathering is requested doesn't exist.
      

      This is caused by a bug in the Hive Metastore, which should reject the creation of tables without any column definitions but currently accepts them.

      However, this error is returned at the very end of the COMPUTE STATS operation, which may take a long time for large tables. We should see if it is possible to detect the invalid schema early, and fail COMPUTE STATS in analysis.
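
The early check proposed above could look roughly like the following. This is a minimal sketch in Python, not Impala's actual frontend code: `TableInfo`, `AnalysisError`, and `analyze_compute_stats` are hypothetical names introduced only for illustration. It shows the intent: inspect the table's column list during analysis and reject the statement before any scan is scheduled.

```python
from dataclasses import dataclass, field


class AnalysisError(Exception):
    """Raised when a statement is rejected before any work is scheduled."""


@dataclass
class TableInfo:
    # Hypothetical stand-in for the catalog's view of a table.
    name: str
    columns: list = field(default_factory=list)  # column names known to the Metastore


def analyze_compute_stats(table: TableInfo) -> None:
    """Reject COMPUTE STATS up front if the table has no column definitions."""
    if not table.columns:
        # Failing here avoids scanning the whole table only to hit the
        # Metastore error when the stats are finally written.
        raise AnalysisError(
            f"Cannot run COMPUTE STATS on table '{table.name}': "
            "the table has no column definitions in the Metastore."
        )
```

With this shape, a call such as `analyze_compute_stats(TableInfo("avro_tbl"))` raises immediately, while a table with at least one column passes through to execution.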

      We should also verify whether COMPUTE STATS works when the table defines some columns, but they do not match the Avro schema. For example:

      create table (int_col int)
      ...
      WITH SERDEPROPERTIES ('avro.schema.literal'='{
      "name": "a",
      "type": "record",
      "fields": [
        {"name":"string_col",  "type": ["null", "string"],  "default": null}
      ]}')
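
One way to detect the mismatch in the example above is to compare the declared columns with the fields of the Avro schema literal. The sketch below is an assumption about how such a check might be written, not a description of what Impala does: the function names are hypothetical, the Avro-to-SQL type mapping is deliberately incomplete, and column names are compared case-insensitively on the assumption that the Metastore stores them in lowercase.

```python
import json

# Assumed, incomplete mapping from Avro primitive types to SQL column types.
AVRO_TO_SQL = {
    "string": "string",
    "int": "int",
    "long": "bigint",
    "boolean": "boolean",
    "float": "float",
    "double": "double",
}


def avro_fields(schema_literal):
    """Return [(field_name, sql_type_or_None)] for an Avro record schema."""
    schema = json.loads(schema_literal)
    fields = []
    for f in schema["fields"]:
        t = f["type"]
        if isinstance(t, list):
            # Nullable union such as ["null", "string"]: use the non-null branch.
            non_null = [b for b in t if b != "null"]
            t = non_null[0] if len(non_null) == 1 else None
        # Complex or unmapped types are reported as None.
        sql_type = AVRO_TO_SQL.get(t) if isinstance(t, str) else None
        fields.append((f["name"], sql_type))
    return fields


def schema_mismatches(columns, schema_literal):
    """Compare [(col_name, col_type)] against the Avro schema, position by position."""
    expected = avro_fields(schema_literal)
    problems = []
    if len(columns) != len(expected):
        problems.append(
            f"table has {len(columns)} column(s), Avro schema has {len(expected)} field(s)"
        )
    for i, ((c_name, c_type), (f_name, f_type)) in enumerate(zip(columns, expected)):
        if c_name.lower() != f_name.lower() or c_type.lower() != (f_type or ""):
            problems.append(
                f"column {i}: table has {c_name} {c_type}, "
                f"Avro schema has {f_name} {f_type}"
            )
    return problems
```

For the table in the example, `schema_mismatches([("int_col", "int")], literal)` with the schema literal shown above returns a single entry for position 0, because both the name and the type differ.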
      

          People

            Assignee: Alexander Behm (alex.behm)
            Reporter: Lenni Kuff (lskuff)