IMPALA-1104

Allow creating Avro tables without column definitions. Allow COMPUTE STATS to always work on Impala-created Avro tables.



    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 1.3.1
    • Fix Version/s: Impala 2.0
    • Component/s: None
    • Labels:


      When trying to run COMPUTE STATS on a table I created without any column definition (all columns come from the Avro schema and the partition keys), it fails with the following error message:

      Query: compute stats mytable
      ERROR: AnalysisException: Cannot COMPUTE STATS on Avro table 'mytable' because its column definitions do not match those in the Avro schema.
      Missing column definition corresponding to Avro-schema column 'thefirstcolumn' of type 'STRING' at position '0'.
      Please re-create the table with column definitions, e.g., using the result of 'SHOW CREATE TABLE'

      I feel this is somewhat related to IMPALA-867, and I also understand the workaround proposed in the error message (the same thing is proposed in the comments of IMPALA-867).

      The documentation states:

      Originally, Impala relied on users to run the Hive ANALYZE TABLE statement, but that method of gathering statistics proved unreliable and difficult to use. The Impala COMPUTE STATS statement is built from the ground up to improve the reliability and user-friendliness of this operation.

      To me, having to re-create the table with column definitions in the Hive metastore is not very user-friendly. Since COMPUTE STATS was built from the ground up, could it not get the column list from the Avro schema and the partition keys, rather than relying on the Hive metastore for that?

      Otherwise, I have to keep on re-creating the table... In case I use that workaround, how do I efficiently "transfer" all partitions to the new table?


      • As of Impala 1.4, CREATE TABLE derives the columns from the Avro schema
      • What remains is updating these columns as the schema evolves (at least when ALTER TABLE is used to change the schema URL, possibly also if the file on HDFS changes?)
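
The idea in the two points above (derive column definitions from the Avro schema, then detect when they drift from what the table declares) can be sketched as follows. This is only an illustration, not Impala's implementation: the function names, the type mapping, and the second field `thesecondcolumn` are assumptions made for the example, and only a few Avro primitive types and nullable unions are handled.

```python
import json

# Assumed mapping from Avro primitive types to Impala column types
# (illustrative only; complex and logical types are not covered).
AVRO_TO_IMPALA = {
    "string": "STRING",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
}


def impala_type(avro_type):
    """Map an Avro field type to an Impala type name."""
    # A nullable field is written as a union such as ["null", "string"].
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) != 1:
            raise ValueError(f"unsupported union: {avro_type!r}")
        return impala_type(non_null[0])
    if isinstance(avro_type, str) and avro_type in AVRO_TO_IMPALA:
        return AVRO_TO_IMPALA[avro_type]
    raise ValueError(f"unsupported Avro type: {avro_type!r}")


def columns_from_avro_schema(schema_json):
    """Return [(name, impala_type), ...] for an Avro record schema."""
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    return [(f["name"], impala_type(f["type"])) for f in schema["fields"]]


def missing_definitions(table_columns, avro_columns):
    """Return (position, name, type) for each Avro column that has no
    identical table column definition at the same position."""
    missing = []
    for pos, (name, col_type) in enumerate(avro_columns):
        if pos >= len(table_columns) or table_columns[pos] != (name, col_type):
            missing.append((pos, name, col_type))
    return missing


# Hypothetical schema; 'thefirstcolumn' comes from the error message above,
# 'thesecondcolumn' is invented for the example.
EXAMPLE_SCHEMA = """
{"type": "record", "name": "mytable", "fields": [
  {"name": "thefirstcolumn", "type": "string"},
  {"name": "thesecondcolumn", "type": ["null", "long"]}
]}
"""

avro_columns = columns_from_avro_schema(EXAMPLE_SCHEMA)
# A table created without column definitions has an empty column list,
# so every Avro column is reported as missing.
print(missing_definitions([], avro_columns))
```

With an empty table column list, the first reported entry corresponds to the situation in the error message: column 'thefirstcolumn' of type 'STRING' at position 0 has no definition.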




            • Assignee:
              alex.behm Alexander Behm
            • Reporter:
              julienlehuen Julien Lehuen
            • Votes:
              0
            • Watchers:
              4


              • Created: