Details
Description
Some systems (e.g. Spark) write Parquet files that annotate integral columns with logical types. Impala fails to handle these logical types when constructing a table from an existing Parquet file, although reading data from such files works fine.
For example, consider a file with the following Parquet schema:
[ec2-user@ip-172-31-61-61 ~]$ parquet-tools schema part-r-00000-a409eea5-3d4f-4172-b376-659005f65489.gz.parquet
message spark_schema {
  optional int32 id;
  optional int32 tinyint_col (INT_8);
  optional int32 smallint_col (INT_16);
  optional int32 int_col;
  optional int64 bigint_col;
}
A CREATE TABLE ... LIKE PARQUET statement against such a file fails with an error like the following:
ERROR: AnalysisException: Unsupported logical parquet type INT_8 (primitive type is INT32) for field tinyint_col
The schema conversion is performed by the convertLogicalParquetType method in the com.cloudera.impala.analysis.CreateTableLikeFileStmt class, which currently does not cover the integer logical types.
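A minimal sketch of the missing mapping, assuming a standalone helper (the class and method names below are illustrative only; the real convertLogicalParquetType returns Impala catalog types rather than SQL type-name strings, and the parquet-mr package prefix depends on the version bundled with Impala):

import org.apache.parquet.schema.OriginalType;
import org.apache.parquet.schema.PrimitiveType;

public class ParquetIntegerTypeMapping {
  // Hypothetical helper: map a Parquet primitive type plus its logical
  // (converted) type annotation to the corresponding Impala column type name.
  public static String toImpalaTypeName(PrimitiveType parquetType) {
    OriginalType logical = parquetType.getOriginalType();
    if (logical == null) {
      // No annotation: fall back to the physical type.
      switch (parquetType.getPrimitiveTypeName()) {
        case INT32: return "INT";
        case INT64: return "BIGINT";
        default: return null;  // non-integer physical types handled elsewhere
      }
    }
    switch (logical) {
      case INT_8:  return "TINYINT";   // INT32 annotated as 8-bit signed
      case INT_16: return "SMALLINT";  // INT32 annotated as 16-bit signed
      case INT_32: return "INT";
      case INT_64: return "BIGINT";
      default:     return null;        // unsupported logical type -> analysis error
    }
  }
}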
See https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#numeric-types for information about the mapping between logical types and encodings.
We should implement read and write support for this metadata, i.e. allow correct round-tripping of tinyint and smallint types.
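For reference, a round-tripped file would need to carry the same annotations shown in the schema above. A minimal sketch using the parquet-mr schema builder illustrates the metadata involved (package names depend on the parquet-mr version, and Impala's own Parquet writer is implemented in the C++ backend rather than through this API):

import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.OriginalType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Types;

public class AnnotatedSchemaExample {
  public static void main(String[] args) {
    // Build a schema equivalent to the Spark-written file above:
    // INT32 columns annotated with INT_8 / INT_16 so readers can
    // reconstruct TINYINT / SMALLINT column types.
    MessageType schema = Types.buildMessage()
        .optional(PrimitiveTypeName.INT32).named("id")
        .optional(PrimitiveTypeName.INT32).as(OriginalType.INT_8).named("tinyint_col")
        .optional(PrimitiveTypeName.INT32).as(OriginalType.INT_16).named("smallint_col")
        .optional(PrimitiveTypeName.INT32).named("int_col")
        .optional(PrimitiveTypeName.INT64).named("bigint_col")
        .named("spark_schema");
    System.out.println(schema);
  }
}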