IMPALA-5052

Read and write signed integer logical type metadata in Parquet

Description

Some systems (e.g. Spark) write Parquet files whose integral columns are annotated with logical types. Impala fails to handle these logical types when constructing a table from an existing Parquet file, although reading data from such files works fine.

For example, consider a file with the following Parquet schema:

      [ec2-user@ip-172-31-61-61 ~]$ parquet-tools schema part-r-00000-a409eea5-3d4f-4172-b376-659005f65489.gz.parquet
      message spark_schema {
        optional int32 id;
        optional int32 tinyint_col (INT_8);
        optional int32 smallint_col (INT_16);
        optional int32 int_col;
        optional int64 bigint_col;
      }
      

      A CREATE TABLE ... LIKE PARQUET statement fails with something like the following:

      ERROR: AnalysisException: Unsupported logical parquet type INT_8 (primitive type is INT32) for field tinyint_col
      

      This functionality is handled by the convertLogicalParquetType method in the com.cloudera.impala.analysis.CreateTableLikeFileStmt class, which currently does not handle integer logical types.
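A fix would extend that method's mapping to cover the signed-integer annotations. The sketch below is illustrative only, assuming a simplified standalone mapping (the class name, enum, and `toSqlType` method are hypothetical, not Impala's actual code, which operates on Parquet's schema objects and Impala's internal type representation):

```java
// Hypothetical sketch of mapping Parquet signed-integer logical types to SQL
// column types, in the spirit of convertLogicalParquetType. All names here
// are illustrative, not taken from the Impala codebase.
public class LogicalTypeMapping {
  // Subset of Parquet logical (original) types relevant to this issue.
  enum ParquetLogicalType { INT_8, INT_16, INT_32, INT_64 }

  static String toSqlType(ParquetLogicalType t) {
    switch (t) {
      case INT_8:  return "TINYINT";   // int32 annotated INT_8  -> 1-byte integer
      case INT_16: return "SMALLINT";  // int32 annotated INT_16 -> 2-byte integer
      case INT_32: return "INT";       // plain/annotated int32  -> 4-byte integer
      case INT_64: return "BIGINT";    // plain/annotated int64  -> 8-byte integer
      default:
        throw new IllegalArgumentException("Unsupported logical type: " + t);
    }
  }

  public static void main(String[] args) {
    for (ParquetLogicalType t : ParquetLogicalType.values()) {
      System.out.println(t + " -> " + toSqlType(t));
    }
  }
}
```

With such a mapping, the `tinyint_col (INT_8)` field from the example schema would resolve to `TINYINT` instead of triggering the AnalysisException.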

      See https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#numeric-types for information about the mapping between logical types and encodings.

We should implement read and write support for this metadata, i.e. allow correct round-tripping of TINYINT and SMALLINT columns through Parquet.

            People

• Assignee: Anuj Phadke (anujphadke)
• Reporter: Ian Buss (ibuss_impala_c846)