IMPALA-7723

Recognize int64 timestamps in CREATE TABLE LIKE PARQUET


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Invalid
    • Component: Frontend

    Description

      IMPALA-5050 adds support for reading int64-encoded Parquet timestamps. These columns have the int64 physical type, so their converted/logical type has to be used to differentiate them from BIGINTs. They can be read as either BIGINT or TIMESTAMP, depending on the table's schema.
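      As a rough illustration of the type decision described above, the sketch below maps an int64 Parquet column to an Impala column type based on its converted-type annotation. The helper name impala_type_for_int64 and the recognize_timestamps switch are hypothetical and are not part of Impala; only the TIMESTAMP_MILLIS and TIMESTAMP_MICROS annotation names come from the Parquet format.

```python
from typing import Optional

# Parquet converted-type annotations that mark an int64 column as a timestamp.
INT64_TIMESTAMP_ANNOTATIONS = {"TIMESTAMP_MILLIS", "TIMESTAMP_MICROS"}


def impala_type_for_int64(converted_type: Optional[str],
                          recognize_timestamps: bool) -> str:
    """Hypothetical helper: pick the column type for an int64 Parquet column.

    recognize_timestamps=False models the behaviour kept by this issue
    (every int64 column becomes BIGINT); True models the proposed behaviour.
    """
    if recognize_timestamps and converted_type in INT64_TIMESTAMP_ANNOTATIONS:
        return "TIMESTAMP"
    # Plain int64 columns, and all int64 columns when recognition is off.
    return "BIGINT"
```

      With recognition switched off, an annotated column and a plain int64 column both map to BIGINT, which is why the annotation is the only way to tell them apart.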

      CREATE TABLE LIKE PARQUET could also convert these columns to TIMESTAMP instead of BIGINT, but I decided to postpone adding this feature for two reasons:

      1. It could break the following possible workflow:

      • generate Parquet files (that contain int64 timestamps) with some tool
      • use Impala's CREATE TABLE LIKE PARQUET + LOAD DATA to make it accessible as a table
      • run some queries that rely on interpreting these columns as integers

      If such a column became a TIMESTAMP, wrapping it in CAST(col AS BIGINT) in the query would make this even worse: the cast converts the timestamp to Unix time in seconds, not the micros/millis stored in the file, and it does so without any warning.

      2. Adding support for int64 timestamps with nanosecond precision would require bumping Impala's parquet-hadoop-bundle dependency to a new major version, which may contain incompatible API changes.

      Note that parquet-hadoop-bundle is only used in CREATE TABLE LIKE PARQUET. The C++ parts of Impala only rely on parquet.thrift, which can be updated more easily.
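      To make the first concern above concrete, here is a minimal arithmetic sketch of the silent unit change. It is plain Python rather than Impala code: the sample value and the function names are made up for illustration, and the cast is modelled as a whole-second conversion for a non-negative microsecond timestamp.

```python
MICROS_PER_SECOND = 1_000_000

# Illustrative raw int64 value stored in the Parquet file (microseconds).
stored_micros = 1_500_000_000_123_456


def read_as_bigint(raw_micros: int) -> int:
    """Column typed as BIGINT: the query sees the stored integer unchanged."""
    return raw_micros


def cast_timestamp_to_bigint(raw_micros: int) -> int:
    """Column typed as TIMESTAMP, then cast to BIGINT: modelled as whole
    seconds since the Unix epoch (assumes a non-negative value)."""
    return raw_micros // MICROS_PER_SECOND


def query_seconds(col_value: int) -> int:
    """A hypothetical existing query that assumes the column holds micros."""
    return col_value // MICROS_PER_SECOND


# Today: the query divides micros by 1e6 and gets the expected seconds.
seconds_today = query_seconds(read_as_bigint(stored_micros))

# After the type change: the same division is applied to a value that is
# already in seconds, so the result is off by a factor of one million.
seconds_after_change = query_seconds(cast_timestamp_to_bigint(stored_micros))
```

      Here seconds_today is 1500000000 while seconds_after_change is 1500, and nothing in the query itself signals that the unit changed.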

            People

              Assignee: Unassigned
              Reporter: Csaba Ringhofer (csringhofer)