IMPALA-2716

Hive/Impala incompatibility for timestamp data in Parquet


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.0, Impala 2.1, Impala 2.2, Impala 2.3.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend

    Description

      Problem
      When writing data to Parquet files, Hive adjusts every timestamp by subtracting the local time zone’s offset, so the stored values are effectively in UTC. Hive is internally inconsistent here, because it does not apply this adjustment for other file formats. Impala applies no such adjustment by default, so it may read "incorrect" timestamp values from Parquet files written by Hive, and vice versa.
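The mismatch can be illustrated with a small model. This is only a sketch under stated assumptions, not Hive's or Impala's actual code: the fixed UTC-8 offset, the sample timestamp, and the function names `hive_parquet_write` and `read_without_adjustment` are all hypothetical, and a real time zone would also involve daylight saving rules.

```python
from datetime import datetime, timedelta

# Assumption: the writer runs in a zone with a fixed UTC-8 offset (no DST).
LOCAL_OFFSET = timedelta(hours=-8)


def hive_parquet_write(local_wall_clock: datetime) -> datetime:
    """Model of the write path described above: subtract the local offset."""
    return local_wall_clock - LOCAL_OFFSET


def read_without_adjustment(stored: datetime) -> datetime:
    """Model of a reader that returns the stored value unchanged."""
    return stored


original = datetime(2015, 12, 1, 10, 0, 0)   # value the user inserted
stored = hive_parquet_write(original)        # value that lands in the file
seen = read_without_adjustment(stored)       # value an unadjusted reader shows

print("inserted:", original)
print("stored:  ", stored)
print("read:    ", seen)
print("skew:    ", seen - original)
```

In this model the reader sees a value that differs from the inserted one by exactly the writer's UTC offset, which is the kind of discrepancy the issue describes.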

      Workaround
      Enable the following Impala compatibility flag, which is false by default.

      --convert_legacy_hive_parquet_utc_timestamps
      When true, TIMESTAMPs read from files written by Parquet-MR (used by Hive) will be converted from UTC to local time. Writes are unaffected.
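The read-side behavior of the flag can be sketched as follows. This is a simplified model rather than Impala's implementation: the fixed UTC-8 offset and the names `read_timestamp`, `written_by_parquet_mr`, and `convert_legacy` are hypothetical stand-ins for the local time zone, the file's writer, and the flag value.

```python
from datetime import datetime, timedelta

# Assumption: the reader's local zone is a fixed UTC-8 offset (no DST).
LOCAL_OFFSET = timedelta(hours=-8)


def read_timestamp(stored: datetime,
                   written_by_parquet_mr: bool,
                   convert_legacy: bool) -> datetime:
    """Return the timestamp a reader would show for a stored value.

    The UTC-to-local conversion is applied only when the flag is on and
    the file was written by Parquet-MR; otherwise the stored value is
    returned unchanged.
    """
    if convert_legacy and written_by_parquet_mr:
        return stored + LOCAL_OFFSET
    return stored


stored_by_hive = datetime(2015, 12, 1, 18, 0, 0)  # UTC value in the file

flag_off = read_timestamp(stored_by_hive, True, False)
flag_on = read_timestamp(stored_by_hive, True, True)
other_writer = read_timestamp(stored_by_hive, False, True)

print("flag off:            ", flag_off)
print("flag on, Parquet-MR: ", flag_on)
print("flag on, other writer:", other_writer)
```

Only the combination of an enabled flag and a Parquet-MR file changes the result in this model; files from other writers pass through untouched, and nothing on the write path is modeled because writes are unaffected.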

      For more details, see IMPALA-1658.

          People

            attilaj Attila Jeges
            alex.behm Alexander Behm