IMPALA-2716

Hive/Impala incompatibility for timestamp data in Parquet

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.0, Impala 2.1, Impala 2.2, Impala 2.3.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend

      Description

      Problem
      Hive adjusts timestamps by subtracting the local time zone’s offset from every value when writing data to Parquet files, which in effect normalizes the stored values to UTC. Hive is internally inconsistent here because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive, and vice versa.

      Workaround
      Enable the following Impala compatibility flag, which is false by default.

      --convert_legacy_hive_parquet_utc_timestamps
      When true, TIMESTAMPs read from files written by Parquet-MR (used by Hive) will be converted from UTC to local time. Writes are unaffected.
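
The offset arithmetic behind the problem and the workaround can be sketched in a few lines of Python. This is only an illustrative model, not Hive or Impala code: the function names `hive_write` and `impala_read`, the `convert_legacy` parameter, and the fixed UTC-08:00 offset are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumption for illustration: the writer's local time zone is a fixed UTC-08:00.
LOCAL_OFFSET = timedelta(hours=-8)


def hive_write(local_wall_clock: datetime) -> datetime:
    """Model of the Hive Parquet write path described above:
    subtract the local time zone's offset, which yields the UTC value."""
    return local_wall_clock - LOCAL_OFFSET


def impala_read(stored: datetime, convert_legacy: bool = False) -> datetime:
    """Model of the Impala read path.

    With convert_legacy=False the stored value is returned unchanged.
    With convert_legacy=True the value is converted from UTC to local time,
    mirroring what --convert_legacy_hive_parquet_utc_timestamps is described to do.
    """
    if convert_legacy:
        return stored + LOCAL_OFFSET
    return stored


original = datetime(2016, 1, 1, 12, 0, 0)   # value as inserted through Hive
stored = hive_write(original)               # 2016-01-01 20:00:00 in the file

print(impala_read(stored))                       # 2016-01-01 20:00:00 (shifted)
print(impala_read(stored, convert_legacy=True))  # 2016-01-01 12:00:00 (matches)
```

A real time zone has daylight-saving transitions, so the offset depends on the timestamp itself; the fixed offset here only keeps the sketch short.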

      For more details, see IMPALA-1658.

            People

            • Assignee: Attila Jeges (attilaj)
            • Reporter: Alexander Behm (alex.behm)