IMPALA-2716

Hive/Impala incompatibility for timestamp data in Parquet

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.0, Impala 2.1, Impala 2.2, Impala 2.3.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend

      Description

      Problem
      Hive adjusts timestamps by subtracting the local time zone’s offset from all values when writing data to Parquet files. Hive is internally inconsistent because it behaves differently for other file formats. As a result of this adjustment, Impala may read "incorrect" timestamp values from Parquet files written by Hive, and vice versa.
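The shift described above can be illustrated with a small sketch. It is a model of the behavior only, not Hive or Impala code: the function names are hypothetical, and the writer's local zone is assumed to be a fixed UTC-8 offset with no daylight saving handling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the Hive writer runs in a zone with a fixed UTC-8 offset.
LOCAL_ZONE = timezone(timedelta(hours=-8))


def hive_parquet_write(wall_clock: datetime) -> datetime:
    """Hypothetical model of the write path: subtract the local UTC offset."""
    offset = LOCAL_ZONE.utcoffset(None)  # timedelta(hours=-8)
    return wall_clock - offset


def read_without_adjustment(stored: datetime) -> datetime:
    """Hypothetical model of a reader that returns the stored value as is."""
    return stored


original = datetime(2017, 2, 8, 12, 0, 0)   # value the user inserted
stored = hive_parquet_write(original)        # value that lands in the file
seen = read_without_adjustment(stored)       # value an unadjusted reader returns

print("inserted:", original)
print("stored:  ", stored)
print("read:    ", seen)
print("drift:   ", seen - original)
```

Under these assumptions the reader sees a value eight hours later than the one that was inserted, which is the kind of mismatch this issue describes.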

      Workaround
      Enable the following Impala compatibility flag, which is false by default.

      --convert_legacy_hive_parquet_utc_timestamps
      When true, TIMESTAMPs read from files written by Parquet-MR (used by Hive) will be converted from UTC to local time. Writes are unaffected.

      For more details, see IMPALA-1658.
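As a rough illustration of what the flag does on the read path, the sketch below converts a stored UTC value back to local time only when the flag is on and the file came from Parquet-MR. The function and parameter names are hypothetical, and the local zone is again assumed to be a fixed UTC-8 offset; this is not Impala's implementation.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the reader's local zone is a fixed UTC-8 offset.
LOCAL_ZONE = timezone(timedelta(hours=-8))


def read_timestamp(stored: datetime,
                   written_by_parquet_mr: bool,
                   convert_legacy_flag: bool) -> datetime:
    """Hypothetical model of a read under
    --convert_legacy_hive_parquet_utc_timestamps."""
    if convert_legacy_flag and written_by_parquet_mr:
        # Treat the stored value as UTC and convert it to local wall-clock time.
        as_utc = stored.replace(tzinfo=timezone.utc)
        return as_utc.astimezone(LOCAL_ZONE).replace(tzinfo=None)
    # Flag off, or file not written by Parquet-MR: return the value unchanged.
    return stored


stored = datetime(2017, 2, 8, 20, 0, 0)

flag_off = read_timestamp(stored, written_by_parquet_mr=True, convert_legacy_flag=False)
flag_on = read_timestamp(stored, written_by_parquet_mr=True, convert_legacy_flag=True)
impala_file = read_timestamp(stored, written_by_parquet_mr=False, convert_legacy_flag=True)

print("flag off:       ", flag_off)
print("flag on:        ", flag_on)
print("non-MR file:    ", impala_file)
```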

        Activity

        dhecht Dan Hecht added a comment -

        [~rblue], the workaround documented here looks outdated. Do you want to fix it up? Also, should this be assigned to Taras and is it still on track for 2.5?

        srus Silvius Rus added a comment -

        This is not an Impala correctness issue but rather an incompatibility with Hive. Removing the correctness label.

        Also, it will not get done in 2.5 because it needs to be coordinated with Spark releases and they are not ready in this timeframe. Pushing to 2.6.

        tarasbob Taras Bobrovytsky added a comment -

        Silvius Rus, should this be assigned to the Budapest team?

        attilaj Attila Jeges added a comment -

        https://github.com/apache/incubator-impala/commit/5803a0b0744ddaee6830d4a1bc8dba8d3f2caa26

        commit 5803a0b0744ddaee6830d4a1bc8dba8d3f2caa26
        Author: Attila Jeges <attilaj@cloudera.com>
        Date: Wed Feb 8 19:44:16 2017 +0100

        IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

        Before this change:
        Hive adjusts timestamps by subtracting the local time zone's offset
        from all values when writing data to Parquet files. Hive is internally
        inconsistent because it behaves differently for other file formats. As
        a result of this adjustment, Impala may read "incorrect" timestamp
        values from Parquet files written by Hive.

        After this change:
        Impala reads Parquet MR timestamp data and adjusts values using a time
        zone from a table property (parquet.mr.int96.write.zone), if set, and
        will not adjust it if the property is absent. No adjustment will be
        applied to data written by Impala.

        New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
        LIKE <file> will set the table property to UTC if the global flag
        --set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.

        HDFS tables created by Impala using CREATE TABLE LIKE <other table>
        will copy the property of the table that is copied.

        This change also affects the way Impala deals with
        --convert_legacy_hive_parquet_utc_timestamps global flag (introduced
        in IMPALA-1658). The flag will be taken into account only if
        parquet.mr.int96.write.zone table property is not set and ignored
        otherwise.

        Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
        Reviewed-on: http://gerrit.cloudera.org:8080/5939
        Reviewed-by: Dan Hecht <dhecht@cloudera.com>
        Tested-by: Impala Public Jenkins
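The precedence rules in the commit message above can be summarized in a short sketch. This models only the decision logic as the message states it. The function names, the `SERVER_LOCAL` sentinel, and the dictionary representation of table properties are assumptions made for illustration and do not come from the Impala source.

```python
from typing import Dict, Optional

PROP = "parquet.mr.int96.write.zone"
SERVER_LOCAL = "<server local zone>"  # hypothetical sentinel, not a real zone name


def choose_read_zone(table_props: Dict[str, str],
                     written_by_parquet_mr: bool,
                     convert_legacy_flag: bool) -> Optional[str]:
    """Return the zone to adjust with, or None for no adjustment."""
    if not written_by_parquet_mr:
        return None                      # data written by Impala is never adjusted
    if PROP in table_props:
        return table_props[PROP]         # table property wins; the flag is ignored
    if convert_legacy_flag:
        return SERVER_LOCAL              # legacy flag applies only without the property
    return None                          # property absent, flag off: no adjustment


def new_table_properties(like_table_props: Optional[Dict[str, str]] = None,
                         set_utc_flag: bool = False) -> Dict[str, str]:
    """Model which value of the property a newly created table gets."""
    if like_table_props is not None:
        # CREATE TABLE LIKE <other table>: copy the property from the source table.
        return {PROP: like_table_props[PROP]} if PROP in like_table_props else {}
    if set_utc_flag:
        # CREATE TABLE / CREATE TABLE LIKE <file> with
        # --set_parquet_mr_int96_write_zone_to_utc_on_new_tables=true
        return {PROP: "UTC"}
    return {}


print(choose_read_zone({PROP: "UTC"}, True, True))
print(choose_read_zone({}, True, True))
print(choose_read_zone({}, True, False))
print(new_table_properties(set_utc_flag=True))
```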


          People

          • Assignee: Attila Jeges (attilaj)
          • Reporter: Alexander Behm (alex.behm)