IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.0, Impala 2.1, Impala 2.2, Impala 2.3.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend

    Description

      Problem
      Hive adjusts timestamps by subtracting the local time zone’s offset from all values when writing data to Parquet files. Hive is also internally inconsistent, because it does not make this adjustment for other file formats. As a result, Impala may read "incorrect" timestamp values from Parquet files written by Hive, and vice versa.
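A minimal Python sketch of the write-side adjustment described above, assuming a hypothetical writer time zone fixed at UTC-08:00 (daylight saving time and other real time-zone rules are ignored). The functions `hive_parquet_write` and `plain_read` are illustrative names only and do not correspond to Hive or Impala code; timestamps are modeled as naive `datetime` values, similar to a SQL TIMESTAMP with no time zone attached.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offset standing in for the Hive writer's local time zone
# (UTC-08:00). Real time zones also have DST rules, which this sketch ignores.
WRITER_TZ = timezone(timedelta(hours=-8))


def hive_parquet_write(local_ts: datetime) -> datetime:
    """Model of Hive's Parquet write path: treat the naive value as local time
    and store its UTC equivalent, i.e. subtract the local offset."""
    return local_ts.replace(tzinfo=WRITER_TZ).astimezone(timezone.utc).replace(tzinfo=None)


def plain_read(stored_ts: datetime) -> datetime:
    """Model of a reader (Impala by default) that returns the stored value as-is."""
    return stored_ts


original = datetime(2015, 11, 20, 12, 0, 0)
stored = hive_parquet_write(original)
seen_by_impala = plain_read(stored)

print(original)                   # 2015-11-20 12:00:00
print(seen_by_impala)             # 2015-11-20 20:00:00
print(seen_by_impala - original)  # 8:00:00
```

In this model, a value inserted through Hive as 12:00 comes back as 20:00 when the file is read without any adjustment: an 8-hour shift equal to the writer's offset.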

      Workaround
      Enable the following Impala compatibility flag, which is false by default:

      --convert_legacy_hive_parquet_utc_timestamps
      When true, TIMESTAMPs read from files written by Parquet-MR (used by Hive) will be converted from UTC to local time. Writes are unaffected.
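The effect of enabling the flag can be sketched the same way. This is a model, not Impala's implementation: the function `impala_read` and its `written_by_parquet_mr` argument are hypothetical (how Impala actually identifies files written by Parquet-MR is not modeled), and the reader's time zone is again assumed to be a fixed UTC-08:00 with no DST.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offset standing in for the Impala daemon's local time zone.
# This sketch assumes it matches the Hive writer's zone (UTC-08:00).
READER_TZ = timezone(timedelta(hours=-8))


def impala_read(stored_ts: datetime, written_by_parquet_mr: bool,
                convert_legacy_hive_parquet_utc_timestamps: bool = False) -> datetime:
    """Model of the read path: with the flag enabled, values from Parquet-MR
    files are interpreted as UTC and converted to local time. Other files,
    and all writes, are left alone."""
    if convert_legacy_hive_parquet_utc_timestamps and written_by_parquet_mr:
        utc = stored_ts.replace(tzinfo=timezone.utc)
        return utc.astimezone(READER_TZ).replace(tzinfo=None)
    return stored_ts


# Value Hive stored for a local 12:00 in UTC-08:00 (i.e. 20:00 UTC).
stored = datetime(2015, 11, 20, 20, 0, 0)

print(impala_read(stored, written_by_parquet_mr=True))  # 2015-11-20 20:00:00
print(impala_read(stored, written_by_parquet_mr=True,
                  convert_legacy_hive_parquet_utc_timestamps=True))  # 2015-11-20 12:00:00
print(impala_read(stored, written_by_parquet_mr=False,
                  convert_legacy_hive_parquet_utc_timestamps=True))  # 2015-11-20 20:00:00
```

Note that the conversion only recovers the original value when the reader's local time zone matches the zone the Hive writer used.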

      For more details, see IMPALA-1658.


          People

            Assignee: attilaj Attila Jeges
            Reporter: alex.behm Alexander Behm
            Votes: 0
            Watchers: 9

