  1. Hive
  2. HIVE-21002 TIMESTAMP - Backwards incompatible change: Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x incorrectly
  3. HIVE-21291

Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.2, 3.2.0, 4.0.0
    • Component/s: None
    • Labels:
      None

      Description

      This sub-task is for implementing the Avro-specific parts of the following plan:

      Problem

      Historically, the semantics of the TIMESTAMP type in Hive depended on the file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had Instant semantics, while timestamps in ORC, textfiles and RCFiles with a text SerDe had LocalDateTime semantics.

      The Hive community wanted to get rid of this inconsistency and have LocalDateTime semantics in Avro, Parquet and RCFiles with a binary SerDe as well. Hive 3.1 turned off normalization to UTC to achieve this. While this leads to the desired new semantics, it also leads to incorrect results in two cases: when new Hive versions read timestamps written by old Hive versions, and when old Hive versions or any other component unaware of this change (including legacy Impala and Spark versions) read timestamps written by new Hive versions.
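
      The mismatch can be illustrated with a minimal sketch using java.time. This is not Hive code: the class and method names are hypothetical, and the stored value is assumed to be epoch milliseconds. "Normalized" stands for the Hive 2.x behaviour (convert between the local zone and UTC), "unnormalized" for the Hive 3.1 behaviour (store and read the wall-clock value as if it were UTC).

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Illustrative only: hypothetical names, not Hive's actual classes.
public class TimestampMismatchSketch {

    // Hive 2.x style write: interpret the wall clock in the writer's zone, store UTC millis.
    static long writeNormalized(LocalDateTime wallClock, ZoneId writerZone) {
        return wallClock.atZone(writerZone).toInstant().toEpochMilli();
    }

    // Hive 3.1 style write: no normalization, the wall clock is stored as if it were UTC.
    static long writeUnnormalized(LocalDateTime wallClock) {
        return wallClock.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // Hive 2.x style read: convert the stored UTC value to the reader's local zone.
    static LocalDateTime readNormalized(long millis, ZoneId readerZone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), readerZone);
    }

    // Hive 3.1 style read: take the stored value as the wall clock, no conversion.
    static LocalDateTime readUnnormalized(long millis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        ZoneId la = ZoneId.of("America/Los_Angeles");
        LocalDateTime noon = LocalDateTime.of(2019, 1, 1, 12, 0);

        // Old writer, new reader: the value shifts by the writer's UTC offset.
        System.out.println(readUnnormalized(writeNormalized(noon, la)));   // 2019-01-01T20:00

        // New writer, old reader: the value shifts the other way.
        System.out.println(readNormalized(writeUnnormalized(noon), la));   // 2019-01-01T04:00
    }
}
```

      In both directions the reader sees a value that differs from the 12:00 the writer intended, by the UTC offset of the zone involved.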

      Solution

      To work around this issue, Hive should restore the practice of normalizing to UTC when writing timestamps to Avro, Parquet and RCFiles with a binary SerDe. In itself, this would restore the historical Instant semantics, which is undesirable. In order to achieve the desired LocalDateTime semantics in spite of normalizing to UTC, newer Hive versions should record the session-local time zone in the file metadata fields that provide arbitrary key-value storage.

      When reading back files with this time zone metadata, newer Hive versions (or any other new component aware of this extra metadata) can achieve LocalDateTime semantics by converting from UTC to the saved time zone (instead of to the local time zone). Legacy components that are unaware of the new metadata can read the files without any problem and the timestamps will show the historical Instant behaviour to them.
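
      The write and read paths described above can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the class, the method names and the metadata key "writer.time.zone" are hypothetical, the file metadata is modelled as a plain Map, and the fallback to the reader's zone when the key is missing is an assumption made for the sketch.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: hypothetical names, not Hive's actual classes.
public class WriterZoneMetadataSketch {

    // Hypothetical key for the file's key-value metadata.
    static final String WRITER_ZONE_KEY = "writer.time.zone";

    // New writer: normalize to UTC as old Hive did, and also record the session zone.
    static long write(LocalDateTime wallClock, ZoneId sessionZone, Map<String, String> fileMetadata) {
        fileMetadata.put(WRITER_ZONE_KEY, sessionZone.getId());
        return wallClock.atZone(sessionZone).toInstant().toEpochMilli();
    }

    // New reader: convert from UTC to the saved zone, not to the reader's own zone.
    // Assumption: files without the key are treated the legacy way.
    static LocalDateTime readNew(long millis, Map<String, String> fileMetadata, ZoneId readerZone) {
        String saved = fileMetadata.get(WRITER_ZONE_KEY);
        ZoneId zone = (saved != null) ? ZoneId.of(saved) : readerZone;
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), zone);
    }

    // Legacy reader: unaware of the metadata, converts from UTC to its own local zone.
    static LocalDateTime readLegacy(long millis, ZoneId readerZone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), readerZone);
    }

    public static void main(String[] args) {
        ZoneId writerZone = ZoneId.of("America/Los_Angeles");
        ZoneId readerZone = ZoneId.of("Europe/Budapest");
        Map<String, String> metadata = new HashMap<>();

        long stored = write(LocalDateTime.of(2019, 1, 1, 12, 0), writerZone, metadata);

        // LocalDateTime semantics: same wall clock regardless of the reader's zone.
        System.out.println(readNew(stored, metadata, readerZone));   // 2019-01-01T12:00

        // Instant semantics: same moment, shown in the reader's zone.
        System.out.println(readLegacy(stored, readerZone));          // 2019-01-01T21:00
    }
}
```

      The stored value is identical to what an old Hive version would have written, which is why legacy readers keep working; only readers that look at the extra metadata see the LocalDateTime behaviour.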

        Attachments

        1. HIVE-21291.1.patch
          21 kB
          Karen Coppage
        2. HIVE-21291.2.patch
          23 kB
          Karen Coppage
        3. HIVE-21291.3.patch
          23 kB
          Karen Coppage
        4. HIVE-21291.4.patch
          23 kB
          Karen Coppage
        5. HIVE-21291.4.patch
          23 kB
          Karen Coppage
        6. HIVE-21291.5.patch
          27 kB
          Karen Coppage
        7. HIVE-21291.6.patch
          27 kB
          Karen Coppage
        8. HIVE-21291.7.patch
          27 kB
          Karen Coppage
        9. HIVE-21291.7.patch
          27 kB
          Karen Coppage
        10. HIVE-21291.branch-3.patch
          32 kB
          Karen Coppage
        11. HIVE-21291.branch-3.1.patch
          32 kB
          Karen Coppage

              People

              • Assignee:
                klcopp Karen Coppage
                Reporter:
                zi Zoltan Ivanfi