HIVE-9482: Hive parquet timestamp compatibility

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.15.0
    • Fix Version/s: 1.2.0
    • Component/s: File Formats
    • Labels:
      None

      Description

      In the current Hive implementation, Parquet timestamps are stored in UTC (converted from the current time zone), following the original Parquet timestamp spec.

      However, this is not compatible with other tools, and after some investigation we found it is not how other file formats, or even some databases, handle timestamps (a Hive timestamp is closer to a 'timestamp without timezone' datatype).

      This is the first part of the fix. It restores compatibility with Parquet timestamp files generated by external tools by skipping the conversion on read.

      A later fix will change the write path to skip the conversion, and will stop the read-side conversion even for files written by Hive itself.
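
      The mismatch described above can be illustrated with a minimal sketch. This is not Hive's actual code: the class name ParquetTimestampSketch, its method names, and the example zone and timestamp are all hypothetical, chosen only to show how a wall-clock value shifts when a reader applies a UTC conversion that the writer never performed.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical illustration only; not taken from the Hive code base.
public class ParquetTimestampSketch {

    // Write path as described in this issue: interpret the wall-clock value
    // in the writer's zone and store the equivalent UTC wall-clock value.
    static LocalDateTime writeWithUtcConversion(LocalDateTime local, ZoneId writerZone) {
        return local.atZone(writerZone)
                    .withZoneSameInstant(ZoneOffset.UTC)
                    .toLocalDateTime();
    }

    // Read path that undoes the write conversion: treat the stored value as
    // UTC and convert it back to the reader's zone.
    static LocalDateTime readWithUtcConversion(LocalDateTime stored, ZoneId readerZone) {
        return stored.atZone(ZoneOffset.UTC)
                     .withZoneSameInstant(readerZone)
                     .toLocalDateTime();
    }

    // Read path with the conversion skipped: the stored wall-clock value is
    // returned unchanged, as for a 'timestamp without timezone'.
    static LocalDateTime readWithoutConversion(LocalDateTime stored) {
        return stored;
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("America/Los_Angeles"); // assumed session zone
        LocalDateTime wallClock = LocalDateTime.of(2015, 1, 27, 10, 0, 0);

        // A file written with conversion round-trips when read with conversion.
        LocalDateTime storedByConvertingWriter = writeWithUtcConversion(wallClock, zone);
        System.out.println(storedByConvertingWriter);                              // 2015-01-27T18:00
        System.out.println(readWithUtcConversion(storedByConvertingWriter, zone)); // 2015-01-27T10:00

        // An external tool that stores the wall-clock value as-is.
        LocalDateTime storedByExternalTool = wallClock;
        // Reading it with conversion shifts the value by the zone offset.
        System.out.println(readWithUtcConversion(storedByExternalTool, zone));     // 2015-01-27T02:00
        // Skipping the conversion on read returns the value the tool wrote.
        System.out.println(readWithoutConversion(storedByExternalTool));           // 2015-01-27T10:00
    }
}
```

      In this sketch the externally written value comes back eight hours early when the reader converts, which is the kind of incompatibility that skipping the read-side conversion is meant to remove.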

        Attachments

        1. parquet_external_time.parq
          0.2 kB
          Szehon Ho
        2. HIVE-9482.patch
          28 kB
          Szehon Ho
        3. HIVE-9482.patch
          28 kB
          Szehon Ho
        4. HIVE-9482.2.patch
          28 kB
          Szehon Ho

              People

              • Assignee: Szehon Ho
              • Reporter: Szehon Ho