Hive / HIVE-9482

Hive parquet timestamp compatibility

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.15.0
    • Fix Version/s: 1.2.0
    • Component/s: File Formats
    • Labels:
      None

      Description

      In the current Hive implementation, Parquet timestamps are stored in UTC (converted from the current timezone), following the original Parquet timestamp spec.

      However, we found that this is not compatible with other tools, and after some investigation it is not how other file formats, or even some databases, behave (the Hive Timestamp is closer to a 'timestamp without timezone' datatype).

      This is the first part of the fix. It restores compatibility with Parquet timestamp files generated by external tools by skipping the conversion on read.

      A later fix will change the write path to not convert, and will stop the read conversion even for files written by Hive itself.
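
      The mismatch described above can be illustrated with a small model. This is only a sketch, not Hive's actual code: the fixed UTC-8 offset and the three helper functions are hypothetical names chosen for the example, and the stored value is represented as a naive datetime rather than a real Parquet encoding.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset stands in for the writer's local timezone.
LOCAL = timezone(timedelta(hours=-8))


def write_with_conversion(wall_clock: datetime) -> datetime:
    """Model of the converting write path: local wall clock -> stored UTC value."""
    return (
        wall_clock.replace(tzinfo=LOCAL)
        .astimezone(timezone.utc)
        .replace(tzinfo=None)
    )


def read_with_conversion(stored: datetime) -> datetime:
    """Model of the converting read path: stored UTC value -> local wall clock."""
    return (
        stored.replace(tzinfo=timezone.utc)
        .astimezone(LOCAL)
        .replace(tzinfo=None)
    )


def read_skip_conversion(stored: datetime) -> datetime:
    """Model of the read path with conversion skipped: return the value as stored."""
    return stored


original = datetime(2015, 1, 27, 10, 0, 0)

# A file written with conversion round-trips through the converting reader.
converted_file_value = write_with_conversion(original)
round_trip = read_with_conversion(converted_file_value)

# An external tool stores the wall-clock value unchanged.
external_file_value = original
shifted = read_with_conversion(external_file_value)   # off by the local offset
restored = read_skip_conversion(external_file_value)  # matches the original

print(round_trip, shifted, restored)
```

      In this model, a converting reader shifts the externally written value by the local offset, while skipping the conversion returns it unchanged.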

      1. HIVE-9482.2.patch
        28 kB
        Szehon Ho
      2. HIVE-9482.patch
        28 kB
        Szehon Ho
      3. HIVE-9482.patch
        28 kB
        Szehon Ho
      4. parquet_external_time.parq
        0.2 kB
        Szehon Ho

          Activity

          szehon Szehon Ho added a comment -

          Attaching the new data file which is binary and cannot be displayed in the patch. This should go in /data/files

          szehon Szehon Ho added a comment -

          Attaching again to trigger test

          szehon Szehon Ho added a comment -

          Address review comments.

          hiveqa Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12695072/HIVE-9482.2.patch

          ERROR: -1 due to 3 failed/errored test(s), 7406 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_external_time
          org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join38
          org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2554/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2554/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2554/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 3 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12695072 - PreCommit-HIVE-TRUNK-Build

          szehon Szehon Ho added a comment -

          Test failures don't look related (these Spark tests also failed in other builds).

          parquet_external_time will fail until the attached parquet file is checked in (/data/files/parquet_external_time.parq).

          brocknoland Brock Noland added a comment -

          +1

          szehon Szehon Ho added a comment -

          Committed to trunk. Thanks Brock for review.

          szehon Szehon Ho added a comment -

          Adds property "hive.parquet.timestamp.skip.conversion", which needs to be documented.

          szehon Szehon Ho added a comment -

          Done. Added a new section for Parquet that mentions this property.

          leftylev Lefty Leverenz added a comment -

          Got your back, Szehon Ho – I changed the version number from 0.14.0 to 1.2.0 (probably a copy-&-paste error).

          sushanth Sushanth Sowmyan added a comment -

          This issue has been fixed and released as part of the 1.2.0 release. If you find an issue which seems to be related to this one, please create a new jira and link this one with new jira.

          lirui Rui Li added a comment -

          Hi Szehon Ho, is there a follow-on task for the write path?

          vitalii Vitalii Diravka added a comment -

          Why is the hive.parquet.timestamp.skip.conversion option enabled by default?
          According to the Parquet spec, Parquet files don't store the local timezone, and we can't tell from a file what value that option had when the file was generated.


            People

            • Assignee:
              szehon Szehon Ho
              Reporter:
              szehon Szehon Ho
            • Votes:
              0
              Watchers:
              9