Details

      Description

      This JIRA is to implement timestamp support in Parquet SerDe.

      1. HIVE-6394.2.patch (28 kB, Szehon Ho)
      2. HIVE-6394.3.patch (28 kB, Szehon Ho)
      3. HIVE-6394.4.patch (28 kB, Szehon Ho)
      4. HIVE-6394.5.patch (28 kB, Szehon Ho)
      5. HIVE-6394.6.patch (29 kB, Szehon Ho)
      6. HIVE-6394.6.patch (30 kB, Szehon Ho)
      7. HIVE-6394.7.patch (29 kB, Szehon Ho)
      8. HIVE-6394.patch (21 kB, Szehon Ho)

          Activity

          sandeep chaturvedi added a comment -

          hey guys.. is it something I can take a look at?

          Szehon Ho added a comment -

          I'll take a look at this issue. The Parquet community has decided on the data type to use.

          https://github.com/Parquet/parquet-mr/issues/218

          Szehon Ho added a comment -

          This is blocked by HIVE-6386, as the new Int96 data type and libraries are in a new version of Parquet.

          Szehon Ho added a comment -

          Typo, it is HIVE-6836.

          Szehon Ho added a comment -

          We upgraded Parquet to get the new Int96 libraries, but Parquet throws an exception when writing an actual Int96 value with dictionary encoding on.

          Filed https://github.com/Parquet/parquet-mr/issues/350, which is being worked on. We will need to wait for the fix and a new version of Parquet before we can proceed.

          Szehon Ho added a comment -

          The fix has been merged into Parquet, but we are still waiting on a Parquet release that includes it. I manually built Parquet with the fix to do the implementation on the Hive side. Attaching as work-in-progress.

          Szehon Ho added a comment -

          Adding unit tests.

          Andrew Ash added a comment -

          Szehon Ho it looks like Parquet v1.5.0 includes the fix for that blocking bug https://github.com/Parquet/parquet-mr/issues/350

          How is the work-in-progress coming?

          Also my apologies for all the emails you probably got as I linked together the various issues across Jira and GitHub.

          Szehon Ho added a comment -

          Hi, thanks for notifying me. This change was working, but it will probably need a rebase due to the parquet-decimal changes. I can take a look this week and submit the patch for review, but if it's not straightforward I will only get to it next week. Hope that is OK.

          Andrew Ash added a comment -

          It's not a huge rush for me, I just didn't want this to sit idle as I'm hoping to use Timestamps heavily in future versions of Hive. I highly appreciate all your work on this!

          Szehon Ho added a comment -

          Rebased back to working test.

          This working cut is good to go, but for now I am putting the timestamp<->parquet-byte conversion functions in the code, as I couldn't find an equivalent in the Joda library. I'm going to try the Jodd library in the next cut.

          Szehon Ho added a comment -

          First patch for review. It uses the Jodd library.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12648057/HIVE-6394.4.patch

          ERROR: -1 due to 9 failed/errored test(s), 5514 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_display_colstats_tbllvl
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_parquet_timestamp
          org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/377/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/377/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-377/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 9 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12648057

          Brock Noland added a comment -

          Getting a Calendar can be expensive. Is it thread safe? If so can you cache it?

          Szehon Ho added a comment -

          I don't think it is thread-safe, as I am modifying its values with the given timestamp. I added a lazily created thread-local cache of the calendar.
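
A minimal sketch of the kind of lazily created, per-thread Calendar cache described above. The class and method names are hypothetical and not taken from the patch; the UTC time zone is an assumption; and `ThreadLocal.withInitial` requires Java 8, whereas code of that era would more likely override `initialValue()`.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

/** Hypothetical helper; names are illustrative, not from the HIVE-6394 patch. */
final class CalendarCache {
    // Each thread creates its own Calendar on first access, so mutating it
    // never races with another thread. The UTC zone is an assumption.
    private static final ThreadLocal<Calendar> CACHED_CALENDAR =
        ThreadLocal.withInitial(() -> new GregorianCalendar(TimeZone.getTimeZone("UTC")));

    private CalendarCache() {}

    /** Returns this thread's cached Calendar, reset to the given epoch milliseconds. */
    static Calendar calendarFor(long epochMillis) {
        Calendar cal = CACHED_CALENDAR.get();
        cal.setTimeInMillis(epochMillis);
        return cal;
    }
}
```

Because the same instance is handed back on every call within a thread, callers should read the fields they need before calling `calendarFor` again.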

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12648439/HIVE-6394.5.patch

          ERROR: -1 due to 16 failed/errored test(s), 5589 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_predicate_pushdown
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_ptf
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_schema_evolution
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_parquet_timestamp
          org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
          org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges
          org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/393/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/393/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-393/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 16 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12648439

          Szehon Ho added a comment -

          Attaching another patch. I was using a parquet-example class; that logic is now added explicitly in the SerDe layer.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12648767/HIVE-6394.6.patch

          ERROR: -1 due to 13 failed/errored test(s), 5589 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dml
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_parquet_timestamp
          org.apache.hadoop.hive.metastore.TestMetastoreVersion.testDefaults
          org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
          org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/404/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/404/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-404/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 13 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12648767

          Brock Noland added a comment -

          Szehon Ho I see parquet_timestamp failed.

          Szehon Ho added a comment -

          The test was asserting that Parquet does not support the timestamp type; removing it.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12649609/HIVE-6394.6.patch

          ERROR: -1 due to 8 failed/errored test(s), 5612 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dml
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
          org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
          org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/431/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/431/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-431/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 8 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12649609

          Brock Noland added a comment -

          Tests appear to be unrelated. LGTM +1

          Szehon Ho added a comment -

          Rebase after Xuefu's commit

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12650102/HIVE-6394.7.patch

          ERROR: -1 due to 6 failed/errored test(s), 5613 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
          org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing
          org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/455/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/455/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-455/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 6 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12650102

          Szehon Ho added a comment -

          Brock Noland Hi Brock, these test failures don't look related. Can we commit this when you have the chance? Thanks.

          Brock Noland added a comment -

          +1

          Brock Noland added a comment -

          Thank you for the contribution! I have committed this to trunk.

          Lefty Leverenz added a comment -

          Document this for 0.14.0 here: Language Manual – Parquet – Limitations

          Szehon Ho added a comment -

          Lefty Leverenz Do we just need to remove 'timestamp' from the following sentence?

          Binary, timestamp, date, char, varchar or decimal support are pending (HIVE-6384)
          
          Lefty Leverenz added a comment -

          Not quite, because 'timestamp' is still a limitation for releases prior to 0.14.

          I'll make a change and you can review it. (That'll be quicker than writing my suggestion here.)

          Lefty Leverenz added a comment -

          How's this? I added decimal too (HIVE-6367). Language Manual – Parquet – Limitations

          Szehon Ho added a comment -

          Ah, got it, thanks. Looks good. Just one (unrelated) note: as HIVE-6375 is committed in 0.13, should we qualify the CTAS limitation?

          Lefty Leverenz added a comment -

          Yes, good catch. But apparently HIVE-6375 doesn't provide column rename support for Parquet – is there another JIRA ticket for that? (I'll edit the wiki and continue this discussion in HIVE-6375 comments.)

          Thejas M Nair added a comment -

          This has been fixed in the 0.14 release. Please open a new JIRA if you see any issues.

          Yang Yang added a comment -

          The Parquet spec on logical types, and on timestamps specifically, seems to say
          https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md
          "TIMESTAMP_MILLIS is used for a combined logical date and time type. It must annotate an int64 that stores the number of milliseconds from the Unix epoch, 00:00:00.000 on 1 January 1970, UTC."

          That is, the type is only precise to the millisecond and it counts from 1970.

          But if you look at the hive-parquet code in
          https://github.com/apache/hive/blob/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java#L142
          https://github.com/apache/hive/blob/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTime.java#L54
          it seems that Hive's encoding of timestamps in Parquet follows a different spec, precise to the nanosecond and counting from "Monday, January 1, 4713" (defined in jodd.datetime.JDateTime).

          So is Hive's Parquet timestamp storage completely different from the above spec?
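
To make the contrast concrete, here is a small sketch that derives both representations from the same instant using only `java.time`. It is not Hive's code: the class and method names are hypothetical, and it assumes the day/nanosecond pair is measured from midnight UTC of the calendar date, which may differ from what a given implementation does (the astronomical Julian date counts from noon).

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.JulianFields;

/** Hypothetical illustration; not taken from Hive or Parquet sources. */
final class TimestampEncodings {
    private TimestampEncodings() {}

    /** TIMESTAMP_MILLIS style: milliseconds since the Unix epoch (sub-millisecond digits are dropped). */
    static long toTimestampMillis(Instant instant) {
        return instant.toEpochMilli();
    }

    /** Julian day number of the UTC calendar date; 1970-01-01 maps to 2440588. */
    static int julianDay(Instant instant) {
        LocalDateTime utc = LocalDateTime.ofInstant(instant, ZoneOffset.UTC);
        return (int) utc.toLocalDate().getLong(JulianFields.JULIAN_DAY);
    }

    /** Nanoseconds elapsed since midnight UTC of that date (assumed convention). */
    static long nanosOfDay(Instant instant) {
        return LocalDateTime.ofInstant(instant, ZoneOffset.UTC).toLocalTime().toNanoOfDay();
    }
}
```

For `1970-01-02T00:00:00.000000001Z`, the millisecond form yields 86400000 and loses the trailing nanosecond, while the day/nanosecond pair is (2440589, 1).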

          Szehon Ho added a comment -

          Hi Yang, thanks for the observation. What you pointed to is a type called TIMESTAMP_MILLIS, whereas this issue is about 'timestamp', which has to have nanosecond precision.

          The spec at the time of implementation was based on this Parquet discussion: https://github.com/Parquet/parquet-mr/issues/218. It was followed so that Parquet timestamps written by Hive and Impala are compatible.

          On the other hand, Parquet may soon come up with a proper Timestamp logical type. At that point the tools can switch their implementation to it, but for now this works if you are using Hive/Impala.
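To make the difference between the two encodings discussed above concrete, here is a minimal sketch in plain Java. It is not Hive's code: the class name `Int96TimestampSketch` and its helper methods are hypothetical, and the 12-byte layout (8 bytes of nanoseconds within the day followed by a 4-byte Julian day, little-endian) is an assumption based on a reading of the linked NanoTime.java. The day number 2440588 for 1970-01-01 is the value Hive is reported to produce later in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative only; the names here are hypothetical and not part of Hive or Parquet. */
final class Int96TimestampSketch {
    static final long SECONDS_PER_DAY = 86_400L;
    static final long NANOS_PER_SECOND = 1_000_000_000L;
    // Day number reported for 1970-01-01 in this thread (midnight-based day boundary).
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;

    /** TIMESTAMP_MILLIS style: one int64 of milliseconds since the Unix epoch. */
    static long toTimestampMillis(long epochSeconds, int nanoOfSecond) {
        // Sub-millisecond digits are dropped: this type cannot hold them.
        return Math.addExact(Math.multiplyExact(epochSeconds, 1000L), nanoOfSecond / 1_000_000);
    }

    /** INT96 style (assumed layout): nanos-of-day as int64, then Julian day as int32. */
    static byte[] toInt96(long epochSeconds, int nanoOfSecond) {
        long days = Math.floorDiv(epochSeconds, SECONDS_PER_DAY);
        long secondOfDay = Math.floorMod(epochSeconds, SECONDS_PER_DAY);
        long nanosOfDay = secondOfDay * NANOS_PER_SECOND + nanoOfSecond;
        int julianDay = Math.toIntExact(JULIAN_DAY_OF_EPOCH + days);
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt(julianDay)
                .array();
    }

    /** Reads the nanos-of-day field back from the first 8 bytes. */
    static long nanosOfDay(byte[] int96) {
        return ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN).getLong(0);
    }

    /** Reads the Julian day field back from the last 4 bytes. */
    static int julianDay(byte[] int96) {
        return ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN).getInt(8);
    }

    public static void main(String[] args) {
        byte[] encoded = toInt96(0L, 123_456_789);
        System.out.println("millis      = " + toTimestampMillis(0L, 123_456_789));
        System.out.println("julianDay   = " + julianDay(encoded));
        System.out.println("nanosOfDay  = " + nanosOfDay(encoded));
    }
}
```

The point of the sketch is only that the millisecond encoding loses the last six digits of the fractional second, while the day-plus-nanos encoding keeps them.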

          Cheng Lian added a comment -

          While testing Spark SQL 1.5-SNAPSHOT for Parquet/Hive compatibility, we hit SPARK-10177. In short, Spark SQL and Hive both have their own Julian date conversion code, and their results don't match. Currently, we've fixed this issue by making Spark SQL behave the same as Hive so that we can interoperate (see Spark PR #8400). However, Hive's behavior looks a little bit weird to me: when converting a calendar timestamp to a Julian timestamp, Hive always gives a result 12 hours later than the expected result.

          This behavior can be verified by the following Spark 1.5-SNAPSHOT shell snippet (I'm using Spark 1.5-SNAPSHOT shell since it comes with Hive 1.2.1 dependencies):

          import java.sql._
          import java.util._
          
          import org.apache.hadoop.hive.ql.io.parquet.timestamp._
          import org.apache.spark.sql.catalyst.util._
          
          TimeZone.setDefault(TimeZone.getTimeZone("GMT"))
          val timestamp = Timestamp.valueOf("1970-01-01 00:00:00")
          
          val hiveNanoTime = NanoTimeUtils.getNanoTime(timestamp, false)
          val hiveJulianDay = hiveNanoTime.getJulianDay
          val hiveTimeOfDayNanos = hiveNanoTime.getTimeOfDayNanos
          
          println(
            s"""Hive converts "$timestamp" to Julian timestamp:
               |(julianDay=$hiveJulianDay, timeOfDayNanos=$hiveTimeOfDayNanos)
             """.stripMargin)
          

          The result is:

          Hive converts "1970-01-01 00:00:00.0" to Julian timestamp:
          (julianDay=2440588, timeOfDayNanos=0)
          

          According to the definition on this page, Julian dates count from noon. Namely, "00:00:00" of any calendar date must map to a Julian timestamp with a fraction of 0.5, i.e. an integral date plus 12 hours. The correct Julian timestamp given by the converter on the aforementioned page is "2440587.500000", which is equivalent to:

          (julianDay=2440587, timeOfDayNanos=43200000000000)
          

          This means that INT96 timestamp values stored in Parquet files written by Hive all have a 12-hour offset. (I haven't tried to verify this issue against Impala.)

          This shouldn't be a big problem though, as long as the read path always correctly decodes the written timestamp values. Just curious, is this 12-hour offset intentional?
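The two results above differ only in where the day boundary is placed. The following sketch, in plain Java with no Hive or Spark dependency, splits a Unix-epoch instant into (julianDay, timeOfDayNanos) under both conventions. The class and method names are hypothetical; the constants rest on the fact that the astronomical Julian date of 1970-01-01 00:00:00 UTC is 2440587.5.

```java
/** Illustrative only; names are hypothetical and not taken from Hive or Spark. */
final class JulianDaySplit {
    static final long NANOS_PER_DAY = 86_400L * 1_000_000_000L;
    static final long NANOS_PER_HALF_DAY = NANOS_PER_DAY / 2;

    /**
     * Midnight-based split: days roll over at 00:00 UTC.
     * This convention reproduces the (2440588, 0) result shown above.
     */
    static long[] midnightBased(long epochNanos) {
        return new long[] {
            2_440_588L + Math.floorDiv(epochNanos, NANOS_PER_DAY),
            Math.floorMod(epochNanos, NANOS_PER_DAY)
        };
    }

    /**
     * Noon-based split: days roll over at 12:00 UTC, as in the astronomical
     * Julian date. Midnight therefore falls half a day into Julian day N - 1.
     */
    static long[] noonBased(long epochNanos) {
        long shifted = Math.addExact(epochNanos, NANOS_PER_HALF_DAY);
        return new long[] {
            2_440_587L + Math.floorDiv(shifted, NANOS_PER_DAY),
            Math.floorMod(shifted, NANOS_PER_DAY)
        };
    }

    public static void main(String[] args) {
        long[] m = midnightBased(0L);
        long[] n = noonBased(0L);
        System.out.println("midnight-based: julianDay=" + m[0] + ", timeOfDayNanos=" + m[1]);
        System.out.println("noon-based:     julianDay=" + n[0] + ", timeOfDayNanos=" + n[1]);
    }
}
```

Read as Julian dates, the midnight-based pair for the epoch denotes 2440588.0 and the noon-based pair denotes 2440587.5, which is the 12-hour gap described here. As a side note, the JDK's own `java.time.temporal.JulianFields.JULIAN_DAY` is also documented as using midnight boundaries rather than noon.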


            People

            • Assignee:
              Szehon Ho
              Reporter:
              Jarek Jarcec Cecho
            • Votes:
              2
              Watchers:
              13
