Hive / HIVE-5814

Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.13.0
    • Component/s: HCatalog
    • Labels:
      None

      Description

      Hive 0.12 added support for new data types. Pig 0.12 added some as well. HCat should handle these as well. Also note that CHAR was added recently.

      Also allow the user to specify a parameter in Pig, like so: HCatStorer('','', '-onOutOfRangeValue Throw'), to control what happens when a Pig value is out of range for the target Hive column. Valid values for the option are Throw and Null. Throw makes the runtime raise an exception. Null, which is the default, means NULL is written to the target column and a message to that effect is emitted to the log. Only 1 message per column/data type is sent to the log.
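
The Throw/Null semantics described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class OutOfRangeSketch, the method toTinyInt, the in-memory log list, and the choice of a Pig int being narrowed to a Hive TINYINT column are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the -onOutOfRangeValue semantics; not the HCat implementation.
final class OutOfRangeSketch {
    enum Policy { Throw, Null }

    private final Policy policy;
    // Remembers which column/type pairs have already produced a log message.
    private final Set<String> warned = new HashSet<>();
    // Stand-in for the real logger so the behaviour is easy to inspect.
    final List<String> log = new ArrayList<>();

    OutOfRangeSketch(Policy policy) {
        this.policy = policy;
    }

    // Narrows a Pig int to a value that fits a Hive TINYINT column (assumed example mapping).
    Byte toTinyInt(String column, int value) {
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return (byte) value;
        }
        if (policy == Policy.Throw) {
            throw new IllegalArgumentException(
                "Value " + value + " is out of range for TINYINT column " + column);
        }
        // Policy.Null: write NULL and log only once per column/data type.
        if (warned.add(column + "/TINYINT")) {
            log.add("Out-of-range value for TINYINT column " + column + "; writing NULL");
        }
        return null;
    }

    public static void main(String[] args) {
        OutOfRangeSketch sketch = new OutOfRangeSketch(Policy.Null);
        System.out.println(sketch.toTinyInt("c1", 300)); // out of range, so NULL is written
        System.out.println(sketch.toTinyInt("c1", 400)); // NULL again, no second log message
        System.out.println(sketch.log.size());
    }
}
```

With Policy.Throw the same out-of-range call raises an exception instead of returning null.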

      See attached HCat-Pig Type Mapping Hive 0.13.pdf for exact mappings.

      1. HIVE-5814.5.patch
        141 kB
        Eugene Koifman
      2. HIVE-5814.4.patch
        140 kB
        Eugene Koifman
      3. HIVE-5814.3.patch
        140 kB
        Eugene Koifman
      4. HIVE-5814.2.patch
        110 kB
        Eugene Koifman
      5. HCat-Pig Type Mapping Hive 0.13.pdf
        48 kB
        Eugene Koifman

        Issue Links

          Activity

          Eugene Koifman added a comment -

          Lefty Leverenz, the feature is complete but the doc changes are still needed.

          Lefty Leverenz added a comment -

          Also, we should take on a documentation task to update the HCat wiki with the Out-of-range semantics.

          Has this been done yet? It doesn't seem to be included in HIVE-6316 (Document support for new types in HCat).

          Sushanth Sowmyan added a comment -

          Committed to trunk. Thanks, Eugene!

          Sushanth Sowmyan added a comment -

          I see.

          Yes, agreed, using toString() and valueOf() as-is should be unacceptable, and we should modify LazySimpleSerDe etc. to that effect (not needed in this JIRA; let's not bundle still more changes here).

          Eugene Koifman added a comment -

          java.sql.Date works like this.
          1. d = new Date(System.currentTimeMillis())
          2. d.toString() - prints out a human-readable date, but implicitly takes the local timezone into account
          3. Date.valueOf("2014-01-01") calculates the millis value, again using the local timezone.
          4. Our LazySerde serializes Date using toString/valueOf, which in my opinion is totally wrong. Thus if the operation in 1 is performed at 5AM UTC (Apr 2nd, for example) but Hive is running in Palo Alto, it will save the date which is 1 day before, i.e. Apr 1st. It should just store the millis value (perhaps making sure to chop off the 'time' part if any, i.e. so that it represents midnight since Epoch) and read it back the same way. Then we'd have an 'absolute' notion of date.
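
The timezone dependence described in steps 1 to 4 can be reproduced with a small sketch. It is illustrative only: the class SqlDateZoneSketch and its helpers are hypothetical names, the JVM default zone is switched temporarily to stand in for machines in different locations, and America/Los_Angeles is assumed as the zone for Palo Alto. The epochDay helper shows one possible zone-independent representation (whole days since the Epoch), which is an assumption about how the "store the millis value" suggestion could look, not what Hive does.

```java
import java.sql.Date;
import java.time.Instant;
import java.util.TimeZone;

// Illustrative only: class and method names are hypothetical, not Hive code.
final class SqlDateZoneSketch {

    // Runs new Date(millis).toString() with the JVM default zone set to zoneId.
    static String render(long millis, String zoneId) {
        TimeZone saved = TimeZone.getDefault();
        TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
        try {
            return new Date(millis).toString();
        } finally {
            TimeZone.setDefault(saved);
        }
    }

    // Runs Date.valueOf(text).getTime() with the JVM default zone set to zoneId.
    static long parse(String text, String zoneId) {
        TimeZone saved = TimeZone.getDefault();
        TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
        try {
            return Date.valueOf(text).getTime();
        } finally {
            TimeZone.setDefault(saved);
        }
    }

    // Zone-independent alternative: whole days since the Epoch, counted in UTC.
    static long epochDay(long millis) {
        return Math.floorDiv(millis, 86_400_000L);
    }

    public static void main(String[] args) {
        long fiveAmUtc = Instant.parse("2014-04-02T05:00:00Z").toEpochMilli();
        // The same instant renders as different calendar dates in different zones.
        System.out.println(render(fiveAmUtc, "UTC"));
        System.out.println(render(fiveAmUtc, "America/Los_Angeles"));
        // The same text parses to different millis values in different zones.
        System.out.println(parse("2014-01-01", "America/Los_Angeles") - parse("2014-01-01", "UTC"));
        // The day count does not depend on the default zone.
        System.out.println(epochDay(fiveAmUtc));
    }
}
```

Because render and parse depend on the default zone, a value written with toString() on one machine and read back with valueOf() on another may not round-trip to the same instant, whereas the day count stays the same everywhere.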

          Sushanth Sowmyan added a comment -

          Also, I would suggest editing the original description of this JIRA to indicate that HIVE-6232 was rolled into this patch as well; it'll help others who are trawling JIRA later on.

          Sushanth Sowmyan added a comment -

          I've reviewed HIVE-5814.5.patch as a diff on the original HIVE-5814.patch that I reviewed last week, and it looks good to me.

          I must admit to being slightly nervous about the comments you made about java.sql.Date being weird, having not experimented with it myself, and would like to understand what other corner cases exist, and whether we want to make any changes to other serdes about this.

          Also, we should take on a documentation task to update the HCat wiki with the Out-of-range semantics.

          For this patch itself, though, I'm +1.

          Eugene Koifman added a comment -

          This failure is not related to any HCat changes.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12626383/HIVE-5814.5.patch

          ERROR: -1 due to 1 failed/errored test(s), 5000 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1136/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1136/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 1 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12626383

          Eugene Koifman added a comment -

          HIVE-5814.5.patch - includes the TestHCatLoaderStorer change

          Eugene Koifman added a comment -

          Also included are a bunch of tests around out-of-range values from Pig

          Eugene Koifman added a comment -

          https://reviews.apache.org/r/17135/diff/1-2/ has the changes

          Eugene Koifman added a comment -

          HIVE-5814.4.patch adds the ability for the user to specify what happens if a Pig value is out of range for the target column in Hive.
          Here is an example:
          store data into 'test_tbl' using org.apache.hive.hcatalog.pig.HCatStorer('','','-onOutOfRangeValue Throw');

          Valid values are Throw and Null (default).
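
How the third HCatStorer argument could be turned into a setting can be sketched as below. This is a hedged illustration only: the class StorerOptionSketch, the enum OnOutOfRange, and the whitespace-splitting parser are hypothetical and are not taken from the patch. Only the option name -onOutOfRangeValue, its two values, and the Null default come from the comment above.

```java
// Hypothetical parser for the option string passed as the third HCatStorer argument.
final class StorerOptionSketch {
    enum OnOutOfRange { Throw, Null }

    static OnOutOfRange parse(String options) {
        // No options given: fall back to the documented default, Null.
        if (options == null || options.trim().isEmpty()) {
            return OnOutOfRange.Null;
        }
        String[] tokens = options.trim().split("\\s+");
        for (int i = 0; i < tokens.length - 1; i++) {
            if (tokens[i].equals("-onOutOfRangeValue")) {
                // valueOf rejects anything other than Throw or Null
                // by throwing IllegalArgumentException.
                return OnOutOfRange.valueOf(tokens[i + 1]);
            }
        }
        return OnOutOfRange.Null;
    }

    public static void main(String[] args) {
        System.out.println(parse("-onOutOfRangeValue Throw"));
        System.out.println(parse(""));
    }
}
```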

          Eugene Koifman added a comment -

          Updated spec

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12625486/HIVE-5814.3.patch

          ERROR: -1 due to 6 failed/errored test(s), 4968 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_import_exported_table
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_hdfs_file_with_space_in_the_name
          org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_file_with_header_footer_negative
          org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1060/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1060/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 6 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12625486

          Eugene Koifman added a comment -

          HIVE-5814.3.patch rebased with current trunk

          Eugene Koifman added a comment -

          Same patch, different name; trying to get the pre-commit tests to run.

          Sushanth Sowmyan added a comment -

          The patch looks good from HCat's perspective. It does not apply cleanly on trunk right now, but from the intent, I'm good on this. If you can update the patch and set it to patch available again (and if the precommit tests are working again), we can get the precommit tests running on this.

          Eugene Koifman added a comment -

          Review Board: https://reviews.apache.org/r/17135

          Eugene Koifman added a comment - - edited

          partial implementation


            People

            • Assignee:
              Eugene Koifman
              Reporter:
              Eugene Koifman
            • Votes:
              1
              Watchers:
              5
