Hive / HIVE-2390

Add UNIONTYPE serialization support to LazyBinarySerDe

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.1
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels:
    • Hadoop Flags: Reviewed
    • Release Note: Adds UNIONTYPE support to LazyBinarySerDe

      Description

      When the union type was introduced, full support for it was not provided. For instance, serializing a row that contains a union with LazyBinarySerDe fails:

      Caused by: java.lang.RuntimeException: Unrecognized type: UNION
      	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:468)
      	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:230)
      	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:184)
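To make the failure concrete: a serializer needs a rule for writing a union value, which at minimum means recording which member type is present (a tag) before that member's own bytes. The sketch below assumes a made-up layout for a `uniontype<int,string>` value (a 1-byte tag, then either a 4-byte int or a length-prefixed UTF-8 string). It is not the LazyBinary wire format, and `UnionCodecSketch` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical encoding for a uniontype<int,string> value; NOT Hive's LazyBinary format.
 * Assumed layout: [tag: 1 byte][payload], where the tag selects the member type.
 */
final class UnionCodecSketch {
    static final byte TAG_INT = 0;     // member 0: int
    static final byte TAG_STRING = 1;  // member 1: string

    static byte[] serialize(byte tag, Object value) {
        switch (tag) {
            case TAG_INT:
                // tag byte followed by a 4-byte big-endian int
                return ByteBuffer.allocate(1 + 4).put(tag).putInt((Integer) value).array();
            case TAG_STRING: {
                byte[] utf8 = ((String) value).getBytes(StandardCharsets.UTF_8);
                // tag byte, 4-byte length prefix, then the UTF-8 bytes
                return ByteBuffer.allocate(1 + 4 + utf8.length)
                        .put(tag).putInt(utf8.length).put(utf8).array();
            }
            default:
                // An unknown member type is rejected instead of being silently mis-encoded.
                throw new IllegalArgumentException("Unrecognized union tag: " + tag);
        }
    }

    static Object deserialize(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        byte tag = buf.get();  // read the tag first to know how to decode the rest
        switch (tag) {
            case TAG_INT:
                return buf.getInt();
            case TAG_STRING: {
                byte[] utf8 = new byte[buf.getInt()];
                buf.get(utf8);
                return new String(utf8, StandardCharsets.UTF_8);
            }
            default:
                throw new IllegalArgumentException("Unrecognized union tag: " + tag);
        }
    }
}
```

Under this assumed layout, `serialize(TAG_INT, 42)` produces 5 bytes whose first byte is the tag 0. Writing the tag first is what lets a reader choose the right decoder without any out-of-band type information.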
      
      1. HIVE-2390.1.patch
        29 kB
        Suma Shivaprasad
      2. HIVE-2390.patch
        22 kB
        Suma Shivaprasad

          Activity

          Jakob Homan added a comment -

          Part of the problem is that the term union has been overloaded. In SQL it means the set union of two result sets with compatible types, whereas in Avro and in programming languages it means a single value that can, at any one time, be an instance of one of two or more different types. Union was added as a full-on, first-class type by its inclusion in ObjectInspector's Category enum. Is there any reason not to expand this use to be more along the lines of programming languages' take on unions? If so, it should be marked as not really being a first-class type. If not, support for unions should be provided in all the serdes, in the grammar, and in the documentation.
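The programming-language sense of union described above can be sketched as a value paired with a tag naming the active member type. This is an illustrative Java model only: `TaggedUnion` is a hypothetical class. It loosely mirrors the tag-plus-value view that Hive's union ObjectInspector works with, but it does not reproduce that API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of a union value: exactly one member type is active at a time. */
final class TaggedUnion {
    private final List<Class<?>> memberTypes; // declared member types, e.g. [Integer, String]
    private final byte tag;                   // index of the active member type
    private final Object value;               // an instance of memberTypes.get(tag)

    TaggedUnion(List<Class<?>> memberTypes, byte tag, Object value) {
        if (tag < 0 || tag >= memberTypes.size()) {
            throw new IllegalArgumentException("tag out of range: " + tag);
        }
        // isInstance(null) is false, so this sketch also rejects null values.
        if (!memberTypes.get(tag).isInstance(value)) {
            throw new IllegalArgumentException("value does not match member type " + tag);
        }
        this.memberTypes = Collections.unmodifiableList(new ArrayList<>(memberTypes));
        this.tag = tag;
        this.value = value;
    }

    byte getTag() { return tag; }

    Object getValue() { return value; }

    Class<?> activeType() { return memberTypes.get(tag); }
}
```

Unlike a SQL UNION, nothing here combines rows: a single value carries one of several possible types, and the tag records which one.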

          I would lobby for expanding its support, as it's an important type in Avro and we're quite hobbled by the inability to manipulate unioned values. (Avro handles nullable values by unioning their type T with null, but Haivvreo transparently converts these to just the type T and returns null where appropriate. The problem lies in actual unions of non-null types, which are less frequent but still valid.)
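The nullable-union handling described in the parenthetical can be sketched as a small rule: a union whose branches are exactly `null` plus one other type collapses to that type, and anything else remains a genuine union. This sketch models Avro branch types as plain strings and is not Haivvreo's actual code; `NullableUnionSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of collapsing Avro-style nullable unions; branch types are plain strings. */
final class NullableUnionSketch {
    /**
     * Returns T when the branches are exactly {null, T} (in either order);
     * otherwise returns empty, meaning the schema is a real union of non-null types.
     */
    static Optional<String> collapseNullable(List<String> branches) {
        boolean hasNull = false;
        List<String> nonNull = new ArrayList<>();
        for (String branch : branches) {
            if ("null".equals(branch)) {
                hasNull = true;
            } else {
                nonNull.add(branch);
            }
        }
        if (hasNull && nonNull.size() == 1) {
            return Optional.of(nonNull.get(0)); // expose as a nullable T
        }
        return Optional.empty();                // e.g. [int, string]: needs real union support
    }
}
```

The empty result is exactly the case the comment calls out: unions of two or more non-null types have no lossless mapping to a single nullable column.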

          Jakob Homan added a comment -

          Changing name of JIRA to be more representative of what needs to be done. If reaction is positive, will open subtasks for individual items.

          Amareshwari Sriramadasu added a comment -

          +1. I agree that when the union type was added, complete support for it was not. We should extend its usage to all the serdes.

          Part of the problem is that the term union has been overloaded.

          The type is called 'uniontype' in Hive to resolve ambiguities.

          Navis added a comment -

          HIVE-4765 included a LazyBinaryUnion type. Could you check that?
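Roughly, the "lazy" in the LazyBinary classes refers to lazy deserialization: keep the raw bytes and decode only when a value is first requested. The sketch below shows that pattern for an assumed `[tag][payload]` layout; `LazyUnionSketch` is hypothetical and does not reproduce Hive's LazyBinaryUnion.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical lazily decoded union; NOT Hive's LazyBinaryUnion.
 * Assumed layout at `start`: [tag: 1 byte][payload], tag 0 = 4-byte int,
 * tag 1 = 4-byte length followed by UTF-8 bytes.
 */
final class LazyUnionSketch {
    private final byte[] bytes;
    private final int start;
    private boolean parsed;  // set once the tag and payload have been decoded
    private byte tag;
    private Object field;
    int parseCount;          // for illustration: number of times decoding actually ran

    LazyUnionSketch(byte[] bytes, int start) {
        this.bytes = bytes;  // no decoding happens here
        this.start = start;
    }

    private void parse() {
        if (parsed) {
            return;          // decode at most once, then serve cached values
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes, start, bytes.length - start);
        tag = buf.get();
        if (tag == 0) {
            field = buf.getInt();
        } else if (tag == 1) {
            byte[] utf8 = new byte[buf.getInt()];
            buf.get(utf8);
            field = new String(utf8, StandardCharsets.UTF_8);
        } else {
            throw new IllegalStateException("Unrecognized union tag: " + tag);
        }
        parsed = true;
        parseCount++;
    }

    byte getTag() { parse(); return tag; }

    Object getField() { parse(); return field; }
}
```

Constructing the object is free; the first call to `getTag()` or `getField()` pays the decoding cost once, and columns a query never touches are never decoded.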

          chewie added a comment -

          I wanted to ask about the current status, and whether there is an ETA for resolution. I can assure you there are quite a few efforts that need to filter on data within uniontypes in Hive (Impala, etc.) as soon as possible. I've been informed that my effort will not accept uniontype usage (with more than one non-null type) unless there is built-in Hive support (which is very unfortunate, but not without reason). That means the types have to be split into separate fields, which is obviously less semantically correct, clunkier (in the Avro model and in Java), and provides no benefit other than being a workaround for clean queryability.
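The workaround described above (splitting a union into separate fields) can be sketched as one nullable column per member type, with only the active member's column set. The sketch also shows one way it is less semantically correct: a null value in the active member cannot be told apart from "no member set". `UnionSplitSketch` is a hypothetical helper, not part of Hive.

```java
/** Hypothetical illustration of the "split the union into separate fields" workaround. */
final class UnionSplitSketch {
    /** One nullable column per member type; only the active member's column is filled. */
    static Object[] split(int memberCount, int tag, Object value) {
        if (tag < 0 || tag >= memberCount) {
            throw new IllegalArgumentException("tag out of range: " + tag);
        }
        Object[] columns = new Object[memberCount]; // every column starts as null
        columns[tag] = value;
        return columns;
    }

    /** Recovers the tag from split columns, or returns -1 when it cannot be determined. */
    static int recoverTag(Object[] columns) {
        int found = -1;
        for (int i = 0; i < columns.length; i++) {
            if (columns[i] != null) {
                if (found != -1) {
                    return -1;  // more than one column set: not a valid union encoding
                }
                found = i;
            }
        }
        // If the active member's value was itself null, no column is set and the tag is lost.
        return found;
    }
}
```

With a real uniontype the tag travels with the value; after splitting, it has to be inferred from which column is non-null, and that inference fails for null members.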

          Something else that needs to be addressed is how to reference nested fields, structs, etc. in a query. Currently '.' (period) is used; can this be kept for unions? Ambiguity can arise if more than one member type has the same field; in all other cases the reference is implicitly unambiguous. This could be validated before query execution. When more than one type could have the same field, what would the syntax be? Possibly:

          unionobject.object.[2]unionobject.unionobject.[1]unionobject.object.....
          

          In the above example, any ambiguous object type being referenced can be qualified by the int value of the type in square brackets [].
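The pre-execution validation suggested above might look like this: collect the member types that declare the referenced field, accept a unique match, and require an explicit `[n]` qualifier when several members match. Members are modeled as plain sets of field names, and `UnionFieldResolver` is hypothetical; this is a sketch of the proposal, not syntax Hive is known to implement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;
import java.util.Set;

/** Hypothetical resolver for field references into a union of struct types. */
final class UnionFieldResolver {
    /** Returns the tag of the member that owns `field`, rejecting ambiguous references. */
    static int resolve(List<Set<String>> memberFields, String field, OptionalInt explicitTag) {
        if (explicitTag.isPresent()) {
            // Qualified reference such as [1]: just check that member really has the field.
            int t = explicitTag.getAsInt();
            if (t < 0 || t >= memberFields.size() || !memberFields.get(t).contains(field)) {
                throw new IllegalArgumentException("member [" + t + "] has no field " + field);
            }
            return t;
        }
        List<Integer> owners = new ArrayList<>();
        for (int t = 0; t < memberFields.size(); t++) {
            if (memberFields.get(t).contains(field)) {
                owners.add(t);
            }
        }
        if (owners.isEmpty()) {
            throw new IllegalArgumentException("no member has field " + field);
        }
        if (owners.size() > 1) {
            throw new IllegalArgumentException(
                    "ambiguous field " + field + " in members " + owners + "; qualify with [n]");
        }
        return owners.get(0);  // unique owner: the plain '.' reference is unambiguous
    }
}
```

Because the check needs only the declared member types, it could run at query-compile time, before any data is read.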

          Suma Shivaprasad added a comment -

          I have a patch for this ready. Will be submitting this shortly.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12666766/HIVE-2390.patch

          ERROR: -1 due to 2 failed/errored test(s), 6171 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_lazyserde
          org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/660/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/660/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-660/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 2 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12666766

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12667013/HIVE-2390.1.patch

          ERROR: -1 due to 1 failed/errored test(s), 6184 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_snapshot
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/674/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/674/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-674/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 1 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12667013

          Amareshwari Sriramadasu added a comment -

          +1 Changes look fine to me.

          Suma Shivaprasad, the test failure seems unrelated to me. Can you look into it and confirm?

          Suma Shivaprasad added a comment -

          Amareshwari Sriramadasu: Yes, the test case failure is unrelated to the patch.

          Thejas M Nair added a comment -

          Suma Shivaprasad, can you please add information to the release notes section (click on edit jira to find it) that can be used to document the change from this jira in the wiki?

          Amareshwari Sriramadasu added a comment -

          I just committed this. Thanks Suma!

          Carl Steinbach added a comment -

          I updated the description of this ticket to accurately reflect the change that was made in this patch.

          My impression is that this patch doesn't really change the situation in Hive with respect to UNIONTYPEs – this feature is still unusable. If I'm wrong about this I would appreciate someone setting me straight.

          Suma Shivaprasad added a comment -

          Carl,

          I am working on a related feature to support UNIONTYPE in ThriftDeserializer as well.
          Since I am a fairly new contributor to Hive and not aware of the existing issues in the UNIONTYPE feature, if someone could identify the missing pieces and raise JIRAs, I can take a stab at them.

          Thejas M Nair added a comment -

          This has been fixed in the 0.14 release. Please open a new JIRA if you see any issues.


            People

            • Assignee: Suma Shivaprasad
            • Reporter: Jakob Homan
            • Votes: 7
            • Watchers: 15
