HIVE-2223

support grouping on complex types in Hive

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: Query Processor
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      A query whose GROUP BY column list includes an ARRAY-typed column fails at runtime, because hashing of complex types is not yet supported:

      CREATE TABLE test_group_by ( key INT, group INT, terms ARRAY<STRING>);
      SELECT key, terms, count(group) FROM test_group_by GROUP BY key, terms;
      ...
      "Hash code on complex types not supported yet."

      java.lang.RuntimeException: Error while closing operators
      at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:232)
      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
      at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
      at org.apache.hadoop.mapred.Child.main(Child.java:170)
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Hash code on complex types not supported yet.
      at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:799)
      at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:462)
      at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
      at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
      at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
      at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:211)
      ... 4 more
      Caused by: java.lang.RuntimeException: Hash code on complex types not supported yet.
      at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:348)
      at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:187)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
      at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:746)
      at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:780)
      ... 9 more
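
The trace above fails inside a hashCode call, which is what a group-by needs in order to route equal keys to the same group. As a rough illustration only (plain Java collections, not Hive code; the class and method names are hypothetical, and it counts rows rather than non-null `group` values), grouping on a composite key that contains a list looks like this:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; this is not how Hive's GroupByOperator is written.
class GroupByListKeySketch {
    // Counts rows per (key, terms) pair. The composite key is a List, so the
    // map relies on List.hashCode() and List.equals() to locate each group.
    static Map<List<Object>, Integer> countByKeyAndTerms(List<Object[]> rows) {
        Map<List<Object>, Integer> counts = new LinkedHashMap<>();
        for (Object[] row : rows) {
            List<Object> groupKey = Arrays.asList(row[0], row[1]);
            counts.merge(groupKey, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
            new Object[] {1, Arrays.asList("a", "b")},
            new Object[] {1, Arrays.asList("a", "b")},
            new Object[] {1, Arrays.asList("b", "a")});
        // The first two rows share a group; the third has a different term order.
        System.out.println(countByKeyAndTerms(rows));
    }
}
```

Without a hash defined for the list-valued part of the key, there is no way to place a row in its group, which matches the "Hash code on complex types not supported yet" failure.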

      1. HIVE-2223.patch
        22 kB
        Jonathan Chang
      2. HIVE-2223.patch.2
        22 kB
        Jonathan Chang
      3. HIVE-2223.patch.3
        23 kB
        Jonathan Chang

        Activity

        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #1016 (See https://builds.apache.org/job/Hive-trunk-h0.21/1016/)
        HIVE-2223. support grouping on complex types in Hive
        (Jonathan Chang via jvs)

        jvs : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1185311
        Files :

        • /hive/trunk/ql/src/test/queries/clientpositive/groupby_complex_types.q
        • /hive/trunk/ql/src/test/results/clientpositive/groupby_complex_types.q.out
        • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
        • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestStandardObjectInspectors.java
        John Sichi added a comment -

        Committed to trunk. Thanks Jonathan!

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2381/
        -----------------------------------------------------------

        (Updated 2011-10-16 17:19:25.946728)

        Review request for hive.

        Changes
        -------

        Unittest fix.

        Summary
        -------

        Adds hash codes for List and Map object inspectors.

        This addresses bug HIVE-2223.
        https://issues.apache.org/jira/browse/HIVE-2223

        Diffs (updated)
        -----

        ql/src/test/queries/clientpositive/groupby_complex_types.q PRE-CREATION
        ql/src/test/results/clientpositive/groupby_complex_types.q.out PRE-CREATION
        serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2d45aba
        serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestStandardObjectInspectors.java c1b1932

        Diff: https://reviews.apache.org/r/2381/diff

        Testing
        -------

        Added unittest.

        Thanks,

        Jonathan

        Jonathan Chang added a comment -

        Fix unittest

        John Sichi added a comment -

        One test failed: TestStandardObjectInspectors.testStandardUnionObjectInspector

        John Sichi added a comment -

        +1. Will commit when tests pass.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2381/
        -----------------------------------------------------------

        (Updated 2011-10-14 18:38:00.199892)

        Review request for hive.

        Changes
        -------

        Make hash match algorithms used by Java.

        Summary
        -------

        Adds hash codes for List and Map object inspectors.

        This addresses bug HIVE-2223.
        https://issues.apache.org/jira/browse/HIVE-2223

        Diffs (updated)
        -----

        ql/src/test/queries/clientpositive/groupby_complex_types.q PRE-CREATION
        ql/src/test/results/clientpositive/groupby_complex_types.q.out PRE-CREATION
        serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2d45aba

        Diff: https://reviews.apache.org/r/2381/diff

        Testing
        -------

        Added unittest.

        Thanks,

        Jonathan

        John Sichi added a comment -

        See comments in review board regarding the hash codes.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2381/#review2595
        -----------------------------------------------------------

        serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
        <https://reviews.apache.org/r/2381/#comment5833>

        Since this is a list, sequence is significant, e.g. [1,2,3] != [3,2,1]. So perhaps we should make the hash code reflect this?

        Java's AbstractList does this via:

        while (i.hasNext()) {
            E obj = i.next();
            hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());
        }
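
To make the ordering point concrete, here is a minimal sketch that contrasts the 31-based recurrence quoted above with a plain sum of element hashes. The class and method names are hypothetical, and the plain sum is only a contrast case chosen for illustration; the thread does not say what the first patch computed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of order-sensitive vs. order-insensitive list hashing.
class ListHashSketch {
    // Order-sensitive: same recurrence as documented for java.util.List.hashCode().
    static int orderedHash(List<?> items) {
        int hashCode = 1;
        for (Object obj : items) {
            hashCode = 31 * hashCode + (obj == null ? 0 : obj.hashCode());
        }
        return hashCode;
    }

    // Order-insensitive contrast case: a plain sum of element hashes.
    static int unorderedHash(List<?> items) {
        int sum = 0;
        for (Object obj : items) {
            sum += (obj == null ? 0 : obj.hashCode());
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> forward = Arrays.asList(1, 2, 3);
        List<Integer> backward = Arrays.asList(3, 2, 1);
        System.out.println(orderedHash(forward) == orderedHash(backward));     // false
        System.out.println(unorderedHash(forward) == unorderedHash(backward)); // true
        System.out.println(orderedHash(forward) == forward.hashCode());        // true
    }
}
```

An order-insensitive hash would still be correct for grouping as long as equality is order-sensitive, but it sends every permutation of a list to the same bucket, which is the collision the review comment is trying to avoid.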

        serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
        <https://reviews.apache.org/r/2381/#comment5834>

        For java.util.HashMap, they xor the key's hashcode with the value's hashcode for each entry, and then sum over all entries. I suppose that's to distinguish {a->b, c->d} from {a->d, b->c}, although it fails to distinguish {a->b} from {b->a}.

        - John
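
The map scheme described in this comment can be checked with a short sketch. The class name and the small `of` helper are hypothetical conveniences; the hash itself follows the rule specified for java.util.Map.hashCode() (sum over entries of key hash XOR value hash).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the entry-wise XOR, then sum, map hash.
class MapHashSketch {
    static int mapHash(Map<?, ?> map) {
        int sum = 0;
        for (Map.Entry<?, ?> e : map.entrySet()) {
            Object k = e.getKey();
            Object v = e.getValue();
            sum += (k == null ? 0 : k.hashCode()) ^ (v == null ? 0 : v.hashCode());
        }
        return sum;
    }

    // Builds a map from alternating key, value arguments.
    static Map<String, String> of(String... kv) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        // XOR is symmetric, so swapping key and value in a single entry collides.
        System.out.println(mapHash(of("a", "b")) == mapHash(of("b", "a")));                     // true
        // Re-pairing keys and values across entries is distinguished in this case.
        System.out.println(mapHash(of("a", "b", "c", "d")) == mapHash(of("a", "d", "b", "c"))); // false
        // The sketch agrees with the JDK's own Map.hashCode().
        System.out.println(mapHash(of("a", "b", "c", "d")) == of("a", "b", "c", "d").hashCode()); // true
    }
}
```

Because the per-entry values are summed, the result does not depend on iteration order, which suits maps where entry order carries no meaning.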

        On 2011-10-13 16:50:53, Jonathan Chang wrote:

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2381/
        -----------------------------------------------------------

        (Updated 2011-10-13 16:50:53)

        Review request for hive.

        Summary
        -------

        Adds hash codes for List and Map object inspectors.

        This addresses bug HIVE-2223.
        https://issues.apache.org/jira/browse/HIVE-2223

        Diffs
        -----

        ql/src/test/queries/clientpositive/groupby_complex_types.q PRE-CREATION
        ql/src/test/results/clientpositive/groupby_complex_types.q.out PRE-CREATION
        serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2d45aba

        Diff: https://reviews.apache.org/r/2381/diff

        Testing
        -------

        Added unittest.

        Thanks,

        Jonathan

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2381/
        -----------------------------------------------------------

        Review request for hive.

        Summary
        -------

        Adds hash codes for List and Map object inspectors.

        This addresses bug HIVE-2223.
        https://issues.apache.org/jira/browse/HIVE-2223

        Diffs
        -----

        ql/src/test/queries/clientpositive/groupby_complex_types.q PRE-CREATION
        ql/src/test/results/clientpositive/groupby_complex_types.q.out PRE-CREATION
        serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2d45aba

        Diff: https://reviews.apache.org/r/2381/diff

        Testing
        -------

        Added unittest.

        Thanks,

        Jonathan

        Jonathan Chang added a comment -

        Ah, no. I tried various things for the root ('.', '/', 'hive', etc.) but never 'hive-git'.

        John Sichi added a comment -

        It applies cleanly for me, but I was also able to upload it to Review Board successfully. Did you try choosing hive-git for the repository?

        Jonathan Chang added a comment -

        For the life of me I can't get review board to accept my diff. Can you try the patch attached to this JIRA and see if it applies cleanly for you?

        John Sichi added a comment -

        I can't seem to view the diff on Review Board?

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1745/
        -----------------------------------------------------------

        (Updated 2011-09-09 13:50:30.594056)

        Review request for hive.

        Summary
        -------

        Adds hash codes for List and Map object inspectors.

        This addresses bug HIVE-2223.
        https://issues.apache.org/jira/browse/HIVE-2223

        Diffs
        -----

        Diff: https://reviews.apache.org/r/1745/diff

        Testing
        -------

        Added unittest.

        Thanks,

        Jonathan

        John Sichi added a comment -

        Jonathan, fill in the bug field in Review Board with HIVE-2223 so that the comments from there will automatically get propagated here.

        Jonathan Chang added a comment -

        https://reviews.apache.org/r/1745/

        Ning Zhang added a comment -

        Jonathan, can you create a review board request at reviews.apache.org?

        Jonathan Chang added a comment -

        This adds hashing functions for MAP and LIST object inspectors, and changes the default map comparator to non-null. These changes allow one to group by complex types.
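
As a rough sketch of how hashing can extend to nested LIST and MAP values, the following recurses over plain Java objects. This is an assumption-laden illustration: Hive's actual change lives in ObjectInspectorUtils and works through object inspectors rather than `instanceof` checks, and the class and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the code from the HIVE-2223 patch.
class NestedHashSketch {
    static int hash(Object value) {
        if (value == null) {
            return 0;
        }
        if (value instanceof List) {
            // Order-sensitive, matching the java.util.List.hashCode() recurrence.
            int h = 1;
            for (Object element : (List<?>) value) {
                h = 31 * h + hash(element);
            }
            return h;
        }
        if (value instanceof Map) {
            // Order-insensitive, matching the java.util.Map.hashCode() rule.
            int h = 0;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                h += hash(e.getKey()) ^ hash(e.getValue());
            }
            return h;
        }
        // Primitive-like leaf: defer to the object's own hash.
        return value.hashCode();
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> m = new HashMap<>();
        m.put("k", Arrays.asList(1, 2));
        List<Object> nested = Arrays.asList(m, Arrays.asList("x", null));
        // For standard collections this agrees with the JDK's own hashCode().
        System.out.println(hash(nested) == nested.hashCode());
    }
}
```

STRUCT and UNION are left out here, mirroring the scope stated in the thread.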

        Jonathan Chang added a comment -

        I'll work on getting this to work for arrays and maps since those are the most common cases. I'll leave STRUCT and UNION up for grabs.


          People

          • Assignee: Jonathan Chang
          • Reporter: Kathleen Ting
          • Votes: 1
          • Watchers: 2