Hive
  1. Hive
  2. HIVE-1271

Case sensitiveness of type information specified when using custom reducer causes type mismatch

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.6.0
    • Component/s: Query Processor
    • Labels:
      None

      Description

      Type information specified while using a custom reduce script is converted to lower case, and causes type mismatch during query semantic analysis . The following REDUCE query where field name = "userId" failed.

      hive> CREATE TABLE SS (
      > a INT,
      > b INT,
      > vals ARRAY<STRUCT<userId:INT, y:STRING>>
      > );
      OK

      hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
      > INSERT OVERWRITE TABLE SS
      > REDUCE *
      > USING 'myreduce.py'
      > AS
      > (a INT,
      > b INT,
      > vals ARRAY<STRUCT<userId:INT, y:STRING>>
      > )
      > ;
      FAILED: Error in semantic analysis: line 2:27 Cannot insert into
      target table because column number/types are different SS: Cannot
      convert column 2 from array<struct<userId:int,y:string>> to
      array<struct<userid:int,y:string>>.

      The same query worked fine after changing "userId" to "userid".

      1. HIVE-1271.patch
        6 kB
        Arvind Prabhakar
      2. HIVE-1271-1.patch
        8 kB
        Arvind Prabhakar

        Activity

        Dilip Joseph created issue -
        Arvind Prabhakar made changes -
        Field Original Value New Value
        Assignee Arvind Prabhakar [ aprabhakar ]
        Arvind Prabhakar made changes -
        Attachment HIVE-1271.patch [ 12439946 ]
        Hide
        Arvind Prabhakar added a comment -

        Summary
        The implementation of equals() method of StructTypeInfo was comparing field names as part of the comparison. This is not valid since field namess do not contitute the definition of a type. This patch refactors the TypedInfo hierarchy to address this issue.

        Implementation Details

        • Modified the TypedInfo and removed its implementation of the equals() method.
        • Modified all specialized subclasses to make them final.
        • Modified all subclass implementation of equals() to skip category comparison.
        • Modified StructTypeInfo implementation of equals() to not compare field names.

        Testing Done

        • Built and tested the usecase identified in this issue. It works now.
        • Ran full set of tests. Out of these two tests - clientpositive for input20.q and input33.q failed for unrelated reasons (these tests are failing on the trunk as well). All other tests passed.
        Show
        Arvind Prabhakar added a comment - Summary The implementation of equals() method of StructTypeInfo was comparing field names as part of the comparison. This is not valid since field namess do not contitute the definition of a type. This patch refactors the TypedInfo hierarchy to address this issue. Implementation Details Modified the TypedInfo and removed its implementation of the equals() method. Modified all specialized subclasses to make them final . Modified all subclass implementation of equals() to skip category comparison. Modified StructTypeInfo implementation of equals() to not compare field names. Testing Done Built and tested the usecase identified in this issue. It works now. Ran full set of tests. Out of these two tests - clientpositive for input20.q and input33.q failed for unrelated reasons (these tests are failing on the trunk as well). All other tests passed.
        Arvind Prabhakar made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Arvind Prabhakar made changes -
        Attachment HIVE-1271-1.patch [ 12440030 ]
        Hide
        Arvind Prabhakar added a comment -

        Changes for HIVE-1271 (patch updated)

        Summary:
        The previously submitted patch removed the dependence of StructTypeInfo on field names for equivalence comparison. This patch reverts that change and addresses the type equivalence by canonical testing of field names.

        Details:
        The changes to TypeInfo hierarchy made by previous patch assumed that the field names should not be considered part of the StructTypeInfo for testing equivalence. This conflicts with the implementation of LazyBinarySerDe (and others perhaps) which rely on field name distinction for caching purposes. This update changes the implementation so that field names are used as before, but are compared using case-insensitive comparison when testing the equivalence of two {{StructTypeInfo}}s.

        Testing Done:

        • Built and tested the usecase identified in this issue - it works now.
        • Ran complete set of tests with the previously reported unrelated failures only.
        Show
        Arvind Prabhakar added a comment - Changes for HIVE-1271 (patch updated) Summary: The previously submitted patch removed the dependence of StructTypeInfo on field names for equivalence comparison. This patch reverts that change and addresses the type equivalence by canonical testing of field names. Details: The changes to TypeInfo hierarchy made by previous patch assumed that the field names should not be considered part of the StructTypeInfo for testing equivalence. This conflicts with the implementation of LazyBinarySerDe (and others perhaps) which rely on field name distinction for caching purposes. This update changes the implementation so that field names are used as before, but are compared using case-insensitive comparison when testing the equivalence of two {{StructTypeInfo}}s. Testing Done: Built and tested the usecase identified in this issue - it works now. Ran complete set of tests with the previously reported unrelated failures only.
        John Sichi made changes -
        Fix Version/s 0.6.0 [ 12314524 ]
        Hide
        HBase Review Board added a comment -

        Message from: "Carl Steinbach" <carl@cloudera.com>

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        http://review.hbase.org/r/200/
        -----------------------------------------------------------

        Review request for Hive Developers.

        Summary
        -------

        Review for https://issues.apache.org/jira/secure/attachment/12440030/HIVE-1271-1.patch

        This addresses bug HIVE-1271.
        http://issues.apache.org/jira/browse/HIVE-1271

        Diffs


        serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/ListTypeInfo.java cb2fa57
        serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/MapTypeInfo.java a426e74
        serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/PrimitiveTypeInfo.java 3d1c68e
        serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/StructTypeInfo.java 87179aa
        serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfo.java 0344718
        serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/package-info.java PRE-CREATION

        Diff: http://review.hbase.org/r/200/diff

        Testing
        -------

        Thanks,

        Carl

        Show
        HBase Review Board added a comment - Message from: "Carl Steinbach" <carl@cloudera.com> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/200/ ----------------------------------------------------------- Review request for Hive Developers. Summary ------- Review for https://issues.apache.org/jira/secure/attachment/12440030/HIVE-1271-1.patch This addresses bug HIVE-1271 . http://issues.apache.org/jira/browse/HIVE-1271 Diffs serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/ListTypeInfo.java cb2fa57 serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/MapTypeInfo.java a426e74 serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/PrimitiveTypeInfo.java 3d1c68e serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/StructTypeInfo.java 87179aa serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfo.java 0344718 serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/package-info.java PRE-CREATION Diff: http://review.hbase.org/r/200/diff Testing ------- Thanks, Carl
        Hide
        Arvind Prabhakar added a comment -

        Is anyone reviewing this change? Thanks.

        Show
        Arvind Prabhakar added a comment - Is anyone reviewing this change? Thanks.
        Hide
        Ashish Thusoo added a comment -

        I am looking at this.

        Show
        Ashish Thusoo added a comment - I am looking at this.
        Hide
        Ashish Thusoo added a comment -

        Looks good to me. However, why remove the check on Category? Also why drop the default implementation of the equals method for TypeInfo?

        Show
        Ashish Thusoo added a comment - Looks good to me. However, why remove the check on Category? Also why drop the default implementation of the equals method for TypeInfo?
        Hide
        Arvind Prabhakar added a comment -

        @Ashish: Thanks for looking at the patch.

        why remove the check on Category?

        I modified all the specialized type infos to be final - which in turn ensures that if the test on instanceof succeeds, then they have to be the same category type. Therefore, the check on category was redundant going forward.

        Also why drop the default implementation of the equals method for TypeInfo?

        I did this for two main reasons - first that fact that it was implementing the equals() but not hashCode() method. This could lead to unexpected behavior when TypeInfo instances were put in collections. Second, the implementation was modified to make both equals() and hashCode() methods to be made abstract in order to force any (new) child classes to make sure that they implement both consistently.

        Let me know if you would like to tweak this change as necessary.

        Show
        Arvind Prabhakar added a comment - @Ashish: Thanks for looking at the patch. why remove the check on Category? I modified all the specialized type infos to be final - which in turn ensures that if the test on instanceof succeeds, then they have to be the same category type. Therefore, the check on category was redundant going forward. Also why drop the default implementation of the equals method for TypeInfo? I did this for two main reasons - first that fact that it was implementing the equals() but not hashCode() method. This could lead to unexpected behavior when TypeInfo instances were put in collections. Second, the implementation was modified to make both equals() and hashCode() methods to be made abstract in order to force any (new) child classes to make sure that they implement both consistently. Let me know if you would like to tweak this change as necessary.
        Hide
        Ashish Thusoo added a comment -

        sounds good to me. Thanks for the explanations.

        +1. Will commit after running the tests.

        Ashish

        Show
        Ashish Thusoo added a comment - sounds good to me. Thanks for the explanations. +1. Will commit after running the tests. Ashish
        Hide
        Ashish Thusoo added a comment -

        I have committed this to trunk and will commit to 0.6.0 soon. One thing I did overlook though. We should add a test case for this. Can you do that as part of another JIRA as this one is already partially committed.

        Thanks,
        Ashish

        Show
        Ashish Thusoo added a comment - I have committed this to trunk and will commit to 0.6.0 soon. One thing I did overlook though. We should add a test case for this. Can you do that as part of another JIRA as this one is already partially committed. Thanks, Ashish
        Hide
        Arvind Prabhakar added a comment -

        @Ashish: I created HIVE-1432 to track the test case creation. I will be submitting a patch for that soon. Thanks for pointing this out.

        Show
        Arvind Prabhakar added a comment - @Ashish: I created HIVE-1432 to track the test case creation. I will be submitting a patch for that soon. Thanks for pointing this out.
        Hide
        Ashish Thusoo added a comment -

        Committed. Thanks Arvind!

        Show
        Ashish Thusoo added a comment - Committed. Thanks Arvind!
        Ashish Thusoo made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hide
        Zheng Shao added a comment -

        I might be too late for the party, but I have a question on removing the field name comparison for struct type info.

        We have 3 choices:
        C1: Compare field names case sensitively.
        C2: Compare field names case insensitively.
        C3: Don't compare field names at all.

        The old implementation was following C1, and the new one is following C3.
        Is there any reason that we don't do C2? C2 seems to provide some minimal sanity checks that users will need in practice.

        Show
        Zheng Shao added a comment - I might be too late for the party, but I have a question on removing the field name comparison for struct type info. We have 3 choices: C1: Compare field names case sensitively. C2: Compare field names case insensitively. C3: Don't compare field names at all. The old implementation was following C1, and the new one is following C3. Is there any reason that we don't do C2? C2 seems to provide some minimal sanity checks that users will need in practice.
        Hide
        Arvind Prabhakar added a comment -

        @Zheng: Please see the second comment. This patch uses C2 method - comparing field names in a case insensitive manner.

        Show
        Arvind Prabhakar added a comment - @Zheng: Please see the second comment. This patch uses C2 method - comparing field names in a case insensitive manner.
        Carl Steinbach made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Arvind Prabhakar
            Reporter:
            Dilip Joseph
          • Votes:
            1 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development