Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change
    • Release Note:
      Hive handling of enum types has been changed to always return the string value rather than struct<value:int> as the new format is more useful to users.

      Description

      When using serde-reported schemas with the ThriftDeserializer, Enum fields are presented as struct<value:int>

      Many users expect to work with the string values, which is both easier and more meaningful as the string value communicates what is represented.

      Hive should provide a mechanism to optionally convert enum values to strings.

      1. HIVE-3323_enum_to_string.1.patch
        2 kB
        Travis Crawford
      2. HIVE-3323_enum_to_string.2.patch
        33 kB
        Travis Crawford
      3. HIVE-3323_enum_to_string.3.patch
        61 kB
        Travis Crawford
      4. HIVE-3323_enum_to_string.4.patch
        59 kB
        Feng Peng
      5. HIVE-3323_enum_to_string.5.patch
        127 kB
        Travis Crawford
      6. HIVE-3323_enum_to_string.6.patch
        126 kB
        Travis Crawford
      7. HIVE-3323_enum_to_string.8.patch
        114 kB
        Travis Crawford

        Issue Links

          Activity

          Hide
          Jie Li added a comment -

          From the discussion I can see the proposal of "hive.data.convert.enum.to.string" but it was not in the final patch.....

          Show
          Jie Li added a comment - From the discussion I can see the proposal of "hive.data.convert.enum.to.string" but it was not in the final patch.....
          Hide
          Jie Li added a comment -

          Hi, we used to use the struct<value:int> representation in Hive 0.8 and got hit by this incompatible change after upgrading. Any reason not providing a configuration for this change? I guess it might break some other users.

          Show
          Jie Li added a comment - Hi, we used to use the struct<value:int> representation in Hive 0.8 and got hit by this incompatible change after upgrading. Any reason not providing a configuration for this change? I guess it might break some other users.
          Hide
          Ashutosh Chauhan added a comment -

          This issue is fixed and released as part of 0.10.0 release. If you find an issue which seems to be related to this one, please create a new jira and link this one with new jira.

          Show
          Ashutosh Chauhan added a comment - This issue is fixed and released as part of 0.10.0 release. If you find an issue which seems to be related to this one, please create a new jira and link this one with new jira.
          Hide
          Hudson added a comment -

          Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
          HIVE-3323 : enum to string conversions (Travis Crawford via Ashutosh Chauhan) (Revision 1382211)

          Result = ABORTED
          hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1382211
          Files :

          • /hive/trunk/ql/src/test/queries/clientpositive/convert_enum_to_string.q
          • /hive/trunk/ql/src/test/results/clientpositive/convert_enum_to_string.q.out
          • /hive/trunk/serde/if/test/megastruct.thrift
          • /hive/trunk/serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java
          • /hive/trunk/serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MiniStruct.java
          • /hive/trunk/serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MyEnum.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaStringObjectInspector.java
          Show
          Hudson added a comment - Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/ ) HIVE-3323 : enum to string conversions (Travis Crawford via Ashutosh Chauhan) (Revision 1382211) Result = ABORTED hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1382211 Files : /hive/trunk/ql/src/test/queries/clientpositive/convert_enum_to_string.q /hive/trunk/ql/src/test/results/clientpositive/convert_enum_to_string.q.out /hive/trunk/serde/if/test/megastruct.thrift /hive/trunk/serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java /hive/trunk/serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MiniStruct.java /hive/trunk/serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MyEnum.java /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaStringObjectInspector.java
          Hide
          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #1656 (See https://builds.apache.org/job/Hive-trunk-h0.21/1656/)
          HIVE-3323 : enum to string conversions (Travis Crawford via Ashutosh Chauhan) (Revision 1382211)

          Result = FAILURE
          hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1382211
          Files :

          • /hive/trunk/ql/src/test/queries/clientpositive/convert_enum_to_string.q
          • /hive/trunk/ql/src/test/results/clientpositive/convert_enum_to_string.q.out
          • /hive/trunk/serde/if/test/megastruct.thrift
          • /hive/trunk/serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java
          • /hive/trunk/serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MiniStruct.java
          • /hive/trunk/serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MyEnum.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaStringObjectInspector.java
          Show
          Hudson added a comment - Integrated in Hive-trunk-h0.21 #1656 (See https://builds.apache.org/job/Hive-trunk-h0.21/1656/ ) HIVE-3323 : enum to string conversions (Travis Crawford via Ashutosh Chauhan) (Revision 1382211) Result = FAILURE hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1382211 Files : /hive/trunk/ql/src/test/queries/clientpositive/convert_enum_to_string.q /hive/trunk/ql/src/test/results/clientpositive/convert_enum_to_string.q.out /hive/trunk/serde/if/test/megastruct.thrift /hive/trunk/serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java /hive/trunk/serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MiniStruct.java /hive/trunk/serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MyEnum.java /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaStringObjectInspector.java
          Hide
          Ashutosh Chauhan added a comment -

          Committed to trunk. Thanks, Travis!

          Show
          Ashutosh Chauhan added a comment - Committed to trunk. Thanks, Travis!
          Hide
          Ashutosh Chauhan added a comment -

          Thanks, Travis for updating the patch. I will run test and commit this soon.

          Show
          Ashutosh Chauhan added a comment - Thanks, Travis for updating the patch. I will run test and commit this soon.
          Hide
          Ashutosh Chauhan added a comment -

          +1

          Show
          Ashutosh Chauhan added a comment - +1
          Hide
          Travis Crawford added a comment -

          Thanks for the feedback Ashutosh – I agree making this conversion optional introduces complexity, and will remove the option + update the patch.

          Show
          Travis Crawford added a comment - Thanks for the feedback Ashutosh – I agree making this conversion optional introduces complexity, and will remove the option + update the patch.
          Hide
          Ashutosh Chauhan added a comment -

          MultiKey stuff you added in OIcache doesn't look particularly clean. I think motivation for making that change was if there are multiple ThriftDeserializer in a job and for some reason they have different value for this new config key, you wanted to return different OIs with correct config. Intention is benign, but I am not sure if this really will work though. First of all its a rare condition that you have more then one serde in a job, only place I can think of is when you are reading two different tables in a same job (probably for joining) this will come up. There since configuration object is same for whole job, you are going to get only one value for whole job for your config key, not for each serde. So, even if some comes up with this rare use case of having two different ThriftSerDe which differs on this property, I doubt this will work. Further, this introduces dependecy on commons-collections which implies we need to package and ship this jar to backend. Thirdly, it introduces code complexity. So, I think we should drop this Multi-Key stuff for OIcache and keep it in original form. You might be able to make this work if you keep this config key in Table properties instead of Configuration, but I think we are introducing too much complexity for a rare case. At this point I will retract from opening up this option of having struct<int>. Lets always convert to string going forward, keeping the code changes to minimal and avoiding complexity.

          Another feedback is I think for JavaStringObjectInspector. I think instead of testing for type of object, you can always return o.toString() there, since if it was Enum, thats what you are doing and if it were String, then toString() on String returns this so we will get desired result for both cases and avoid type-check test.

          Show
          Ashutosh Chauhan added a comment - MultiKey stuff you added in OIcache doesn't look particularly clean. I think motivation for making that change was if there are multiple ThriftDeserializer in a job and for some reason they have different value for this new config key, you wanted to return different OIs with correct config. Intention is benign, but I am not sure if this really will work though. First of all its a rare condition that you have more then one serde in a job, only place I can think of is when you are reading two different tables in a same job (probably for joining) this will come up. There since configuration object is same for whole job, you are going to get only one value for whole job for your config key, not for each serde. So, even if some comes up with this rare use case of having two different ThriftSerDe which differs on this property, I doubt this will work. Further, this introduces dependecy on commons-collections which implies we need to package and ship this jar to backend. Thirdly, it introduces code complexity. So, I think we should drop this Multi-Key stuff for OIcache and keep it in original form. You might be able to make this work if you keep this config key in Table properties instead of Configuration, but I think we are introducing too much complexity for a rare case. At this point I will retract from opening up this option of having struct<int>. Lets always convert to string going forward, keeping the code changes to minimal and avoiding complexity. Another feedback is I think for JavaStringObjectInspector. I think instead of testing for type of object, you can always return o.toString() there, since if it was Enum, thats what you are doing and if it were String, then toString() on String returns this so we will get desired result for both cases and avoid type-check test.
          Hide
          Travis Crawford added a comment -
          Show
          Travis Crawford added a comment - Reviewboard: https://reviews.apache.org/r/6915/
          Hide
          Travis Crawford added a comment -

          While making a number of thrift-related change, I've found testing has been a challenge because we didn't have a really comprehensive thrift struct with all the field types we wanted to test. Part of this change is a new struct that actually has an enum field, but also stuff like a binary field (which we now see has basically the same issue as enum, where the HeapByteBuffer is treated like a struct rather than byte[]), set fields (which I didn't realize until recently are not supported, etc. Now we'll have visibility into this stuff and can fix.

          struct MegaStruct {
             1: optional bool my_bool,
             2: optional byte my_byte,
             3: optional i16 my_16bit_int,
             4: optional i32 my_32bit_int,
             5: optional i64 my_64bit_int,
             6: optional double my_double,
             7: optional string my_string,
             8: optional binary my_binary,
             9: optional map<string, string> my_string_string_map,
            10: optional map<string, MyEnum> my_string_enum_map,
            11: optional map<MyEnum, string> my_enum_string_map,
            12: optional map<MyEnum, MiniStruct> my_enum_struct_map,
            13: optional map<MyEnum, list<string>> my_enum_stringlist_map,
            14: optional map<MyEnum, list<MiniStruct>> my_enum_structlist_map,
            15: optional list<string> my_stringlist,
            16: optional list<MiniStruct> my_structlist,
            17: optional list<MyEnum> my_enumlist,
            18: optional set<string> my_stringset,
            19: optional set<MyEnum> my_enumset,
            20: optional set<MiniStruct> my_structset
          }
          
          Show
          Travis Crawford added a comment - While making a number of thrift-related change, I've found testing has been a challenge because we didn't have a really comprehensive thrift struct with all the field types we wanted to test. Part of this change is a new struct that actually has an enum field, but also stuff like a binary field (which we now see has basically the same issue as enum, where the HeapByteBuffer is treated like a struct rather than byte[]), set fields (which I didn't realize until recently are not supported, etc. Now we'll have visibility into this stuff and can fix. struct MegaStruct { 1: optional bool my_bool, 2: optional byte my_byte, 3: optional i16 my_16bit_int, 4: optional i32 my_32bit_int, 5: optional i64 my_64bit_int, 6: optional double my_double, 7: optional string my_string, 8: optional binary my_binary, 9: optional map<string, string> my_string_string_map, 10: optional map<string, MyEnum> my_string_enum_map, 11: optional map<MyEnum, string> my_enum_string_map, 12: optional map<MyEnum, MiniStruct> my_enum_struct_map, 13: optional map<MyEnum, list<string>> my_enum_stringlist_map, 14: optional map<MyEnum, list<MiniStruct>> my_enum_structlist_map, 15: optional list<string> my_stringlist, 16: optional list<MiniStruct> my_structlist, 17: optional list<MyEnum> my_enumlist, 18: optional set<string> my_stringset, 19: optional set<MyEnum> my_enumset, 20: optional set<MiniStruct> my_structset }
          Hide
          Jakob Homan added a comment -

          MegaStruct

          eh?

          Show
          Jakob Homan added a comment - MegaStruct eh?
          Hide
          Travis Crawford added a comment -

          Sounds good Ashutosh Chauhan! I'm rebasing this patch against trunk and making enum-to-string the default, and leaving a property in place to disable the conversion. I'll open a reviewboard for the updated patch.

          I'm excited to get MegaStruct in, so we can add test cases for some bizarre but valid thrift structs we've come across

          Show
          Travis Crawford added a comment - Sounds good Ashutosh Chauhan ! I'm rebasing this patch against trunk and making enum-to-string the default, and leaving a property in place to disable the conversion. I'll open a reviewboard for the updated patch. I'm excited to get MegaStruct in, so we can add test cases for some bizarre but valid thrift structs we've come across
          Hide
          Ashutosh Chauhan added a comment -

          I agree that enums as string (as oppose to struct<int>) makes more sense in all cases. So we can change the default behavior. But since functionality existed in previous releases and folks might be using it, we should add a config param in case folks already using thrift serde can fall back to old behavior in case they want to. I dont think we need to add that option for Avro though since there ever was only was one way of doing it for Avro data.

          Show
          Ashutosh Chauhan added a comment - I agree that enums as string (as oppose to struct<int>) makes more sense in all cases. So we can change the default behavior. But since functionality existed in previous releases and folks might be using it, we should add a config param in case folks already using thrift serde can fall back to old behavior in case they want to. I dont think we need to add that option for Avro though since there ever was only was one way of doing it for Avro data.
          Hide
          Ashutosh Chauhan added a comment -

          Travis,
          Can you create a review request on reviews.apache.org ?

          Show
          Ashutosh Chauhan added a comment - Travis, Can you create a review request on reviews.apache.org ?
          Hide
          Travis Crawford added a comment -

          Hey all, good discussion about this enum issue. After thinking about the comments, I think it would be appropriate to always convert enum values to strings, as the Avro serde does.

          Here's why:

          • Native Hive queries: enums-as-strings make the most sense
          • Pig queries through HCat: enums-as-strings make the most sense
          • MR jobs through HCat: enums-as-strings are reasonable, and if users actually want the int value they can convert in their code.

          Since enums-as-strings makes sense for scripting languages, and users do not loose functionality in MR jobs, I think it makes sense to always convert enums to strings.

          Thoughts? If this sounds good I can update the patch.

          Show
          Travis Crawford added a comment - Hey all, good discussion about this enum issue. After thinking about the comments, I think it would be appropriate to always convert enum values to strings, as the Avro serde does. Here's why: Native Hive queries: enums-as-strings make the most sense Pig queries through HCat: enums-as-strings make the most sense MR jobs through HCat: enums-as-strings are reasonable, and if users actually want the int value they can convert in their code. Since enums-as-strings makes sense for scripting languages, and users do not loose functionality in MR jobs, I think it makes sense to always convert enums to strings. Thoughts? If this sounds good I can update the patch.
          Hide
          Jakob Homan added a comment -

          you're welcome to if you'd like. I won't have a chance to do it anytime soon.

          Show
          Jakob Homan added a comment - you're welcome to if you'd like. I won't have a chance to do it anytime soon.
          Hide
          Feng Peng added a comment -

          Hi Jakob, should we have the avro change in this same patch or in a separate patch?

          Thanks!
          Feng

          Show
          Feng Peng added a comment - Hi Jakob, should we have the avro change in this same patch or in a separate patch? Thanks! Feng
          Hide
          Feng Peng added a comment -

          I think it is fine. The current behavior for thrift is false and that's why we set it to false in the current patch. But I don't think people are using SerDe to read complex thrift data, otherwise they would have complained about this (and other problems we are having right now) already.

          Show
          Feng Peng added a comment - I think it is fine. The current behavior for thrift is false and that's why we set it to false in the current patch. But I don't think people are using SerDe to read complex thrift data, otherwise they would have complained about this (and other problems we are having right now) already.
          Hide
          Dmitriy V. Ryaboy added a comment -

          I am ok with making the default true. Don't think that would break anything. Feng?

          Show
          Dmitriy V. Ryaboy added a comment - I am ok with making the default true. Don't think that would break anything. Feng?
          Hide
          Jakob Homan added a comment -

          OK, but that sounds like a less frequently useful use case than converting to strings. Should we make the default behavior convert-to-string and add convert-to-struct as an option for thrift and avro?

          Show
          Jakob Homan added a comment - OK, but that sounds like a less frequently useful use case than converting to strings. Should we make the default behavior convert-to-string and add convert-to-struct as an option for thrift and avro?
          Hide
          Dmitriy V. Ryaboy added a comment -

          We are thinking of a situation where you are going through HCat but consuming in something that speaks structs, your own service.

          Show
          Dmitriy V. Ryaboy added a comment - We are thinking of a situation where you are going through HCat but consuming in something that speaks structs, your own service.
          Hide
          Jakob Homan added a comment -

          Is the current Thrift behavior worth keeping around? Maybe just convert it to just do the string conversion? I can't come up with a use case where I would want the struct Thrift provides.

          Show
          Jakob Homan added a comment - Is the current Thrift behavior worth keeping around? Maybe just convert it to just do the string conversion? I can't come up with a use case where I would want the struct Thrift provides.
          Hide
          Dmitriy V. Ryaboy added a comment -

          How about if we (as in, me/feng/travis) make a patch for AvroSerde to respect this property?

          Show
          Dmitriy V. Ryaboy added a comment - How about if we (as in, me/feng/travis) make a patch for AvroSerde to respect this property?
          Hide
          Jakob Homan added a comment -

          Right, but AvroSerde already does this conversion is what I'm saying. There's never been an option not to do the conversion.

          Show
          Jakob Homan added a comment - Right, but AvroSerde already does this conversion is what I'm saying. There's never been an option not to do the conversion.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Sorry, let me try a less snide reply
          It doesn't make sense to me to have a situation where you are working with Thrift and Avro, and you want to not convert from Thrift but you do want to convert from Avro. What you really want is establish how Hive does serde type conversions, not how it does them for individual encodings of data.

          Show
          Dmitriy V. Ryaboy added a comment - Sorry, let me try a less snide reply It doesn't make sense to me to have a situation where you are working with Thrift and Avro, and you want to not convert from Thrift but you do want to convert from Avro. What you really want is establish how Hive does serde type conversions, not how it does them for individual encodings of data.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Jakob, that sounds like something to fix in AvroSerde.

          Show
          Dmitriy V. Ryaboy added a comment - Jakob, that sounds like something to fix in AvroSerde.
          Hide
          Jakob Homan added a comment -

          One comment I have:

          +    CONVERT_ENUM_TO_STRING("hive.data.convert.enum.to.string", false),
          

          since AvroSerde already does this and doesn't provide an option not to, can we change option name to be thrift specific?

          Show
          Jakob Homan added a comment - One comment I have: + CONVERT_ENUM_TO_STRING("hive.data.convert.enum.to.string", false), since AvroSerde already does this and doesn't provide an option not to, can we change option name to be thrift specific?
          Hide
          Feng Peng added a comment -

          Bump.

          Is it possible some committer to take a look at this patch? A few of our later patches depend on it and it'd be great if we can get some feedback. Thanks!

          Show
          Feng Peng added a comment - Bump. Is it possible some committer to take a look at this patch? A few of our later patches depend on it and it'd be great if we can get some feedback. Thanks!
          Hide
          Travis Crawford added a comment -

          Hey all, any chance I could get some feedback on this patch? Today we found another thrift-related issue where a struct with no fields is not handled correctly (yes a bizarre field choice but valid thrift). Fixing would depend on updating megastruct and its test (added here), and I'd like to avoid piling more changes into this patch.

          Show
          Travis Crawford added a comment - Hey all, any chance I could get some feedback on this patch? Today we found another thrift-related issue where a struct with no fields is not handled correctly (yes a bizarre field choice but valid thrift). Fixing would depend on updating megastruct and its test (added here), and I'd like to avoid piling more changes into this patch.
          Hide
          Travis Crawford added a comment -

          Ok, I believe this patch is complete.

          Its interesting to see the megastruct serde-reported schema in the test output. For example, binary is reported as:

          my_binary struct<hb:binary,offset:int,isreadonly:boolean,bigendian:boolean,nativebyteorder:boolean> from deserializer

          This is because its a HeapByteBuffer; we have a followup patch that fixes this that can be submitted after this issue is resolved.

          Also note how sets are structs with no fields.

          my_stringset struct<> from deserializer

          I'm not sure how to resolve this yet (turn them into lists?) but is an issue that we now have visibility into.

          Show
          Travis Crawford added a comment - Ok, I believe this patch is complete. Its interesting to see the megastruct serde-reported schema in the test output. For example, binary is reported as: my_binary struct<hb:binary,offset:int,isreadonly:boolean,bigendian:boolean,nativebyteorder:boolean> from deserializer This is because its a HeapByteBuffer; we have a followup patch that fixes this that can be submitted after this issue is resolved. Also note how sets are structs with no fields. my_stringset struct<> from deserializer I'm not sure how to resolve this yet (turn them into lists?) but is an issue that we now have visibility into.
          Hide
          Travis Crawford added a comment -

          Hey all -

          Feng and I looked at this today and decided we missed the nested map value case because we didn't have a good way to test. We have an update (running CI now, will post patch after it passes) that adds a new test struct that contains all thrift data types, as well as some gnarly cases we found in real data (enum map keys, enum map values, etc).

          https://github.com/traviscrawford/hive/blob/HIVE-3323_enum_to_string/serde/if/test/megastruct.thrift

          This new struct will allow us to test this and other thrift cases, such as how binary data is not handled correctly (its interpreted as a ByteBuffer struct instead of byte[]).

          Will post what we believe is the last version of this patch after tests finish.

          Show
          Travis Crawford added a comment - Hey all - Feng and I looked at this today and decided we missed the nested map value case because we didn't have a good way to test. We have an update (running CI now, will post patch after it passes) that adds a new test struct that contains all thrift data types, as well as some gnarly cases we found in real data (enum map keys, enum map values, etc). https://github.com/traviscrawford/hive/blob/HIVE-3323_enum_to_string/serde/if/test/megastruct.thrift This new struct will allow us to test this and other thrift cases, such as how binary data is not handled correctly (its interpreted as a ByteBuffer struct instead of byte[]). Will post what we believe is the last version of this patch after tests finish.
          Hide
          Feng Peng added a comment -

          The previous patch didn't cover nested structures in map values. Posted a new patch using a static property in ObjectInspectorFactory instead of passing the property around. A better solution would be making the ObjectInspectorFactory a singleton instead of the current stateless static class.

          Show
          Feng Peng added a comment - The previous patch didn't cover nested structures in map values. Posted a new patch using a static property in ObjectInspectorFactory instead of passing the property around. A better solution would be making the ObjectInspectorFactory a singleton instead of the current stateless static class.
          Hide
          Jakob Homan added a comment -

          Just as another data point, the AvroSerde already converts Avro enums to strings, so this will make the behavior more consistent.

          Show
          Jakob Homan added a comment - Just as another data point, the AvroSerde already converts Avro enums to strings, so this will make the behavior more consistent.
          Hide
          Travis Crawford added a comment -

          I'm not able to post the review with arc, getting the following error:

          TW-MBP13-TCrawford:hive travis$ arc diff --jira HIVE-3323 
          
          Exception:
          Parse Exception: Expected a hunk header, like 'Index: /path/to/file.ext' (svn), 'Property changes on: /path/to/file.ext' (svn properties), 'commit 59bcc3ad6775562f845953cf01624225' (git show), 'diff --git' (git diff), or '--- filename' (unified diff).
          
             >>>        1   
          
          
          (Run with --trace for a full exception trace.)
          TW-MBP13-TCrawford:hive travis$ 
          
          Show
          Travis Crawford added a comment - I'm not able to post the review with arc, getting the following error: TW-MBP13-TCrawford:hive travis$ arc diff --jira HIVE-3323 Exception: Parse Exception: Expected a hunk header, like 'Index: /path/to/file.ext' (svn), 'Property changes on: /path/to/file.ext' (svn properties), 'commit 59bcc3ad6775562f845953cf01624225' (git show), 'diff --git' (git diff), or '--- filename' (unified diff). >>> 1 (Run with --trace for a full exception trace.) TW-MBP13-TCrawford:hive travis$
          Hide
          Travis Crawford added a comment -

          Adding a field to the "complex" thrift struct caused some test failures. This patch updates tests that failed due to the new field.

          Show
          Travis Crawford added a comment - Adding a field to the "complex" thrift struct caused some test failures. This patch updates tests that failed due to the new field.
          Hide
          Travis Crawford added a comment -

          Summary of what this feature enables:

          hive> create table convert_enum_to_string
              >   partitioned by (b string)
              >   row format serde "org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer"
              >     with serdeproperties (
              >       "serialization.class"="org.apache.hadoop.hive.serde2.thrift.test.Complex",
              >       "serialization.format"="org.apache.thrift.protocol.TBinaryProtocol");
          OK
          Time taken: 0.067 seconds
          hive> set hive.data.convert.enum.to.string=true;
          hive> describe convert_enum_to_string;
          OK
          aint	int	from deserializer
          astring	string	from deserializer
          lint	array<int>	from deserializer
          lstring	array<string>	from deserializer
          lintstring	array<struct<myint:int,mystring:string,underscore_int:int>>	from deserializer
          mstringstring	map<string,string>	from deserializer
          myenum	string	from deserializer
          b	string	
          Time taken: 0.349 seconds
          hive> set hive.data.convert.enum.to.string=false;
          hive> describe convert_enum_to_string;
          OK
          aint	int	from deserializer
          astring	string	from deserializer
          lint	array<int>	from deserializer
          lstring	array<string>	from deserializer
          lintstring	array<struct<myint:int,mystring:string,underscore_int:int>>	from deserializer
          mstringstring	map<string,string>	from deserializer
          myenum	struct<value:int>	from deserializer
          b	string	
          Time taken: 0.144 seconds
          
          Show
          Travis Crawford added a comment - Summary of what this feature enables: hive> create table convert_enum_to_string > partitioned by (b string) > row format serde "org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer" > with serdeproperties ( > "serialization.class" = "org.apache.hadoop.hive.serde2.thrift.test.Complex" , > "serialization.format" = "org.apache.thrift.protocol.TBinaryProtocol" ); OK Time taken: 0.067 seconds hive> set hive.data.convert. enum .to.string= true ; hive> describe convert_enum_to_string; OK aint int from deserializer astring string from deserializer lint array< int > from deserializer lstring array<string> from deserializer lintstring array<struct<myint: int ,mystring:string,underscore_int: int >> from deserializer mstringstring map<string,string> from deserializer myenum string from deserializer b string Time taken: 0.349 seconds hive> set hive.data.convert. enum .to.string= false ; hive> describe convert_enum_to_string; OK aint int from deserializer astring string from deserializer lint array< int > from deserializer lstring array<string> from deserializer lintstring array<struct<myint: int ,mystring:string,underscore_int: int >> from deserializer mstringstring map<string,string> from deserializer myenum struct<value: int > from deserializer b string Time taken: 0.144 seconds
          Hide
          Travis Crawford added a comment -

          Updated patch adds a config option to enable/disable this feature.

          Show
          Travis Crawford added a comment - Updated patch adds a config option to enable/disable this feature.
          Hide
          Travis Crawford added a comment -

          Attached is a work-in-progress patch that converts enums to strings. This implementation adds the conversion logic inside JavaStringObjectInspector because this allows us to totally behave as Strings everywhere.

          Something I'm not clear on is how to make this conversion optional. On the MR cluster, is there a static method for getting a hive conf stored in the job conf? That would be much preferable to passing the conf through from all places where ObjectInspectorFactory is called.

          Thoughts on this patch, or suggestions on a better way to add this conversion?

          Show
          Travis Crawford added a comment - Attached is a work-in-progress patch that converts enums to strings. This implementation adds the conversion logic inside JavaStringObjectInspector because this allows us to totally behave as Strings everywhere. Something I'm not clear on is how to make this conversion optional. On the MR cluster, is there a static method for getting a hive conf stored in the job conf? That would be much preferable to passing the conf through from all places where ObjectInspectorFactory is called. Thoughts on this patch, or suggestions on a better way to add this conversion?

            People

            • Assignee:
              Travis Crawford
              Reporter:
              Travis Crawford
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development