Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Description
LazySimpleSerDe, the default SerDe for Hive, is severely broken:
- Empty arrays are serialized as an empty string. Hence array(array()) is indistinguishable from both array(array(array())) and array().
- Similarly, empty strings are serialized as an empty string. Hence array('') is also indistinguishable from an empty array.
- If the serialized string equals the null sequence, it is ambiguous whether the value is an array containing a single null element or a null array.
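To make these collisions concrete, here is a minimal Python sketch of a delimiter-joined serializer in the spirit of LazySimpleSerDe. It is not Hive's code: the function name `serialize`, the use of `\002`, `\003`, ... as per-nesting-level separators, and `\N` as the null sequence are assumptions for illustration. Because joining an empty list yields an empty string, several distinct values end up as the same bytes.

```python
# Hypothetical model of delimiter-based serialization, NOT Hive's implementation.
# Assumptions: nesting level n is separated by chr(2 + n) ("\x02", "\x03", ...)
# and NULL is written as the null sequence "\N".
NULL_SEQUENCE = "\\N"


def serialize(value, level=0):
    """Join nested lists with a per-level delimiter; strings are written verbatim."""
    if value is None:
        return NULL_SEQUENCE
    if isinstance(value, str):
        return value  # no escaping and no marker for the empty string
    separator = chr(2 + level)
    # An empty list joins to "", exactly like an empty string would.
    return separator.join(serialize(item, level + 1) for item in value)


# Distinct values that collapse to the same serialized form.
for v in ([], [[]], [[[]]], [""], None, [None]):
    print(repr(v), "->", repr(serialize(v)))
```

In this model `serialize([])`, `serialize([[]])`, `serialize([[[]]])` and `serialize([""])` all return the empty string, while `serialize(None)` and `serialize([None])` both return the null sequence, so a reader cannot recover the original value from the string alone.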
It also mishandles control characters embedded in string values:
> select array('foo\002bar') from tmp;
...
["foo","bar"]
> select array('foo\001bar') from tmp;
...
["foo"]
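Both results are consistent with the writer emitting delimiter bytes inside strings without escaping them, and the reader then splitting on those bytes. The sketch below assumes, hypothetically, that the single-column result row is written with `\001` (Ctrl-A) as the field delimiter and `\002` (Ctrl-B) as the array-item delimiter and then read back; `write_row` and `read_first_column_as_array` are illustrative names, not Hive APIs.

```python
# Hypothetical sketch, NOT Hive's code. Assumed default delimiters:
FIELD_SEP = "\x01"  # Ctrl-A: separates columns in a row
ITEM_SEP = "\x02"   # Ctrl-B: separates elements of an array column


def write_row(array_column):
    """Write a single array column; embedded delimiters are not escaped."""
    return ITEM_SEP.join(array_column)


def read_first_column_as_array(row):
    """Read the row back: cut at the first field delimiter, then split items."""
    first_field = row.split(FIELD_SEP)[0]
    return first_field.split(ITEM_SEP)


print(read_first_column_as_array(write_row(["foo\x02bar"])))  # ['foo', 'bar']
print(read_first_column_as_array(write_row(["foo\x01bar"])))  # ['foo']
```

The embedded `\002` is read as an element boundary, and the embedded `\001` ends the column early so that "bar" is silently dropped, matching the two query outputs shown in the description.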
Issue Links
- is related to: HIVE-2303 files with control-A,B are not delimited correctly. (Closed)