Hive / HIVE-2333

LazySimpleSerDe does not properly handle arrays / escape control characters

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved

    Description

      LazySimpleSerDe, the default SerDe for Hive, is severely broken:

      • Empty arrays are serialized as an empty string. Hence array(array()) is indistinguishable from array(array(array())) and from array().
      • Similarly, empty strings are serialized as an empty string. Hence array('') is also indistinguishable from an empty array.
      • If the serialized string equals the null sequence, it is ambiguous whether it represents an array with a single null element or a null array.
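
The collisions listed above can be reproduced with a small toy model. This is a sketch, not the actual LazySimpleSerDe code: it assumes one separator per nesting depth (\002, \003, \004), the default null sequence \N, and no escaping. The name serialize is hypothetical.

```python
# Toy model of delimiter-based array serialization (NOT Hive's real code).
# Assumptions: one separator per nesting depth, null sequence "\N", no escaping.
SEPARATORS = ["\x02", "\x03", "\x04"]
NULL_SEQUENCE = "\\N"


def serialize(value, depth=0):
    """Serialize a string, None, or (nested) list of these into one string."""
    if value is None:
        return NULL_SEQUENCE
    if isinstance(value, list):
        # Elements are joined with the separator for this depth;
        # an empty list therefore produces an empty string.
        return SEPARATORS[depth].join(serialize(v, depth + 1) for v in value)
    return str(value)


# All of these distinct values collapse to the same empty string.
for v in ([], [[]], [[[]]], [""]):
    print(repr(v), "->", repr(serialize(v)))

# A null array and an array holding a single null are also identical.
print(repr(serialize(None)), repr(serialize([None])))
```

Because the serialized forms are identical, no deserializer can recover the original value without extra information such as an explicit length or an empty-array marker.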

      It also does not escape control characters in string values, so they are read back as delimiters:

      > select array('foo\002bar') from tmp;
      ...
      ["foo","bar"]

      > select array('foo\001bar') from tmp;
      ...
      ["foo"]
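
One plausible explanation for both results, sketched under assumptions: if a row is written as fields joined by \001 and array items joined by \002, with no escaping, then an embedded \002 splits the element in two and an embedded \001 ends the column early. The helpers write_row and read_first_column are hypothetical and only model this behaviour; they are not Hive APIs.

```python
# Toy model of an unescaped, delimiter-separated row (NOT Hive's real code).
# Assumptions: fields are joined by \001, array items by \002, nothing is escaped.
FIELD_SEP = "\x01"
ITEM_SEP = "\x02"


def write_row(columns):
    """columns: a list of array columns, each a list of strings."""
    return FIELD_SEP.join(ITEM_SEP.join(items) for items in columns)


def read_first_column(row):
    """Read back the first array column of a serialized row."""
    first_field = row.split(FIELD_SEP)[0]
    return first_field.split(ITEM_SEP)


# An embedded \002 is read back as an item separator.
print(read_first_column(write_row([["foo\x02bar"]])))  # ['foo', 'bar']

# An embedded \001 is read back as a field separator, truncating the column.
print(read_first_column(write_row([["foo\x01bar"]])))  # ['foo']
```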

            People

              Assignee: Unassigned
              Reporter: Jonathan Chang (jonchang)
              Votes: 1
              Watchers: 5
