Hive / HIVE-7142

Hive multi serialization encoding support

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Labels:
      None

      Description

      Currently Hive only supports serializing data to UTF-8 bytes and deserializing data from UTF-8 bytes. Real-world users may want to load data stored in other character encodings into Hive directly. This JIRA adds support for serializing and deserializing data in other encodings at the SerDe layer.

      Users only need to configure the serialization encoding at the table level by setting the "serialization.encoding" SerDe property, for example:

      CREATE TABLE person(id INT, name STRING, desc STRING)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
      WITH SERDEPROPERTIES ("serialization.encoding"='GBK');
      

      or

      ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); 
      

      LIMITATIONS: Only LazySimpleSerDe supports the "serialization.encoding" property in this patch.
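
      To illustrate the idea (this is not the code from the attached patches), the following is a minimal Java sketch of the charset conversion a SerDe could perform around its UTF-8 internals. The class name EncodingSketch and the methods toUtf8 and fromUtf8 are hypothetical and exist only for this example. It assumes the configured encoding name, such as GBK, is one the running JVM supports.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hive: shows the transcoding step only.
class EncodingSketch {

    // Deserialize direction: bytes stored in the table's encoding -> UTF-8 bytes.
    static byte[] toUtf8(byte[] raw, String charsetName) {
        Charset source = Charset.forName(charsetName);
        return new String(raw, source).getBytes(StandardCharsets.UTF_8);
    }

    // Serialize direction: UTF-8 bytes -> bytes in the table's encoding.
    static byte[] fromUtf8(byte[] utf8, String charsetName) {
        Charset target = Charset.forName(charsetName);
        return new String(utf8, StandardCharsets.UTF_8).getBytes(target);
    }

    public static void main(String[] args) {
        // Assumption: the value of "serialization.encoding" is passed in as a plain string.
        String encoding = "GBK";
        byte[] stored = "\u4e2d\u6587".getBytes(Charset.forName(encoding));
        byte[] internal = toUtf8(stored, encoding);
        byte[] written = fromUtf8(internal, encoding);
        System.out.println("stored bytes:   " + stored.length);
        System.out.println("internal bytes: " + internal.length);
        System.out.println("round trip ok:  " + java.util.Arrays.equals(stored, written));
    }
}
```

      Characters that the target encoding cannot represent are replaced by String.getBytes with the charset's default replacement byte, so a real implementation would likely need an explicit policy for unmappable characters.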

        Attachments

        1. HIVE-7142.1.patch.txt (8 kB, Chengxiang Li)
        2. HIVE-7142.2.patch (8 kB, Chengxiang Li)
        3. HIVE-7142.3.patch (8 kB, Chengxiang Li)
        4. HIVE-7142.4.patch (12 kB, Chengxiang Li)

              People

              • Assignee: Chengxiang Li
              • Reporter: Chengxiang Li
              • Votes: 0
              • Watchers: 10
