Project: Hive, issue HIVE-7142

Hive multi serialization encoding support


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None

    Description

      Currently, Hive can only serialize data to UTF-8 bytes and deserialize data from UTF-8 bytes, but real-world users may want to load data in other character encodings into Hive directly. This JIRA adds support for serializing and deserializing data in other character encodings in the SerDe layer.
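      To make the conversion concrete, here is a minimal Java sketch of the transcoding a SerDe has to perform. It is not Hive's actual LazySimpleSerDe code: the class and method names (EncodingTranscodeSketch, toUtf8, fromUtf8) are hypothetical, and the sketch assumes, as the description implies, that Hive's internal text representation is UTF-8. GBK is an extended charset rather than one every JVM is required to support, so Charset.forName("GBK") can throw UnsupportedCharsetException on a minimal runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch, not Hive's LazySimpleSerDe code: converts between a
 * table's configured encoding and UTF-8, assumed here to be Hive's internal form.
 */
public class EncodingTranscodeSketch {

    /** Deserialize direction: bytes stored in `encoding` become UTF-8 bytes. */
    public static byte[] toUtf8(byte[] raw, String encoding) {
        // Decode with the table's charset; new String(...) silently substitutes
        // U+FFFD for malformed input instead of throwing.
        String text = new String(raw, Charset.forName(encoding));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Serialize direction: UTF-8 bytes become bytes in `encoding` for storage. */
    public static byte[] fromUtf8(byte[] utf8, String encoding) {
        String text = new String(utf8, StandardCharsets.UTF_8);
        // Characters the target charset cannot represent are replaced with that
        // charset's default replacement bytes (typically '?').
        return text.getBytes(Charset.forName(encoding));
    }

    public static void main(String[] args) {
        String name = "\u5f20\u4e09";                        // two CJK characters
        byte[] gbk = name.getBytes(Charset.forName("GBK")); // as a GBK data file would store them
        byte[] utf8 = toUtf8(gbk, "GBK");
        System.out.println(gbk.length + " GBK bytes -> " + utf8.length + " UTF-8 bytes");
        System.out.println(name.equals(new String(fromUtf8(utf8, "GBK"), Charset.forName("GBK"))));
    }
}
```

      Running main reports 4 GBK bytes becoming 6 UTF-8 bytes for the same two characters, which is why GBK-encoded files cannot simply be read as if they were UTF-8.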

      Users only need to set the serialization encoding at the table level through the "serialization.encoding" SerDe property, for example:

      CREATE TABLE person (id INT, name STRING, `desc` STRING)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
      WITH SERDEPROPERTIES ('serialization.encoding'='GBK');
      

      or

      ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); 
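      A SerDe has to look up the configured encoding before it can transcode anything. The following Java sketch shows one way to resolve the "serialization.encoding" property with a UTF-8 fallback. It assumes the SerDe receives its SERDEPROPERTIES as a java.util.Properties object, as Hive SerDes conventionally do at initialization; the class and method names are hypothetical, and rejecting unknown charsets eagerly is a design choice of this sketch, not documented Hive behavior.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/**
 * Hypothetical sketch: resolve the charset named by the
 * "serialization.encoding" SerDe property, falling back to UTF-8.
 */
public class SerDeEncodingConfigSketch {

    /** Property key as used in the CREATE TABLE / ALTER TABLE examples. */
    static final String ENCODING_KEY = "serialization.encoding";

    public static Charset resolveEncoding(Properties serdeProps) {
        String name = serdeProps.getProperty(ENCODING_KEY);
        if (name == null || name.trim().isEmpty()) {
            // Property absent: keep the previous UTF-8-only behavior.
            return StandardCharsets.UTF_8;
        }
        String trimmed = name.trim();
        // isSupported returns false for unknown names and throws
        // IllegalCharsetNameException (an IllegalArgumentException) for malformed ones.
        if (!Charset.isSupported(trimmed)) {
            throw new IllegalArgumentException("Unsupported " + ENCODING_KEY + ": " + trimmed);
        }
        return Charset.forName(trimmed);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(resolveEncoding(props));   // property not set
        props.setProperty(ENCODING_KEY, "GBK");
        System.out.println(resolveEncoding(props));   // property set as in the DDL examples
    }
}
```

      Failing fast in a sketch like this surfaces a misspelled charset name when the table is initialized rather than as garbled rows later, while falling back to UTF-8 when the property is absent keeps existing tables behaving as before.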
      

      LIMITATIONS: In this patch, only LazySimpleSerDe supports the "serialization.encoding" property.

      Attachments

        1. HIVE-7142.4.patch (12 kB, Chengxiang Li)
        2. HIVE-7142.3.patch (8 kB, Chengxiang Li)
        3. HIVE-7142.2.patch (8 kB, Chengxiang Li)
        4. HIVE-7142.1.patch.txt (8 kB, Chengxiang Li)


            People

              Assignee: Chengxiang Li
              Reporter: Chengxiang Li
              Votes: 0
              Watchers: 9
