HIVE-2917

Add support for various charsets in LazySimpleSerDe

    Details

    • Type: New Feature
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.9.0
    • Fix Version/s: None
    • Labels:
      None

      Description

      Currently, Hive can only serialize and deserialize data encoded in UTF-8.

      It would be useful to specify the data's charset when creating the table.

      The idea is to add a new keyword, CHARSET, that sets the charset at the table level.
      For example:
      CREATE TABLE tbl1 (col1 STRING) ROW FORMAT CHARSET "GBK" DELIMITED FIELDS TERMINATED BY '\t';
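
      To make the intended behavior concrete, here is a minimal Python sketch of what a table-level charset would mean for a tab-delimited row: the raw bytes are decoded with the table's charset instead of a hard-coded UTF-8. The function names `serialize_row` and `deserialize_row` are hypothetical and are not taken from the patch; this only illustrates the idea and is not how LazySimpleSerDe is implemented.

```python
from typing import List


def serialize_row(fields: List[str], charset: str = "utf-8",
                  delimiter: str = "\t") -> bytes:
    """Join the fields with the delimiter and encode them in the table's charset."""
    return (delimiter.join(fields) + "\n").encode(charset)


def deserialize_row(raw: bytes, charset: str = "utf-8",
                    delimiter: str = "\t") -> List[str]:
    """Decode the raw row with the table's charset, then split it into fields."""
    text = raw.decode(charset)
    return text.rstrip("\n").split(delimiter)


# A row as it might be stored in a GBK-encoded data file.
gbk_row = serialize_row(["中文", "abc"], charset="gbk")

# Reading it back with the matching charset recovers the original fields.
fields = deserialize_row(gbk_row, charset="gbk")
```

      Decoding the same bytes as UTF-8 raises `UnicodeDecodeError` in this sketch, which mirrors the limitation described above: without a way to declare the charset, GBK data cannot be read correctly.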

      Another place to use CHARSET is in the TRANSFORM clause.
      For example:
      SELECT TRANSFORM(col1, col2) ROW FORMAT CHARSET 'gbk'
      USING 'some_script'
      AS (col3, col4) ROW FORMAT CHARSET 'utf-8';
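
      The TRANSFORM example can be read as two conversions: rows are encoded in the first charset before being handed to the script, and the script's output is decoded using the second charset. The Python sketch below illustrates that reading under some assumptions: `transform` and `upper_script` are hypothetical names, and the external script is modeled as a plain function on bytes rather than a real child process.

```python
from typing import Callable, List


def transform(rows: List[List[str]],
              script: Callable[[bytes], bytes],
              in_charset: str,
              out_charset: str,
              delimiter: str = "\t") -> List[List[str]]:
    """Encode rows for the script, run it, and decode what it returns."""
    # Serialize the input rows in the charset the script expects.
    payload = "".join(delimiter.join(row) + "\n" for row in rows).encode(in_charset)
    # Stand-in for piping the bytes through an external script.
    output = script(payload)
    # Decode the script's output with the declared output charset.
    return [line.split(delimiter) for line in output.decode(out_charset).splitlines()]


def upper_script(data: bytes) -> bytes:
    """Hypothetical script: reads GBK input, upper-cases it, writes UTF-8."""
    return data.decode("gbk").upper().encode("utf-8")


result = transform([["中文", "abc"]], upper_script,
                   in_charset="gbk", out_charset="utf-8")
```

      Keeping the two charsets as separate parameters matches the proposed syntax, where the input and output row formats each carry their own CHARSET.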

      Attachments

      1. ASF.LICENSE.NOT.GRANTED--HIVE-2917.D2619.1.patch
        73 kB
        Phabricator
      2. HIVE-2917.1.patch.txt
        72 kB
        Kai Zhang
      3. HIVE-2917.2.patch.txt
        72 kB
        Kai Zhang
      4. HIVE-2917.3.patch.txt
        72 kB
        Kai Zhang

          People

          • Assignee:
            Unassigned
            Reporter:
            Kai Zhang
          • Votes:
            1
            Watchers:
            1
