Hive / HIVE-2917

Add support for various charsets in LazySimpleSerDe

Details

    • Type: New Feature
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • 0.9.0
    • None
    • None

    Description

      Currently Hive can only serialize and deserialize data encoded in UTF-8.

      It would be useful to be able to specify the data's charset when creating a table.

      The idea is to add a new keyword, CHARSET, that sets the charset at the table level.
      For example:
      CREATE TABLE tbl1 (col1 STRING) ROW FORMAT CHARSET "GBK" DELIMITED FIELDS TERMINATED BY '\t';

      Another place to use CHARSET is in the TRANSFORM clause.
      For example:
      SELECT TRANSFORM(col1, col2) ROW FORMAT CHARSET 'gbk'
      USING 'some_script'
      AS (col3, col4) ROW FORMAT CHARSET 'utf-8';
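
To illustrate what table-level charset support would imply at the SerDe boundary, here is a minimal Java sketch. It is not taken from the attached patches: the class `CharsetSketch` and the helpers `decodeField` and `encodeField` are hypothetical names, and the sketch assumes the SerDe would simply decode incoming bytes with the declared charset and encode outgoing strings the same way, using only the standard `java.nio.charset` API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the LazySimpleSerDe implementation.
class CharsetSketch {

    // Deserialization side: turn raw field bytes into a String using the
    // charset declared for the table (e.g. the value given to CHARSET).
    static String decodeField(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    // Serialization side: turn a String back into bytes in the declared charset.
    static byte[] encodeField(String value, String charsetName) {
        return value.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // GBK is an extended charset and may be absent on minimal runtimes,
        // so check for it before use.
        String declared = Charset.isSupported("GBK") ? "GBK" : "ISO-8859-1";

        String original = "col1-value";
        byte[] stored = encodeField(original, declared);

        // Decoding with the declared charset recovers the original text.
        String roundTripped = decodeField(stored, declared);
        System.out.println(declared + " round trip ok: " + original.equals(roundTripped));

        // Re-encode as UTF-8, e.g. before handing the row to a script that
        // expects UTF-8 input.
        byte[] utf8 = roundTripped.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 byte length: " + utf8.length);
    }
}
```

The charset name is looked up once per call here for brevity; a real SerDe would more plausibly resolve the `Charset` once at initialization and reuse it for every row.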

      Attachments

        1. ASF.LICENSE.NOT.GRANTED--HIVE-2917.D2619.1.patch
          73 kB
          Phabricator
        2. HIVE-2917.1.patch.txt
          72 kB
          Kai Zhang
        3. HIVE-2917.2.patch.txt
          72 kB
          Kai Zhang
        4. HIVE-2917.3.patch.txt
          72 kB
          Kai Zhang

          People

            Assignee: Unassigned
            Reporter: Kai Zhang (flyinggarden)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: