  1. Hive
  2. HIVE-12653

The property "serialization.encoding" in the class "org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe" does not work

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: 2.1.0
    • Component/s: Contrib
    • Labels: None
    • Release Note: Add 'serialization.encoding' and support the GBK charset for the class 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'. Please test it.

    Description

      When I create a table with ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' and load files containing Chinese text encoded in GBK:
      create table PersonInfo (cod_fn_ent string, num_seq_trc_form string, date_tr string,
      num_jrn_no string, cod_trc_form_typ string,id_intl_ip string, name string )
      ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
      WITH SERDEPROPERTIES ("field.delim"="|!","serialization.encoding"='GBK');

      load data local inpath '/home/mr/hive/99-BoEing-IF_PMT_NOTE-2G-20151019-00000' overwrite into table PersonInfo;

      I found that the Chinese text in the table is garbled and that 'serialization.encoding' has no effect. The garbled data is listed below:

      ���� 99999999�ϴ����������� 0624624002��ʱ����������
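
      The replacement characters above are consistent with GBK bytes being decoded with a different charset. The following is a minimal sketch of that effect outside Hive, not the SerDe's actual code. It assumes the SerDe falls back to UTF-8 when it ignores 'serialization.encoding' (the issue does not state the default), and the sample row is hypothetical rather than taken from the reporter's file.

```python
# Hypothetical sample row using the "|!" delimiter from the table definition;
# it is NOT taken from the reporter's data file.
row = "99999999|!张三"

# Bytes as they would be stored in a GBK-encoded input file.
gbk_bytes = row.encode("gbk")

# Decoding with the wrong charset (assumed here to be UTF-8): the byte
# sequences that are invalid in UTF-8 are replaced with U+FFFD.
wrong = gbk_bytes.decode("utf-8", errors="replace")

# Decoding with the charset named in 'serialization.encoding' recovers the text.
right = gbk_bytes.decode("gbk")

# Split on the multi-character delimiter after decoding.
fields = right.split("|!")

print(wrong)
print(fields)
```

      The ASCII digits and the delimiter survive the wrong decoding because GBK and UTF-8 agree on ASCII bytes. Only the Chinese characters are lost, which matches the pattern in the sample above.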

      Attachments

        1. HIVE-12653.3.patch
          4 kB
          yangfang
        2. HIVE-12653.2.patch
          3 kB
          yangfang
        3. HIVE-12653.patch
          3 kB
          yangfang
        4. HIVE-12653.patch
          3 kB
          yangfang

          People

            Assignee: yangfang
            Reporter: yangfang
            Votes: 0
            Watchers: 3
