Hive / HIVE-11129

Issue a warning when data is copied from UTF-8 to ISO 8859-1


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: File Formats
    • Labels: None

    Description

      Copying data from a table using UTF-8 encoding to one using ISO 8859-1 encoding causes data corruption without warning.

      CREATE TABLE person_utf8 (name STRING)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
      WITH SERDEPROPERTIES ('serialization.encoding'='UTF8');
      

      Put the following data in the table:
      Müller,Thomas
      Jørgensen,Jørgen
      Vega,Andrés
      中村,浩人
      אביה,נועם

      CREATE TABLE person_2 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
      WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1')
      AS select * from person_utf8;
      

      Mangled data is expected here, because ISO 8859-1 cannot represent every UTF-8 character, but Hive should issue a warning when this happens.
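
      To illustrate which of the sample rows are affected, here is a minimal Python sketch. It is not the code from HIVE-11129.patch: the function name copy_with_warning and the warning text are hypothetical, and it assumes unmappable characters are replaced with '?' during the copy.

```python
import warnings

# The sample rows from the description above.
SOURCE_ROWS = [
    "Müller,Thomas",
    "Jørgensen,Jørgen",
    "Vega,Andrés",
    "中村,浩人",
    "אביה,נועם",
]


def copy_with_warning(rows, target_encoding="iso8859_1"):
    """Encode each row for the target table; warn once if any row loses data.

    Hypothetical helper for illustration only, not Hive's implementation.
    """
    encoded = []
    lossy_rows = 0
    for row in rows:
        try:
            encoded.append(row.encode(target_encoding))
        except UnicodeEncodeError:
            # Assumption: unmappable characters are replaced with '?',
            # which is the silent corruption the issue describes.
            encoded.append(row.encode(target_encoding, errors="replace"))
            lossy_rows += 1
    if lossy_rows:
        warnings.warn(
            f"{lossy_rows} row(s) contain characters not representable in "
            f"{target_encoding}; they were replaced with '?'"
        )
    return encoded, lossy_rows


if __name__ == "__main__":
    encoded, lossy = copy_with_warning(SOURCE_ROWS)
    for raw in encoded:
        print(raw.decode("iso8859_1"))
```

      In this sketch the first three rows survive, since ü, ø and é all exist in ISO 8859-1, while the Chinese and Hebrew rows are the ones that lose data and trigger the warning.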

      Attachments

        1. HIVE-11129.patch
          1 kB
          Aihua Xu


          People

            Assignee: Aihua Xu (aihuaxu)
            Reporter: Aihua Xu (aihuaxu)
            Votes: 0
            Watchers: 2
