Hive / HIVE-11129

Issue a warning when copying data from UTF-8 to ISO 8859-1

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: File Formats
    • Labels:
      None

      Description

      Copying data from a table that uses UTF-8 encoding to one that uses ISO 8859-1 encoding corrupts every character that ISO 8859-1 cannot represent, and Hive gives no warning.
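Why the corruption is silent can be illustrated with plain Java. This is a minimal sketch that assumes the SerDe transcodes values with `String.getBytes(Charset)`; the issue does not say how LazySimpleSerDe performs the conversion internally. That method never throws on unmappable input: it substitutes the charset's replacement byte, which is `?` for ISO 8859-1. The class name `Latin1RoundTrip` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {
    // Encode a string to ISO 8859-1 and decode it back.
    // Assumption: this mirrors how the SerDe transcodes; getBytes(Charset)
    // silently replaces every unmappable character with '?'.
    static String roundTrip(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Sample rows from this issue.
        String[] rows = {"Müller,Thomas", "Jørgensen,Jørgen", "Vega,Andrés", "中村,浩人", "אביה,נועם"};
        for (String row : rows) {
            System.out.println(row + " -> " + roundTrip(row));
        }
    }
}
```

The first three names survive because ü, ø and é exist in ISO 8859-1; the Chinese and Hebrew names come back as runs of `?`, and no exception or warning is produced at any point.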

      CREATE TABLE person_utf8 (name STRING)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
      WITH SERDEPROPERTIES ('serialization.encoding'='UTF8');
      

      Load the following rows into the table:
      Müller,Thomas
      Jørgensen,Jørgen
      Vega,Andrés
      中村,浩人
      אביה,נועם

      CREATE TABLE person_2 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
      WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1')
      AS select * from person_utf8;
      

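      Mangled data is expected here, because ISO 8859-1 cannot represent the Chinese and Hebrew names, but Hive should issue a warning instead of corrupting the data silently.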
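One way such a warning could be produced is to check each value with `java.nio.charset.CharsetEncoder.canEncode` before writing it. This is a sketch under stated assumptions, not the fix that was committed for this issue (the issue does not describe it): the helper `findLossyValues` and class `EncodingCheck` are hypothetical, and the target encoding name is taken from the `serialization.encoding` value in the example above (`ISO8859_1` is a valid Java alias for ISO-8859-1).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

public class EncodingCheck {
    private static final Logger LOG = Logger.getLogger(EncodingCheck.class.getName());

    // Hypothetical helper: returns the values the target charset cannot
    // represent, logging a warning for each one.
    static List<String> findLossyValues(List<String> values, String targetEncoding) {
        CharsetEncoder encoder = Charset.forName(targetEncoding).newEncoder();
        List<String> lossy = new ArrayList<>();
        for (String value : values) {
            // canEncode resets the encoder after each call, so it can be reused.
            if (!encoder.canEncode(value)) {
                lossy.add(value);
                LOG.warning("Value '" + value + "' cannot be represented in "
                        + targetEncoding + "; it will be corrupted");
            }
        }
        return lossy;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("Müller,Thomas", "Jørgensen,Jørgen", "Vega,Andrés", "中村,浩人", "אביה,נועם");
        System.out.println(findLossyValues(rows, "ISO8859_1"));
    }
}
```

Logging once per value keeps the sketch simple; on a large table a real implementation would more likely warn once per query, or count the lossy rows, to avoid flooding the logs.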

            People

            • Assignee: Aihua Xu (aihuaxu)
            • Reporter: Aihua Xu (aihuaxu)
            • Votes: 0
            • Watchers: 2
