Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
Copying data from a table using UTF-8 encoding to one using ISO 8859-1 encoding causes data corruption without warning.
CREATE TABLE person_utf8 (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='UTF8');
Put the following data in the table:
Müller,Thomas
Jørgensen,Jørgen
Vega,Andrés
中村,浩人
אביה,נועם
CREATE TABLE person_2 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1') AS select * from person_utf8;
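Mangled data is expected here, because ISO 8859-1 cannot represent every character in the source rows (the Japanese and Hebrew names, for example), but Hive should emit a warning when the copy loses data.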
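To make the expected loss concrete, here is a minimal Python sketch that checks which of the sample rows contain characters ISO 8859-1 cannot represent. It runs outside Hive and is not the SerDe's code; the helper names `unmappable_chars` and `lossy_rows` are hypothetical, and the final `errors="replace"` call only illustrates one common lossy behaviour (substituting `?`), which is an assumption about how the corruption shows up rather than a statement about Hive internals.

```python
# Sample rows from the report (illustration only, not read from Hive).
ROWS = [
    "Müller,Thomas",
    "Jørgensen,Jørgen",
    "Vega,Andrés",
    "中村,浩人",
    "אביה,נועם",
]


def unmappable_chars(text, encoding="iso-8859-1"):
    """Return the characters in text that the target encoding cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


def lossy_rows(rows, encoding="iso-8859-1"):
    """Return the rows that would lose data when written in the target encoding."""
    return [row for row in rows if unmappable_chars(row, encoding)]


if __name__ == "__main__":
    for row in ROWS:
        bad = unmappable_chars(row)
        status = "LOSSY" if bad else "ok"
        print(f"{status}: {row!r} unmappable={bad}")
    # Assumed lossy behaviour: each unmappable character is replaced by '?'.
    print("中村,浩人".encode("iso-8859-1", errors="replace"))
```

The first three rows use only characters that exist in ISO 8859-1, so only the last two rows are affected. A check of this kind is what a warning could be based on.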