Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.2.0
-
None
Description
If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string.
Example steps to reproduce:
1. Create a table using ISO 8859-1 encoding:
CREATE TABLE person_lat1 (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');
2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text:
Müller,Thomas Jørgensen,Jørgen Peña,Andrés Nåm,Fæk
3. Execute SELECT * FROM person_lat1
Result - The following output appears:
+-------------------+--+ | person_lat1.name | +-------------------+--+ | Müller,Thomas | | Jørgensen,Jørgen | | Peña,Andrésørgen | | Nåm,Fækdrésørgen | +-------------------+--+
Attachments
Attachments
Issue Links
- is duplicated by
-
HIVE-10983 SerDeUtils bug ,when Text is reused
- Resolved
- is part of
-
HIVE-10983 SerDeUtils bug ,when Text is reused
- Resolved
- relates to
-
HIVE-11095 SerDeUtils another bug ,when Text is reused
- Closed