[HIVE-11112] ISO-8859-1 text output has fragments of previous longer rows appended - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.2.0
Fix Version/s: 1.3.0, 2.0.0
Component/s: Serializers/Deserializers
Labels: None

Description

If a LazySimpleSerDe table is created using ISO-8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string: the tail of the earlier, longer value is appended to the shorter one.

Example steps to reproduce:

1. Create a table using ISO 8859-1 encoding:

CREATE TABLE person_lat1 (name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');

2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text:

Müller,Thomas
Jørgensen,Jørgen
Peña,Andrés
Nåm,Fæk

3. Execute SELECT * FROM person_lat1

Result - The following output appears:

+-------------------+--+
| person_lat1.name  |
+-------------------+--+
| Müller,Thomas     |
| Jørgensen,Jørgen  |
| Peña,Andrésørgen  |
| Nåm,Fækdrésørgen  |
+-------------------+--+
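
The output above is consistent with a reused byte buffer being decoded past its valid length, which is also what the titles of the linked issues ("when Text is reused") suggest. The following is a minimal sketch of that mechanism, not the actual Hive code: `ReusableBuffer` is a hypothetical stand-in for a reusable holder that keeps a backing array plus a separate valid length, and `readAll` is an illustrative helper. Decoding the whole backing array reproduces the corrupted rows; decoding only the valid prefix gives the expected names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ReusedBufferDemo {

    // The four rows from the example file, spelled with Unicode escapes so
    // this source file stays pure ASCII.
    static final String[] ROWS = {
        "M\u00fcller,Thomas",
        "J\u00f8rgensen,J\u00f8rgen",
        "Pe\u00f1a,Andr\u00e9s",
        "N\u00e5m,F\u00e6k"
    };

    /** Hypothetical reusable holder: a backing array plus the number of valid bytes. */
    static final class ReusableBuffer {
        private byte[] bytes = new byte[0];
        private int length = 0;

        void set(byte[] src) {
            // Grow only when needed; a shorter value leaves old bytes in the tail.
            if (bytes.length < src.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }

        byte[] getBytes() { return bytes; }   // may be longer than getLength()
        int getLength() { return length; }
    }

    /** Pushes every row through one shared buffer and decodes it as ISO-8859-1. */
    static List<String> readAll(String[] rows, boolean respectLength) {
        ReusableBuffer buffer = new ReusableBuffer();
        List<String> decoded = new ArrayList<>();
        for (String row : rows) {
            buffer.set(row.getBytes(StandardCharsets.ISO_8859_1));
            decoded.add(respectLength
                // Decode only the valid prefix of the backing array.
                ? new String(buffer.getBytes(), 0, buffer.getLength(), StandardCharsets.ISO_8859_1)
                // Decode the whole backing array, stale tail included.
                : new String(buffer.getBytes(), StandardCharsets.ISO_8859_1));
        }
        return decoded;
    }

    public static void main(String[] args) {
        System.out.println("whole array:  " + readAll(ROWS, false));
        System.out.println("valid prefix: " + readAll(ROWS, true));
    }
}
```

In this sketch the first two rows decode correctly in both modes because the buffer grows to fit them; only rows shorter than an earlier row pick up leftover bytes, which matches the pattern in the reported output.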

Attachments

HIVE-11112.1.patch, 25/Jun/15 18:52, 3 kB, Yongzhi Chen

Issue Links

is duplicated by

HIVE-10983 SerDeUtils bug ,when Text is reused

Resolved

is part of

HIVE-10983 SerDeUtils bug ,when Text is reused

Resolved

relates to

HIVE-11095 SerDeUtils another bug ,when Text is reused

Closed

People

Assignee: Yongzhi Chen

Reporter: Yongzhi Chen

Votes: 0

Watchers: 5

Dates

Created: 25/Jun/15 17:18

Updated: 16/Feb/16 23:51

Resolved: 29/Jun/15 14:23