Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Duplicate
-
0.13.0, 0.13.1
-
None
Description
FIELDS TERMINATED BY parsing broken when delimiter is more than 1 byte. Delimiter starting from 2nd character becomes part of returned data. No parsed properly.
Test case:
CREATE external TABLE test_muldelim ( string1 STRING, string2 STRING, string3 STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '<>' LINES TERMINATED BY '\n' STORED AS TEXTFILE location '/user/hive/test_muldelim'
Create a text file under /user/hive/test_muldelim with following 2 lines:
data1<>data2<>data3 aa<>bb<>cc
Now notice that two-character delimiter wasn't parsed properly:
jdbc:hive2://host.domain.com:1> select * from ruslan_test.test_muldelim ; +------------------------+------------------------+------------------------+--+ | test_muldelim.string1 | test_muldelim.string2 | test_muldelim.string3 | +------------------------+------------------------+------------------------+--+ | data1 | >data2 | >data3 | | aa | >bb | >cc | +------------------------+------------------------+------------------------+--+ 2 rows selected (0.453 seconds)
The second delimiter's character ('>') became part of the columns to the right (`string2` and `string3`).
Table DDL:
0: jdbc:hive2://host.domain.com:1> show create table dafault.test_muldelim ; +-----------------------------------------------------------------+--+ | createtab_stmt | +-----------------------------------------------------------------+--+ | CREATE EXTERNAL TABLE `default.test_muldelim`( | | `string1` string, | | `string2` string, | | `string3` string) | | ROW FORMAT DELIMITED | | FIELDS TERMINATED BY '<>' | | LINES TERMINATED BY '\n' | | STORED AS INPUTFORMAT | | 'org.apache.hadoop.mapred.TextInputFormat' | | OUTPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' | | LOCATION | | 'hdfs://epsdatalake/user/hive/test_muldelim' | | TBLPROPERTIES ( | | 'transient_lastDdlTime'='1476727100') | +-----------------------------------------------------------------+--+ 15 rows selected (0.286 seconds)