Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Version: Impala 2.5.0
Description
When working with text tables, I tend to use ASCII delimited text where possible: https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text
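For reference, the convention described in the linked article uses the Unit Separator (US, decimal 31, hex 0x1F) between fields and the Record Separator (RS, decimal 30, hex 0x1E) between records. The Python sketch below only illustrates that convention; `encode` and `decode` are hypothetical helper names, and this is not how Impala itself reads or writes text tables.

```python
# Minimal sketch of ASCII delimited text (illustrative only, not Impala code):
# fields are separated by US (0x1F) and records by RS (0x1E), so commas, tabs
# and newlines inside values need no quoting or escaping.
from __future__ import annotations

US = "\x1f"  # Unit Separator, decimal 31
RS = "\x1e"  # Record Separator, decimal 30


def encode(rows: list[list[str]]) -> str:
    """Join rows (lists of string fields) into one ASCII-delimited string."""
    return RS.join(US.join(fields) for fields in rows)


def decode(text: str) -> list[list[str]]:
    """Split an ASCII-delimited string back into rows of fields."""
    if not text:
        return []  # note: a single row holding one empty field also encodes to ""
    return [record.split(US) for record in text.split(RS)]


rows = [["a,1", "line\nbreak"], ["tab\there", ""]]
assert decode(encode(rows)) == rows
```

Because the separators are control characters, they cannot be typed as printable text in DDL, which is why they have to be written as escapes in the statements below.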
However, when you do this, Impala's SHOW CREATE TABLE output is wrong for delimiters that aren't printable ASCII characters:
[hostname:21000] > create table u_alanj.test_ascii_delimited(a string, b string) row format delimited fields terminated by '\u0031' lines terminated by '\030';
Query: create table u_alanj.test_ascii_delimited(a string, b string) row format delimited fields terminated by '\u0031' lines terminated by '\030'
Fetched 0 row(s) in 10.59s
[hostname:21000] > show create table u_alanj.test_ascii_delimited;
Query: show create table u_alanj.test_ascii_delimited
+-------------------------------------------------------------------------------------------------------+
| result |
+-------------------------------------------------------------------------------------------------------+
| CREATE TABLE u_alanj.test_ascii_delimited ( |
| a STRING, |
| b STRING |
| ) |
| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001F' LINES TERMINATED BY '\u0018' |
| WITH SERDEPROPERTIES ('line.delim'='\u0018', 'field.delim'='\u001F', 'serialization.format'='\u001F') |
| STORED AS TEXTFILE |
| LOCATION 'hdfs://nameservice1/user/hive/warehouse/u_alanj.db/test_ascii_delimited' |
| TBLPROPERTIES ('transient_lastDdlTime'='1462204213') |
+-------------------------------------------------------------------------------------------------------+
Fetched 1 row(s) in 4.94s
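The printed delimiters in the output above can be reproduced under one assumption that this report does not confirm: the SQL string unescaper reads the four characters after `\u` as hex digits but weights them by the decimal place values 1000, 100, 10 and 1, while `\030` is decoded as ordinary octal (3 * 8 = 24 = 0x18). SHOW CREATE TABLE then prints each stored delimiter as a standard hex `\uXXXX` escape. The Python sketch below models that hypothesis; all function names are hypothetical and are not Impala or Hive APIs.

```python
# Hypothesis model (not Impala/Hive source): how the delimiter escapes in the
# CREATE TABLE statement could become the values SHOW CREATE TABLE prints.


def decode_u_escape_suspected(digits: str) -> int:
    """Assumed buggy decoding of the 4 characters after '\\u': each one is
    parsed as a hex digit, but weighted by a *decimal* place value."""
    weights = (1000, 100, 10, 1)
    return sum(int(ch, 16) * w for ch, w in zip(digits, weights))


def decode_octal_escape(digits: str) -> int:
    """Ordinary octal escape such as '\\030' (three octal digits)."""
    return int(digits, 8)


def show_create_style(code: int) -> str:
    """Render a code point as SHOW CREATE TABLE prints it: '\\u' + 4 hex digits."""
    return "\\u%04X" % code


field_delim = decode_u_escape_suspected("0031")  # 3*10 + 1 = 31 (0x1F)
line_delim = decode_octal_escape("030")          # 3*8 = 24 (0x18)
print(show_create_style(field_delim), show_create_style(line_delim))
# prints: \u001F \u0018
```

Under this model, '\u0031' yields 0x1F rather than the character '1' (0x31) that a standard base-16 reading of the escape would give, while the octal line delimiter is printed faithfully.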
If you copy the statement from SHOW CREATE TABLE into a new CREATE TABLE statement for the same data, it does not work: the new table ends up with different delimiters.
[hostname:21000] > create table u_alanj.test_ascii_delimited2(a string, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001F' LINES TERMINATED BY '\u0018';
Query: create table u_alanj.test_ascii_delimited2(a string, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001F' LINES TERMINATED BY '\u0018'
Fetched 0 row(s) in 7.83s
[hostname:21000] > show create table u_alanj.test_ascii_delimited2;
Query: show create table u_alanj.test_ascii_delimited2
+-------------------------------------------------------------------------------------------------------+
| result |
+-------------------------------------------------------------------------------------------------------+
| CREATE TABLE u_alanj.test_ascii_delimited2 ( |
| a STRING, |
| b STRING |
| ) |
| ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0019' LINES TERMINATED BY '\u0012' |
| WITH SERDEPROPERTIES ('line.delim'='\u0012', 'field.delim'='\u0019', 'serialization.format'='\u0019') |
| STORED AS TEXTFILE |
| LOCATION 'hdfs://nameservice1/user/hive/warehouse/u_alanj.db/test_ascii_delimited2' |
| TBLPROPERTIES ('transient_lastDdlTime'='1462204481') |
+-------------------------------------------------------------------------------------------------------+
Fetched 1 row(s) in 3.17s
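The second output fits the same unconfirmed hypothesis: '\u001F' would decode to 0*1000 + 0*100 + 1*10 + 15*1 = 25 (printed as '\u0019') and '\u0018' to 1*10 + 8 = 18 (printed as '\u0012'). A standard base-16 decoding would return the original bytes, so the printed DDL can only round-trip if the parser and the printer agree on what `\uXXXX` means. This Python sketch (hypothetical helper names, not Impala or Hive APIs) checks both decodings against the values in the two transcripts.

```python
# Hypothesis check (not Impala/Hive source): feed SHOW CREATE TABLE's escapes
# back through the suspected decoder and through a standard \uXXXX decoder.
from typing import Callable


def decode_u_escape_suspected(digits: str) -> int:
    """Assumed buggy decoding: hex digits weighted by 1000/100/10/1."""
    return sum(int(ch, 16) * w for ch, w in zip(digits, (1000, 100, 10, 1)))


def decode_u_escape_standard(digits: str) -> int:
    """Standard Java/Unicode-style decoding: four base-16 digits."""
    return int(digits, 16)


def show_create_style(code: int) -> str:
    """Render a code point as SHOW CREATE TABLE prints it: '\\u' + 4 hex digits."""
    return "\\u%04X" % code


def round_trip(printed: str, decode: Callable[[str], int]) -> str:
    """Parse a printed escape such as '\\u001F' and print it again."""
    if not (printed.startswith("\\u") and len(printed) == 6):
        raise ValueError("expected an escape of the form \\uXXXX: %r" % printed)
    return show_create_style(decode(printed[2:]))


for printed in ("\\u001F", "\\u0018"):
    drifted = round_trip(printed, decode_u_escape_suspected)
    stable = round_trip(printed, decode_u_escape_standard)
    print(printed, "->", drifted, "(suspected),", stable, "(standard)")
# prints:
# \u001F -> \u0019 (suspected), \u001F (standard)
# \u0018 -> \u0012 (suspected), \u0018 (standard)
```

If the hypothesis holds, every CREATE/SHOW cycle shifts the delimiters further, which matches the drift between the two tables above.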
In case it matters, Hive gets this wrong too.
PS: Jira doesn't mark Component as a required field, but it wouldn't let me create this issue without one.