Details
Bug
Status: Resolved
Major
Resolution: Fixed
Impala 1.1
Description
For a table created with an ESCAPED BY clause:
create table pipes (s1 string, s2 string) row format delimited fields terminated by '|' escaped by '#';
one would expect that inserted string values would get the escape character prepended before each terminator character in the data file.
But this does not happen, leading to the columns getting mixed up when the table is queried:
insert into pipes values ('string contains a | character', 'string contains a #| sequence');
I expect the data file to look like:
string contains a #| character|string contains a ##| sequence
Instead, hdfs dfs -cat shows the pipes are unescaped:
string contains a | character|string contains a #| sequence
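The difference between the two lines above can be reproduced with a small sketch. This is not Impala's writer code: `escape_field` and `write_row` are hypothetical names, and the sketch applies only the rule described in this report (prepend the escape character to each terminator inside a value).

```python
def escape_field(value, terminator="|", escape="#"):
    """Prepend the escape character to every terminator inside a value."""
    return value.replace(terminator, escape + terminator)


def write_row(values, terminator="|", escape=None):
    """Join values into one text line; escape=None mimics the reported bug."""
    if escape is not None:
        values = [escape_field(v, terminator, escape) for v in values]
    return terminator.join(values)


row = ["string contains a | character", "string contains a #| sequence"]
expected_line = write_row(row, escape="#")  # what the report expects on disk
actual_line = write_row(row)                # what hdfs dfs -cat actually shows
print(expected_line)
print(actual_line)
```

A complete writer would likely also need to escape the escape character itself, so that a literal # in front of a terminator stays unambiguous when the file is read back. That step is left out here to match the expected line given in the report.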
That produces wrong results when the same row is queried:
select * from pipes;
Query: select * from pipes
Query finished, fetching results ...
+--------------------+------------+
| s1                 | s2         |
+--------------------+------------+
| string contains a  |  character |
+--------------------+------------+
Impala is interpreting the two parts of the first string as separate pipe-delimited columns, and ignoring the second string entirely.
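The wrong result follows directly from how an escape-aware reader splits the unescaped line. A minimal sketch, assuming a reader that treats the character after # as a literal and keeps only as many fields as the table has columns (s1 and s2 in the result above); `split_fields` is a hypothetical helper, not Impala's scanner:

```python
def split_fields(line, terminator="|", escape="#"):
    """Split a line on terminators that are not preceded by the escape character."""
    fields, current = [], []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == escape and i + 1 < len(line):
            # Escaped character: keep the next character literally.
            current.append(line[i + 1])
            i += 2
        elif ch == terminator:
            # Unescaped terminator: close the current field.
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


# The line as it was actually written to the data file.
data_line = "string contains a | character|string contains a #| sequence"
fields = split_fields(data_line)
row = fields[:2]  # assumption: fields beyond the two table columns are dropped
print(fields)
print(row)
```

The unescaped pipe in the first value yields three fields instead of two, so the first value is split across s1 and s2 and the second value falls outside the column list.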
Assigning to Alex, even if the cause is out of his area of responsibility, in case it is something specific to INSERT VALUES.
BTW: the ESCAPED BY clause was missing from the CREATE TABLE doc in Impala 1.0, but it is there now in the 1.1 doc. I'm waiting for the behavior to be straightened out before adding doc examples.