Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Impala 2.2
None
Description
I exported a table as text using create table as select, and found that LIKE queries could not find records whose text contained an underscore.
To reproduce:
Create file underscore.txt with this content:
First record has_underscore in the first line but not in the second line;Second record has_underscore in the second line;And third record has_underscore in the third line
Put it in HDFS:
hadoop fs -mkdir /user/alanj/Underscore
hadoop fs -put underscore.txt /user/alanj/Underscore/
Create a table and query it:
> create external table underscore(val string) row format delimited lines terminated by ';' location '/user/alanj/Underscore';
> select count(*) from underscore;
Query: select count(*) from underscore
+----------+
| count(*) |
+----------+
| 3        |
+----------+
Fetched 1 row(s) in 0.13s
> select count(*) from underscore where val like '%_%';;
Query: select count(*) from underscore where val like '%_%'
+----------+
| count(*) |
+----------+
| 0        |
+----------+
Fetched 1 row(s) in 0.13s
> select count(*) from underscore where val like '%has_underscore%';
Query: select count(*) from underscore where val like '%has_underscore%'
+----------+
| count(*) |
+----------+
| 0        |
+----------+
Fetched 1 row(s) in 0.13s
> select count(*) from underscore where val like '%underscore%';
Query: select count(*) from underscore where val like '%underscore%'
+----------+
| count(*) |
+----------+
| 3        |
+----------+
Fetched 1 row(s) in 0.16s
Notice that none of the queries with an _ in the LIKE pattern matched any rows, even though all three records contain has_underscore. Based on a very short experiment I did, this seems to work with single-line text but not with multi-line text. I think it also works in non-text formats, but if you use create table as select you often wind up with text.
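One possible explanation for the single-line versus multi-line difference can be sketched outside Impala. The Python below is not Impala's code. It assumes, purely for illustration, that a pattern of the form %literal% is handled by plain substring search, while any pattern containing _ is translated to a regular expression in which . does not match a newline. The helpers like_to_regex and like and the multi_line value are hypothetical; the exact line breaks in underscore.txt are not reproduced here.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern to a regex (no ESCAPE handling)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def like(value: str, pattern: str, dotall: bool = False) -> bool:
    """Evaluate value LIKE pattern under the assumptions stated above."""
    inner = pattern[1:-1]
    # Assumed fast path: %literal% with no wildcards inside is a substring test.
    if (len(pattern) >= 2 and pattern[0] == "%" and pattern[-1] == "%"
            and "%" not in inner and "_" not in inner):
        return inner in value
    # Assumed slow path: regex match; "." skips newlines unless DOTALL is set.
    flags = re.DOTALL if dotall else 0
    return re.fullmatch(like_to_regex(pattern), value, flags) is not None


# Hypothetical record values: one on a single line, one spanning two lines.
single_line = "Second record has_underscore in the second line"
multi_line = "Second record\nhas_underscore in the second line"

print(like(single_line, "%has_underscore%"))             # True
print(like(multi_line, "%has_underscore%"))              # False
print(like(multi_line, "%_%"))                           # False
print(like(multi_line, "%underscore%"))                  # True
print(like(multi_line, "%has_underscore%", dotall=True)) # True
```

Under these assumptions the sketch reproduces the pattern of results in the session above: the two patterns containing _ fail on a multi-line value while '%underscore%' succeeds, and letting . match newlines makes the failing cases pass. Whether this is what actually happens inside Impala is not confirmed by this report.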