Details
Bug
Status: Resolved
Critical
Resolution: Fixed
Impala 2.3.0
Description
If I use casttochar() in a CTAS to set the type of a column, Impala treats the result as STRING. However, the length information for the CHAR results must somehow be getting passed back and corrupting the output. Querying the resulting table causes the query to hang:
[blah:21000] > create table char_types as select casttochar('hello world') as c1, casttochar('xyz') as c2, casttochar('x') as c3;
Query: create table char_types as select casttochar('hello world') as c1, casttochar('xyz') as c2, casttochar('x') as c3
+-------------------+
| summary           |
+-------------------+
| Inserted 1 row(s) |
+-------------------+
Fetched 1 row(s) in 6.89s
[blah:21000] > desc char_types;
Query: describe char_types
+------+--------+---------+
| name | type   | comment |
+------+--------+---------+
| c1   | string |         |
| c2   | string |         |
| c3   | string |         |
+------+--------+---------+
[blah:21000] > show functions in _impala_builtins like 'casttochar';
Query: show functions in _impala_builtins like 'casttochar'
+-------------+--------------------------+
| return type | signature                |
+-------------+--------------------------+
| CHAR(*)     | casttochar(BIGINT)       |
| CHAR(*)     | casttochar(BOOLEAN)      |
| CHAR(*)     | casttochar(CHAR(*))      |
| CHAR(*)     | casttochar(DECIMAL(*,*)) |
| CHAR(*)     | casttochar(DOUBLE)       |
| CHAR(*)     | casttochar(FLOAT)        |
| CHAR(*)     | casttochar(INT)          |
| CHAR(*)     | casttochar(SMALLINT)     |
| CHAR(*)     | casttochar(STRING)       |
| CHAR(*)     | casttochar(TIMESTAMP)    |
| CHAR(*)     | casttochar(TINYINT)      |
| CHAR(*)     | casttochar(VARCHAR(*))   |
+-------------+--------------------------+
Fetched 12 row(s) in 0.10s
[blah:21000] > select * from char_types;
Query: select * from char_types
^C Cancelling Query
The HDFS data file contains the original text plus extra control characters. Running hdfs dfs -cat on the data file causes the OS X terminal to go haywire and lock up.
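For anyone who wants to look at the data file without sending raw control characters to the terminal, here is a minimal sketch of one way to do it. It is not part of Impala or the HDFS tooling; the function names escape_control_bytes and render_file are hypothetical helpers made up for this example, and it assumes the data file has first been copied to the local filesystem.

```python
def escape_control_bytes(data: bytes) -> str:
    """Render bytes as text, showing non-printable bytes as \\xNN escapes.

    Printable ASCII and newlines pass through unchanged; every other byte
    (including NUL and other control characters) becomes a visible escape.
    """
    out = []
    for b in data:
        if 0x20 <= b <= 0x7E or b == 0x0A:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02x}")
    return "".join(out)


def render_file(path: str) -> str:
    """Read a local copy of the data file in binary mode and escape it.

    `path` is a hypothetical local path, e.g. a file fetched from HDFS.
    """
    with open(path, "rb") as f:
        return escape_control_bytes(f.read())
```

After fetching a local copy (for example with hdfs dfs -get), calling print(render_file(<local path>)) should show which bytes follow the original text, which may help confirm whether CHAR length or padding data ended up in the STRING columns.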