Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Affects Version: Impala 2.1.1
- Fix Version: None
- Component: None
Description
I have created a Parquet table using Hive; here is the table structure:
0: jdbc:hive2://shashmi-test-1.vpc.cloudera.c> show create table newparquethiveclient_impalaclient_uncompressed;
+---------------------------------------------------------------------------------------------------------------------+--+
| createtab_stmt                                                                                                      |
+---------------------------------------------------------------------------------------------------------------------+--+
| CREATE TABLE `newparquethiveclient_impalaclient_uncompressed`(                                                      |
|   `row_id` tinyint,                                                                                                 |
|   `int8_field` tinyint,                                                                                             |
|   `int16_field` smallint,                                                                                           |
|   `int32_field` int,                                                                                                |
|   `int64_field` bigint,                                                                                             |
|   `float_field` float,                                                                                              |
|   `double_field` double,                                                                                            |
|   `char_field` char(4),                                                                                             |
|   `varchar_field` varchar(4),                                                                                       |
|   `string_field` string,                                                                                            |
|   `bool_field` boolean)                                                                                             |
| ROW FORMAT SERDE                                                                                                    |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'                                                     |
| STORED AS INPUTFORMAT                                                                                               |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'                                                   |
| OUTPUTFORMAT                                                                                                        |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'                                                  |
| LOCATION                                                                                                            |
|   'hdfs://shashmi-test-1.vpc.cloudera.com:8020/user/hive/warehouse/newparquethiveclient_impalaclient_uncompressed'  |
| TBLPROPERTIES (                                                                                                     |
|   'transient_lastDdlTime'='1424742425')                                                                             |
+---------------------------------------------------------------------------------------------------------------------+--+
22 rows selected (0.062 seconds)
If I select from this table using Hive, the records come back correctly:
0: jdbc:hive2://shashmi-test-1.vpc.cloudera.c> SELECT row_id,int8_field,int16_field,int32_field,int64_field,float_field,double_field,char_field,varchar_field,string_field,bool_field FROM newparquethiveclient_impalaclient_uncompressed ORDER BY row_id LIMIT 10;
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
WARN  : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1424044706702_0055
INFO  : The url to track the job: http://shashmi-test-1.vpc.cloudera.com:8088/proxy/application_1424044706702_0055/
INFO  : Starting Job = job_1424044706702_0055, Tracking URL = http://shashmi-test-1.vpc.cloudera.com:8088/proxy/application_1424044706702_0055/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.796/lib/hadoop/bin/hadoop job -kill job_1424044706702_0055
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2015-02-23 17:47:36,477 Stage-1 map = 0%, reduce = 0%
INFO  : 2015-02-23 17:47:42,739 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.63 sec
INFO  : 2015-02-23 17:47:49,935 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.01 sec
INFO  : MapReduce Total cumulative CPU time: 3 seconds 10 msec
INFO  : Ended Job = job_1424044706702_0055
+---------+-------------+--------------+--------------+-----------------------+--------------+---------------+-------------+----------------+-----------------+-------------+--+
| row_id  | int8_field  | int16_field  | int32_field  | int64_field           | float_field  | double_field  | char_field  | varchar_field  | string_field    | bool_field  |
+---------+-------------+--------------+--------------+-----------------------+--------------+---------------+-------------+----------------+-----------------+-------------+--+
| 1       | 4           | 4            | 4            | 4                     | 4.0          | 4.0           | ab          | cd             | test string     | false       |
| 2       | -4          | -4           | -4           | -4                    | -4.0         | -4.0          | abcd        | abcd           | another string  | false       |
| 3       | 0           | 0            | 0            | 0                     | 0.0          | 0.0           | a           | a              |                 | true        |
| 4       | 127         | 32767        | 2147483647   | 9223372036854775807   | 14.0         | 14.0          | a           | a              | a               | true        |
| 5       | -128        | -32768       | -2147483648  | -9223372036854775808  | NULL         | NULL          | null        | null           | null            | true        |
+---------+-------------+--------------+--------------+-----------------------+--------------+---------------+-------------+----------------+-----------------+-------------+--+
5 rows selected (21.385 seconds)
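Note that rows 4 and 5 deliberately hit the exact signed min/max of each integer column width. A quick Python sketch of those bounds (added here only to show how the boundary values in the test data are derived):

```python
# Signed two's-complement bounds for each Hive integer type in the table.
# For an n-bit signed integer the range is [-2^(n-1), 2^(n-1) - 1].
limits = {
    "tinyint (int8)":   (-(2 ** 7),  2 ** 7 - 1),    # -128 .. 127
    "smallint (int16)": (-(2 ** 15), 2 ** 15 - 1),   # -32768 .. 32767
    "int (int32)":      (-(2 ** 31), 2 ** 31 - 1),   # -2147483648 .. 2147483647
    "bigint (int64)":   (-(2 ** 63), 2 ** 63 - 1),   # matches rows 4 and 5 above
}

for name, (lo, hi) in limits.items():
    print(f"{name}: min={lo}, max={hi}")
```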
But when I try to select using Impala, I get the following warnings:
[shashmi-test-2.vpc.cloudera.com:21000] > SELECT row_id,int8_field,int16_field,int32_field,int64_field,float_field,double_field,char_field,varchar_field,string_field,bool_field FROM newparquethiveclient_impalaclient_uncompressed ORDER BY row_id LIMIT 10;
Query: select row_id,int8_field,int16_field,int32_field,int64_field,float_field,double_field,char_field,varchar_field,string_field,bool_field FROM newparquethiveclient_impalaclient_uncompressed ORDER BY row_id LIMIT 10
WARNINGS:
File 'hdfs://shashmi-test-1.vpc.cloudera.com:8020/user/hive/warehouse/newparquethiveclient_impalaclient_uncompressed/000000_0' has an incompatible type with the table schema for column 'row_id'. Expected type: INT32. Actual type: BYTE_ARRAY
File 'hdfs://shashmi-test-1.vpc.cloudera.com:8020/user/hive/warehouse/newparquethiveclient_impalaclient_uncompressed/000000_0' has an incompatible type with the table schema for column 'row_id'. Expected type: INT32. Actual type: BYTE_ARRAY
However, if I run show create table from Impala, it shows the right data types:
Query: show create table newparquethiveclient_impalaclient_uncompressed
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| result                                                                                                                                                      |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| CREATE TABLE default.newparquethiveclient_impalaclient_uncompressed (                                                                                       |
|   row_id TINYINT,                                                                                                                                           |
|   int8_field TINYINT,                                                                                                                                       |
|   int16_field SMALLINT,                                                                                                                                     |
|   int32_field INT,                                                                                                                                          |
|   int64_field BIGINT,                                                                                                                                       |
|   float_field FLOAT,                                                                                                                                        |
|   double_field DOUBLE,                                                                                                                                      |
|   char_field CHAR(4),                                                                                                                                       |
|   varchar_field VARCHAR(4),                                                                                                                                 |
|   string_field STRING,                                                                                                                                      |
|   bool_field BOOLEAN                                                                                                                                        |
| )                                                                                                                                                           |
| WITH SERDEPROPERTIES ('serialization.format'='1')                                                                                                           |
| STORED AS PARQUET                                                                                                                                           |
| LOCATION 'hdfs://shashmi-test-1.vpc.cloudera.com:8020/user/hive/warehouse/newparquethiveclient_impalaclient_uncompressed'                                   |
| TBLPROPERTIES ('numFiles'='1', 'COLUMN_STATS_ACCURATE'='true', 'transient_lastDdlTime'='1424742442', 'numRows'='5', 'totalSize'='1614', 'rawDataSize'='55') |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
A repro file is attached as well.