Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Duplicate
Affects Version/s: Impala 1.2.1
Fix Version/s: None
Environment: CDH4.3, Impala 1.2.1
Description
Scenario:
1) Created a Parquet file with AvroParquetWriter (in code), with 100 or so columns.
2) Created an external Parquet table against this file, defined with only the first 4 columns, and queried all of them successfully.
3) Created a second external table against the same file, defined with only the last 4 columns. Querying it fails with an error about the file's first column (long1), which isn't even in this table's definition.
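One way to see what step 3 runs into: Impala 1.2.x matches a table's columns to a Parquet file's columns by ordinal position rather than by name, so a table declared with the file's last 4 columns actually reads the file's first 4. The Python sketch below is a hypothetical model of that pairing, not Impala code; every column name except long1 (c2 to c100, MYTABLE1, MYTABLE2) is made up for illustration.

```python
# Hypothetical model of Parquet column resolution; not Impala source code.
# Only "long1" comes from the report; c2..c100 are illustrative names.
from typing import List, Optional, Tuple

FILE_COLS: List[str] = ["long1"] + [f"c{i}" for i in range(2, 101)]  # 100 columns
MYTABLE1: List[str] = FILE_COLS[:4]   # first 4 columns (step 2)
MYTABLE2: List[str] = FILE_COLS[-4:]  # last 4 columns (step 3)


def map_by_position(table_cols: List[str], file_cols: List[str]) -> List[Tuple[str, str]]:
    """Pair the i-th declared table column with the i-th column in the file."""
    if len(table_cols) > len(file_cols):
        raise ValueError("table declares more columns than the file contains")
    return list(zip(table_cols, file_cols))


def map_by_name(table_cols: List[str], file_cols: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each declared table column with the file column of the same name."""
    present = set(file_cols)
    return [(col, col if col in present else None) for col in table_cols]


if __name__ == "__main__":
    # mytable1 lines up either way, because its columns are a prefix of the file.
    print(map_by_position(MYTABLE1, FILE_COLS) == [(c, c) for c in MYTABLE1])  # True
    # mytable2's first column is silently paired with the file's long1.
    print(map_by_position(MYTABLE2, FILE_COLS)[0])  # ('c97', 'long1')
    print(map_by_name(MYTABLE2, FILE_COLS)[0])      # ('c97', 'c97')
```

Name-based matching, sketched in map_by_name, would pair each declared column with its namesake instead, which is what the step 3 table definition implicitly expects.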
[rd-namenode.explorys:21000] > select * from mytable2 limit 4;
Query: select * from mytable2 limit 4
ERROR: File hdfs://namenode:8021/user/doug.meil/parquet/mytable/regid=2/myfile.prq has an incompatible type with the table schema for column long1. Expected type: BYTE_ARRAY. Actual type: INT64
ERROR: Invalid query handle
The original Avro schema defined 'long1' like this...
{"name": "long1", "type": "long"},
The "Actual type" of INT64 seems correct, because I meant to put a long in there. Why does Impala think the expected type is BYTE_ARRAY?
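The error can be modeled as a per-column check of the table's declared SQL type against the Parquet physical type stored in the file. parquet-avro writes an Avro long as INT64 and an Avro string as BYTE_ARRAY (BINARY), while Impala reads BIGINT from INT64 and STRING from BYTE_ARRAY. This sketch assumes, from "Expected type: BYTE_ARRAY", that the first column declared in mytable2 is STRING; that DDL is not shown in the report, and type_error and the short path myfile.prq are hypothetical.

```python
# Hypothetical model of the type check behind the error; not Impala source code.
from typing import Optional

# SQL column type -> Parquet physical type it is read from (subset).
SQL_TO_PARQUET = {
    "BOOLEAN": "BOOLEAN",
    "INT": "INT32",
    "BIGINT": "INT64",
    "FLOAT": "FLOAT",
    "DOUBLE": "DOUBLE",
    "STRING": "BYTE_ARRAY",
}

# Avro primitive type -> Parquet physical type written by parquet-avro (subset).
AVRO_TO_PARQUET = {
    "boolean": "BOOLEAN",
    "int": "INT32",
    "long": "INT64",
    "float": "FLOAT",
    "double": "DOUBLE",
    "string": "BYTE_ARRAY",
}


def type_error(path: str, file_col: str, avro_type: str, sql_type: str) -> Optional[str]:
    """Return an error message if the declared SQL type cannot read the file column."""
    expected = SQL_TO_PARQUET[sql_type]
    actual = AVRO_TO_PARQUET[avro_type]
    if expected == actual:
        return None
    return (f"File {path} has an incompatible type with the table schema for column "
            f"{file_col}. Expected type: {expected}. Actual type: {actual}")


if __name__ == "__main__":
    # {"name": "long1", "type": "long"} read through a STRING column (assumed DDL):
    print(type_error("myfile.prq", "long1", "long", "STRING"))
    # The same file column read through a BIGINT column passes the check:
    print(type_error("myfile.prq", "long1", "long", "BIGINT"))  # None
```

Under this model, BYTE_ARRAY is simply the physical type of whatever STRING column happens to sit first in the table, not anything derived from long1 itself.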
Note: summary queries (e.g., select count(*) from mytable2) actually WORK. Go figure.
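A plausible reason the count query succeeds: the type check only applies to columns the scan has to materialize, and count(*) reads no column values, so the mismatched pairing is never examined. This is an assumption about the scanner, not something the report confirms. In the sketch, only the first type pair (BYTE_ARRAY for the assumed STRING column vs INT64 for long1) reflects the report; the other three entries and all names are hypothetical placeholders.

```python
# Hypothetical model: only materialized columns are type-checked.
# Index 0 reflects the report (STRING over long1); indexes 1-3 are placeholders.
from typing import Iterable, List

TABLE2_TYPES: List[str] = ["BYTE_ARRAY", "INT64", "INT64", "DOUBLE"]  # declared, in order
FILE_TYPES: List[str] = ["INT64", "INT64", "INT64", "DOUBLE"]         # file's first 4 columns


def mismatched_columns(table_types: List[str], file_types: List[str],
                       materialized: Iterable[int]) -> List[int]:
    """Indexes of materialized columns whose positional file column has another type."""
    return [i for i in materialized if table_types[i] != file_types[i]]


if __name__ == "__main__":
    # select * reads all four columns, so column 0 trips the check.
    print(mismatched_columns(TABLE2_TYPES, FILE_TYPES, range(4)))  # [0]
    # count(*) reads no column values, so nothing is checked.
    print(mismatched_columns(TABLE2_TYPES, FILE_TYPES, []))        # []
```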
Issue Links
is duplicated by: IMPALA-2835 Hive/Impala inconsistency with parquet.column.index.access=false (Resolved)