Details
Type: New Feature
Status: Resolved
Priority: Critical
Resolution: Fixed
Impala 2.3.0
None
Impala 2.3.0-cdh5.5.1 RELEASE (build 73bf5bc5afbb47aa7eab06cfbf6023ba8cb74f3c)
Description
In Hive it is possible to map table columns to Parquet file fields by name by setting
parquet.column.index.access=false
In Impala it is not possible to create a table whose columns are mapped by name. In addition, tables created by Hive with parquet.column.index.access=false are not queried correctly by Impala, because Impala always behaves as if parquet.column.index.access=true.
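To make the difference between the two modes concrete, here is a minimal Python sketch. It is only an illustration and is not Hive's or Impala's actual implementation; the function names `resolve_by_index` and `resolve_by_name` are hypothetical. It assumes a file with fields field1 and field2 and a table that declares only field2.

```python
from typing import Dict, List, Optional


def resolve_by_index(table_cols: List[str], row: List[str]) -> Dict[str, Optional[str]]:
    """Position-based mapping: table column i reads file column i."""
    return {
        col: (row[i] if i < len(row) else None)
        for i, col in enumerate(table_cols)
    }


def resolve_by_name(
    table_cols: List[str], file_cols: List[str], row: List[str]
) -> Dict[str, Optional[str]]:
    """Name-based mapping: each table column reads the file field with the same name."""
    by_name = dict(zip(file_cols, row))
    return {col: by_name.get(col) for col in table_cols}


# Hypothetical data: one row in a file whose schema is (field1, field2).
file_cols = ["field1", "field2"]
row = ["f1", "f2"]

# The table declares only field2.
table_cols = ["field2"]

print(resolve_by_index(table_cols, row))            # {'field2': 'f1'}
print(resolve_by_name(table_cols, file_cols, row))  # {'field2': 'f2'}
```

With position-based mapping the single table column reads the first file field regardless of its name; with name-based mapping it reads the field called field2.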
Steps to reproduce in Impala:
$ impala-shell -i localhost -d one_off
[localhost:21000] > create table parquet_table (field1 string, field2 string) stored as parquet;
Query: create table parquet_table (field1 string, field2 string) stored as parquet
Fetched 0 row(s) in 0.14s
[localhost:21000] > insert into parquet_table values (('f1', 'f2'));
Query: insert into parquet_table values (('f1', 'f2'))
Inserted 1 row(s) in 4.89s
[localhost:21000] > select * from parquet_table;
Query: select * from parquet_table
+--------+--------+
| field1 | field2 |
+--------+--------+
| f1     | f2     |
+--------+--------+
Fetched 1 row(s) in 0.26s
-- find where parquet files are in hdfs
[localhost:21000] > show files in parquet_table;
Query: show files in parquet_table
+---------------------------------------------------------------------------------------------------------------------------+------+-----------+
| path | size | partition |
+---------------------------------------------------------------------------------------------------------------------------+------+-----------+
| hdfs://nameservice01/user/hive/warehouse/one_off.db/parquet_table/bf4c8168cfac5dad-5abcf4063e6c53b7_253339204_data.0.parq | 382B | |
+---------------------------------------------------------------------------------------------------------------------------+------+-----------+
Fetched 1 row(s) in 0.01s
-- it's in /user/hive/warehouse/one_off.db/parquet_table
[localhost:21000] > create external table parquet_subset (field2 string) stored as parquet location '/user/hive/warehouse/one_off.db/parquet_table';
Query: create external table parquet_subset (field2 string) stored as parquet location '/user/hive/warehouse/one_off.db/parquet_table'
Fetched 0 row(s) in 0.17s
[localhost:21000] > select * from parquet_subset;
Query: select * from parquet_subset
+--------+
| field2 |
+--------+
| f1     |
+--------+
Fetched 1 row(s) in 4.01s
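The query on parquet_subset returns f1, the value of the first column in the file, instead of f2.
How can I create the parquet_subset table so that its column field2 is mapped to the field2 column in the Parquet file?
I also reported this issue on the forum: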
http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/external-table-stored-as-parquet-can-not-use-field-inside-a/m-p/36012
Issue Links
- duplicates IMPALA-779 Incompatible type error when querying file created from AvroParquetWriter. (Resolved)
- is related to IMPALA-4675 Mixed or uppercase columns are not resolved in parquet when using PARQUET_FALLBACK_SCHEMA_RESOLUTION=NAME (Resolved)