[IMPALA-2835] Hive/Impala inconsistency with parquet.column.index.access=false - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: Impala 2.3.0
Fix Version/s: Impala 2.6.0
Component/s: Backend
Labels: None
Environment: Impala 2.3.0-cdh5.5.1 RELEASE (build 73bf5bc5afbb47aa7eab06cfbf6023ba8cb74f3c)
Target Version: Impala 2.6.0

Description

In Hive, table columns can be mapped to Parquet file fields by name by setting

parquet.column.index.access=false

Impala offers no way to create a table whose columns are mapped by name. In addition, tables created by Hive with parquet.column.index.access=false are not queried correctly by Impala: Impala always behaves as if parquet.column.index.access=true, that is, it matches table columns to file fields by position.

Steps to reproduce in Impala:

$ impala-shell -i localhost -d one_off

[localhost:21000] > create table parquet_table (field1 string, field2 string) stored as parquet;
Query: create table parquet_table (field1 string, field2 string) stored as parquet
Fetched 0 row(s) in 0.14s

[localhost:21000] > insert into parquet_table values (('f1', 'f2'));
Query: insert into parquet_table values (('f1', 'f2'))
Inserted 1 row(s) in 4.89s

[localhost:21000] > select * from parquet_table;
Query: select * from parquet_table
+--------+--------+
| field1 | field2 |
+--------+--------+
| f1     | f2     |
+--------+--------+
Fetched 1 row(s) in 0.26s

-- Find where the Parquet files are stored in HDFS
[localhost:21000] > show files in parquet_table;
Query: show files in parquet_table
+---------------------------------------------------------------------------------------------------------------------------+------+-----------+
| path                                                                                                                      | size | partition |
+---------------------------------------------------------------------------------------------------------------------------+------+-----------+
| hdfs://nameservice01/user/hive/warehouse/one_off.db/parquet_table/bf4c8168cfac5dad-5abcf4063e6c53b7_253339204_data.0.parq | 382B |           |
+---------------------------------------------------------------------------------------------------------------------------+------+-----------+
Fetched 1 row(s) in 0.01s

-- The files are in /user/hive/warehouse/one_off.db/parquet_table

[localhost:21000] > create external table parquet_subset (field2 string) 
stored as parquet 
location '/user/hive/warehouse/one_off.db/parquet_table';
Query: create external table parquet_subset (field2 string) stored as parquet location '/user/hive/warehouse/one_off.db/parquet_table'

Fetched 0 row(s) in 0.17s

[localhost:21000] > select * from parquet_subset;
Query: select * from parquet_subset
+--------+
| field2 |
+--------+
| f1     |
+--------+
Fetched 1 row(s) in 4.01s

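The query returns f1, the value of the first field in the file, rather than f2. How can the parquet_subset table be created so that its column field2 is mapped to the field named field2 in the Parquet file?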

I also reported this issue on the forum:
http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/external-table-stored-as-parquet-can-not-use-field-inside-a/m-p/36012
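
The related issue IMPALA-4675 refers to a query option, PARQUET_FALLBACK_SCHEMA_RESOLUTION=NAME. The following is a minimal sketch of how the reproduction above might be re-run with that option. It assumes an Impala build that includes the fix for this issue (Fix Version 2.6.0) and that the option is set per session with `set`. No output is shown because none was captured in this report.

```sql
-- Assumption: run in impala-shell against Impala 2.6.0 or later.
-- Ask the Parquet scanner to match table columns to file fields by name
-- instead of by position (the default behavior described in this issue).
set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;

-- parquet_subset declares only field2; with name-based resolution the
-- scanner is expected to read the file field called field2.
select * from parquet_subset;
```

Because the option is set per session in this sketch, other sessions keep position-based resolution unless they set it themselves.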

Issue Links

duplicates: IMPALA-779 Incompatible type error when querying file created from AvroParquetWriter. (Resolved)

is related to: IMPALA-4675 Mixed or uppercase columns are not resolved in parquet when using PARQUET_FALLBACK_SCHEMA_RESOLUTION=NAME (Resolved)

People

Assignee: Skye Wanderman-Milne
Reporter: oleksii iepishkin
Votes: 1
Watchers: 6

Dates

Created: 12/Jan/16 20:39
Updated: 03/Feb/17 04:48
Resolved: 04/Apr/16 17:43