[HIVE-22495] Parquet count(*) read in all data - ASF JIRA

Running the following Hive query on a Parquet table:

select count(*) from test_table

The query reads all data (all columns) instead of just the metadata.

For comparison, Hive 0.13 and Spark read much less data with my test table.

The cause seems to be that the Parquet read support falls back to the file schema if indexColumnsWanted is empty. This logic still exists in the master branch.
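
To illustrate the fallback described above, here is a minimal, self-contained Java sketch. It is not Hive's actual code: the class name, the method names, and the modeling of a schema as a plain list of column names are all assumptions made for illustration. Only the name indexColumnsWanted comes from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these names exist in Hive.
class ProjectionFallbackSketch {

    // Models the reported behavior: an empty list of wanted column
    // indexes falls back to the full file schema (every column is read).
    static List<String> selectColumnsWithFallback(List<String> fileSchema,
                                                  List<Integer> indexColumnsWanted) {
        if (indexColumnsWanted.isEmpty()) {
            return new ArrayList<>(fileSchema);
        }
        return project(fileSchema, indexColumnsWanted);
    }

    // Models one possible alternative (an assumption, not a proposed patch):
    // an empty list means no columns are needed, as for count(*).
    static List<String> selectColumnsWithoutFallback(List<String> fileSchema,
                                                     List<Integer> indexColumnsWanted) {
        return project(fileSchema, indexColumnsWanted);
    }

    // Keeps only the columns whose index was requested; out-of-range indexes are skipped.
    private static List<String> project(List<String> fileSchema, List<Integer> indexes) {
        List<String> projected = new ArrayList<>();
        for (int index : indexes) {
            if (index >= 0 && index < fileSchema.size()) {
                projected.add(fileSchema.get(index));
            }
        }
        return projected;
    }

    public static void main(String[] args) {
        List<String> fileSchema = Arrays.asList("id", "name", "payload");
        List<Integer> noColumns = Collections.emptyList();

        // With the fallback, count(*) ends up requesting every column.
        System.out.println(selectColumnsWithFallback(fileSchema, noColumns));    // [id, name, payload]
        // Without it, no column data is requested.
        System.out.println(selectColumnsWithoutFallback(fileSchema, noColumns)); // []
    }
}
```

The sketch only shows why an empty projection list can lead to reading all columns. Whether removing the check is safe in Hive depends on the other callers of that code path.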

I don't know why this empty-list check was added. Please advise whether changing it would have any other impact.