Details
Type: Bug
Status: Patch Available
Priority: Major
Resolution: Unresolved
Description
Running a Hive query on a Parquet table:
select count(*) from test_table
The query reads all of the data (all columns) instead of just the metadata.
For comparison, Hive 0.13 and Spark read much less data from the same test table.
Engine | HDFS data read |
---|---|
Hive 2.3.4 | 452.9 MB |
Hive 0.13 | 22.5 KB |
Spark | 41.6 KB |
The cause seems to be that the Parquet read support falls back to the file schema if indexColumnsWanted is empty. This logic still exists in the master branch.
I don't know why this empty-list check was added. Please advise if changing it would have any other impact.
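To make the suspected behaviour concrete, here is a minimal sketch of the projection logic. It is a simplified model, not the actual Hive code: the class name ProjectionSketch, both method names, and the use of plain column-name lists in place of a Parquet schema are hypothetical and chosen only for illustration. Only indexColumnsWanted is a name taken from the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified model of column projection. NOT the real Hive read support:
// the class, the method names, and the List<String> "schema" are assumptions.
class ProjectionSketch {

    // Behaviour described in the report: an empty index list is treated as
    // "nothing requested", so the reader falls back to the full file schema.
    static List<String> projectWithFallback(List<String> fileSchema,
                                            List<Integer> indexColumnsWanted) {
        if (indexColumnsWanted.isEmpty()) {
            return new ArrayList<>(fileSchema); // every column gets read
        }
        return select(fileSchema, indexColumnsWanted);
    }

    // Alternative without the empty-list check: an empty index list yields
    // an empty projection, so no column data has to be read.
    static List<String> projectWithoutFallback(List<String> fileSchema,
                                               List<Integer> indexColumnsWanted) {
        return select(fileSchema, indexColumnsWanted);
    }

    // Picks the columns at the given positions, skipping out-of-range indexes.
    private static List<String> select(List<String> fileSchema,
                                       List<Integer> indexes) {
        List<String> out = new ArrayList<>();
        for (int i : indexes) {
            if (i >= 0 && i < fileSchema.size()) {
                out.add(fileSchema.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name", "payload");
        List<Integer> none = Collections.<Integer>emptyList();
        System.out.println(projectWithFallback(schema, none));    // [id, name, payload]
        System.out.println(projectWithoutFallback(schema, none)); // []
    }
}
```

For select count(*) no column is needed, so the index list is empty and the first variant ends up requesting the whole schema, which would match the 452.9 MB read above. Whether an empty projection is safe for every other code path is the open question.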