Details
Type: Bug
Status: Patch Available
Priority: Major
Resolution: Unresolved
Description
Currently the ParquetRecordReaderWrapper still uses the readFooter API without filtering, which means it has to read the metadata for all row groups every time. This could cause issues when the input dataset is particularly large and has many columns.
Parquet-84 introduced another API that allows row group filtering on the task side. Hive should adopt this API.
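
To illustrate what task-side row group filtering buys, here is a minimal, self-contained sketch. It does not call Parquet itself: RowGroupInfo and filterBySplit are hypothetical names invented for this example, and the selection rule (keep a row group when its midpoint falls inside the split's byte range) is an assumption about how a range-based footer filter behaves. In parquet-mr the corresponding call is, as far as I know, the readFooter overload that takes a MetadataFilter such as ParquetMetadataConverter.range(start, end); that should be checked against the Parquet version Hive builds with.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one row group's footer entry; not a Parquet class.
final class RowGroupInfo {
    final long startOffset; // byte offset where the row group begins in the file
    final long totalSize;   // total size of the row group in bytes

    RowGroupInfo(long startOffset, long totalSize) {
        this.startOffset = startOffset;
        this.totalSize = totalSize;
    }

    // Midpoint used to assign the row group to exactly one split.
    long midpoint() {
        return startOffset + totalSize / 2;
    }
}

final class RowGroupFilterSketch {
    // Assumed rule: keep only the row groups whose midpoint lies in
    // [splitStart, splitEnd), so each task sees just its own row groups.
    static List<RowGroupInfo> filterBySplit(List<RowGroupInfo> all,
                                            long splitStart, long splitEnd) {
        List<RowGroupInfo> kept = new ArrayList<>();
        for (RowGroupInfo rg : all) {
            long mid = rg.midpoint();
            if (mid >= splitStart && mid < splitEnd) {
                kept.add(rg);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Three 100-byte row groups laid out back to back.
        List<RowGroupInfo> footer = Arrays.asList(
                new RowGroupInfo(0, 100),
                new RowGroupInfo(100, 100),
                new RowGroupInfo(200, 100));

        // A task whose split covers bytes [100, 200) keeps one row group.
        List<RowGroupInfo> mine = filterBySplit(footer, 100, 200);
        System.out.println("row groups kept: " + mine.size());
    }
}
```

Without such a filter, every task materializes metadata for all three row groups (and for every column in each of them) even though it only reads one; with it, the per-task metadata cost scales with the split rather than with the whole file.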