[HIVE-15131] Change Parquet reader to read metadata on the task side - ASF JIRA

Details

Type: Bug
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Reader
Labels: None

Description

Currently, ParquetRecordReaderWrapper still uses the readFooter API without filtering, which means it has to read metadata for all row groups every time. This could cause issues when the input dataset is particularly large and has many columns.

Parquet-84 introduced another API that allows row group filtering on the task side. Hive should adopt this API.
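
To illustrate what task-side row group filtering buys, here is a minimal standalone sketch. It assumes the range-based filter keeps a row group when the midpoint of its byte range falls inside the task's split, which is my understanding of how parquet-mr's range metadata filter behaves. The names `RowGroupFilterSketch`, `RowGroup`, and `filterByRange` are hypothetical and are not Parquet or Hive classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowGroupFilterSketch {

    /** Hypothetical stand-in for one row group entry in a Parquet footer. */
    static final class RowGroup {
        final long startOffset;    // byte offset of the row group in the file
        final long totalByteSize;  // size of the row group in bytes

        RowGroup(long startOffset, long totalByteSize) {
            this.startOffset = startOffset;
            this.totalByteSize = totalByteSize;
        }

        long midpoint() {
            return startOffset + totalByteSize / 2;
        }
    }

    /**
     * Keeps only the row groups whose midpoint lies in [splitStart, splitEnd).
     * Assumption: this mirrors the midpoint rule of a range-based metadata filter.
     */
    static List<RowGroup> filterByRange(List<RowGroup> all, long splitStart, long splitEnd) {
        List<RowGroup> kept = new ArrayList<>();
        for (RowGroup rg : all) {
            long mid = rg.midpoint();
            if (splitStart <= mid && mid < splitEnd) {
                kept.add(rg);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Three row groups of 100 bytes each, starting after a 4-byte header.
        List<RowGroup> footer = Arrays.asList(
                new RowGroup(4, 100),
                new RowGroup(104, 100),
                new RowGroup(204, 100));

        // A task whose split covers bytes [0, 128) only needs the first row group.
        System.out.println(filterByRange(footer, 0, 128).size());
    }
}
```

Because each midpoint belongs to exactly one split, every row group is handled by exactly one task, and a task only has to deal with metadata for its own row groups rather than the whole file.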

Attachments

- HIVE-15131.1.patch, 17/Jul/18 05:47, 2 kB, Adesh Kumar Rao
- HIVE-15131.2.patch, 18/Jul/18 05:07, 2 kB, Adesh Kumar Rao
- HIVE-15131.3.patch, 18/Jul/18 20:37, 3 kB, Adesh Kumar Rao
- HIVE-15131.4.patch, 18/Jul/18 20:45, 3 kB, Adesh Kumar Rao

People

Assignee: Adesh Kumar Rao

Reporter: Chao Sun

Votes: 0

Watchers: 2

Dates

Created: 04/Nov/16 16:56

Updated: 19/Jul/18 18:23