[IMPALA-494] Make Parquet block size configurable - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: Impala 1.1
Fix Version/s: Impala 1.2
Component/s: None
Labels: None

Description

The 1GB Parquet block size restricts the degree of parallelism during a scan. For example, if I have a 1GB file and query 75% of the columns, the scan has to read 750MB using a single disk. By contrast, with Seq/Snappy and a 128MB block size, I can parallelize the scan and get the result a lot faster.
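
The arithmetic behind this can be sketched in a few lines of Python. This is a deliberately simplified model, not Impala's scheduler: it assumes each block can be scanned independently and that the selected columns are spread evenly across blocks. The helper names `scan_units` and `mb_per_unit` are hypothetical, and it uses 1GB = 1024MB, so the 75% read comes out as 768MB rather than the rounded 750MB quoted above.

```python
MB = 1024 * 1024


def scan_units(file_bytes: int, block_bytes: int) -> int:
    """Number of blocks in the file, assuming (as a simplification)
    that each block can be scanned independently of the others."""
    if file_bytes <= 0 or block_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division, avoids float rounding.
    return -(-file_bytes // block_bytes)


def mb_per_unit(file_bytes: int, column_fraction: float, block_bytes: int) -> float:
    """MB each unit reads, assuming the selected columns are spread
    evenly over all blocks (hypothetical model, not measured data)."""
    units = scan_units(file_bytes, block_bytes)
    return file_bytes * column_fraction / units / MB


if __name__ == "__main__":
    one_gb = 1024 * MB
    for block_mb in (1024, 128):
        units = scan_units(one_gb, block_mb * MB)
        per_unit = mb_per_unit(one_gb, 0.75, block_mb * MB)
        print(f"{block_mb}MB blocks: {units} unit(s), {per_unit:.0f}MB each")
```

Under these assumptions, a 1GB block leaves a single unit of work that reads the full 75%, while 128MB blocks give eight units that each read one eighth of it.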

Nong and I discussed this problem, and a user-configurable block size came to mind. The problem still requires more thought.

People

Assignee: Nong Li

Reporter: Alan Choi

Votes: 0

Watchers: 0

Dates

Created: 27/Jul/13 01:38

Updated: 20/Dec/15 00:05

Resolved: 18/Sep/13 06:40