Details
- Improvement
- Status: Closed
- Major
- Resolution: Won't Do
- Impala 2.2.4
- None
- None
Description
For a highly selective query, even once we have narrowed the scan down to a single Parquet file that matches the predicate, Impala still has to process that single file. Right now, a Parquet file generated by Impala has only one row group, and its default size is 256MB. A row group is processed serially; it cannot be parallelized at the moment. Processing a 256MB row group can take several seconds, even for a very simple predicate. To improve the response time of such highly selective queries, the row group must be smaller.
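To make the argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not Impala code. The function name, the 64 MB/s per-thread scan rate, and the thread count are all hypothetical values chosen for illustration, and the model assumes that separate row groups can be scanned by separate threads while each individual row group is scanned serially. It ignores scheduling, I/O, and decompression overhead.

```python
import math


def estimated_scan_seconds(file_mb, row_group_mb, scan_mb_per_sec, threads):
    """Estimate wall-clock seconds to scan one Parquet file.

    Assumption (illustrative only): each row group is scanned serially by
    one thread, and different row groups can run on different threads.
    """
    row_groups = math.ceil(file_mb / row_group_mb)
    # Row groups run in "waves" of at most `threads` at a time.
    waves = math.ceil(row_groups / threads)
    # Time for one thread to scan one (full-size) row group.
    per_group_seconds = min(row_group_mb, file_mb) / scan_mb_per_sec
    return waves * per_group_seconds


# Hypothetical numbers: 64 MB/s per thread, 8 scanner threads.
one_big_group = estimated_scan_seconds(256, 256, 64, 8)
eight_small_groups = estimated_scan_seconds(256, 32, 64, 8)
print(one_big_group, eight_small_groups)
```

Under these assumed numbers, a single 256MB row group is bounded by one thread's scan rate, while eight 32MB row groups can be spread across the available threads. Real gains would depend on how the scanner actually schedules row groups.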
Issue Links
- is related to: IMPALA-5843 Use page index in Parquet files to skip pages (Resolved)