[IMPALA-3841] Avoid materializing nested collections if top-level predicates already disqualify the row. - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: Impala 2.5.0, Impala 2.6.0
Fix Version/s: None
Component/s: Backend
Labels:

Epic Link:
Complex types performance improvements
Target Version:

Product Backlog

Description

Today, we fully materialize a row before evaluating the top-level conjuncts when scanning Parquet. This includes materializing nested collections. We should avoid materializing nested collections if top-level conjuncts already discard the row. Our recent move to column-wise materialization makes this improvement feasible (~~IMPALA-2736~~).

To illustrate the problem, consider this query:

select * from customer c, c.orders o where c.id = 10

Even though we have a very selective predicate on the top-level customer, our scanner will still fully materialize all orders of all customers. The non-matches will be filtered, but we still pay the cost of materializing the orders.

The proposed improvement is to avoid materializing the orders of non-qualifying customers.

The improvement will several things:

Analyze and separate the top-level conjuncts into those that can be evaluated before materializing the nested collections and those that require nested collections to be materialized. In particular, we need to be careful with our auto-generated !empty() predicates on nested collections.
Add a new SkipValues() or similar interface to the Parquet column readers to advances the scanner without actually materializing values. If possible, we should skip entire blocks.

Attachments

Issue Links

relates to

IMPALA-2017 Lazy materialization of Parquet columns during query

Open

IMPALA-2735 Push down conjunct evaluation into Parquet column readers

Open

Activity

People

Assignee:: Unassigned

Reporter:: Alexander Behm

Votes:: 0 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 08/Jul/16 00:24

Updated:: 18/Nov/24 12:54