[IMPALA-6625] Skip dictionary and collection conjunct assignment for non-Parquet scans. - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
Fix Version/s: Impala 2.13.0, Impala 3.1.0
Component/s: Frontend
Labels:
- perf
- planner

Description

In HdfsScanNode.init() we try to assign dictionary and collection conjuncts even for non-Parquet scans. Such predicates only make sense for Parquet scans, so there is no point in collecting them for other scans.

The current behavior is undesirable because:

- init() can be substantially slower, because assigning dictionary filters may involve evaluating exprs in the BE, which can be expensive
- the explain plan of non-Parquet scans may have a section "parquet dictionary predicates", which is confusing/misleading

Relevant code snippet from HdfsScanNode:

  @Override
  public void init(Analyzer analyzer) throws ImpalaException {
    conjuncts_ = orderConjunctsByCost(conjuncts_);
    checkForSupportedFileFormats();

    assignCollectionConjuncts(analyzer);
    computeDictionaryFilterConjuncts(analyzer);

    // compute scan range locations with optional sampling
    Set<HdfsFileFormat> fileFormats = computeScanRangeLocations(analyzer);
    ...
    if (fileFormats.contains(HdfsFileFormat.PARQUET)) {  // <--- assignment should go in here
      computeMinMaxTupleAndConjuncts(analyzer);
    }
    ...
  }
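
The proposed reordering can be illustrated with a small standalone model. This is only a sketch of the idea, not the actual patch: ScanInitSketch, ScanNode, FileFormat and the stepsRun list are hypothetical stand-ins, and the real methods in HdfsScanNode take an Analyzer and do real work rather than recording their names. It assumes, as the description suggests, that the collection and dictionary conjunct steps move inside the existing Parquet check.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the proposed change; not Impala code.
final class ScanInitSketch {
  // Stand-in for HdfsFileFormat, reduced to a few formats.
  enum FileFormat { PARQUET, TEXT, AVRO }

  // Stand-in for HdfsScanNode that only records which init steps ran.
  static final class ScanNode {
    private final Set<FileFormat> fileFormats;
    final List<String> stepsRun = new ArrayList<>();

    ScanNode(Set<FileFormat> fileFormats) {
      this.fileFormats = fileFormats;
    }

    void init() {
      // The file formats must be known before the Parquet-only steps
      // can be guarded, so this step runs first.
      stepsRun.add("computeScanRangeLocations");

      if (fileFormats.contains(FileFormat.PARQUET)) {
        // Parquet-only work, skipped entirely for other formats.
        stepsRun.add("assignCollectionConjuncts");
        stepsRun.add("computeDictionaryFilterConjuncts");
        stepsRun.add("computeMinMaxTupleAndConjuncts");
      }
    }
  }

  public static void main(String[] args) {
    ScanNode textScan = new ScanNode(EnumSet.of(FileFormat.TEXT));
    textScan.init();
    System.out.println("text scan steps: " + textScan.stepsRun);

    // A scan that covers both Parquet and text files still runs the guarded steps.
    ScanNode mixedScan = new ScanNode(EnumSet.of(FileFormat.PARQUET, FileFormat.TEXT));
    mixedScan.init();
    System.out.println("mixed scan steps: " + mixedScan.stepsRun);
  }
}
```

In this model a text-only scan never reaches the dictionary or collection steps, so it would neither pay for them nor report Parquet-specific predicates, while a scan that includes any Parquet files behaves as before.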

Issue Links

is blocked by

IMPALA-6617 Preconditions.checkState(val.getColValsSize() == 1); in EvalExprWithoutRow()

Resolved

People

Assignee: Pooja Nilangekar

Reporter: Alexander Behm

Votes: 0

Watchers: 4

Dates

Created: 08/Mar/18 00:11

Updated: 15/Mar/19 02:21

Resolved: 07/Jul/18 05:05