Details
- Type: Improvement
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
Description
In HdfsScanNode.init() we try to assign dictionary and collection conjuncts even for non-Parquet scans. Such predicates only make sense for Parquet scans, so there is no point in collecting them for other scans.
The current behavior is undesirable because:
- init() can be substantially slower, because assigning dictionary filters may involve evaluating exprs in the BE, which can be expensive
- the explain plan of non-Parquet scans may contain a "parquet dictionary predicates" section, which is confusing and misleading
Relevant code snippet from HdfsScanNode:
    @Override
    public void init(Analyzer analyzer) throws ImpalaException {
      conjuncts_ = orderConjunctsByCost(conjuncts_);
      checkForSupportedFileFormats();

      assignCollectionConjuncts(analyzer);
      computeDictionaryFilterConjuncts(analyzer);

      // compute scan range locations with optional sampling
      Set<HdfsFileFormat> fileFormats = computeScanRangeLocations(analyzer);
      ...
      if (fileFormats.contains(HdfsFileFormat.PARQUET)) { <--- assignment should go in here
        computeMinMaxTupleAndConjuncts(analyzer);
      }
      ...
    }
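A minimal, self-contained sketch of the proposed gating, not the actual patch. The class ScanInitSketch, the FileFormat enum, and the step-recording list are hypothetical stand-ins for HdfsScanNode, HdfsFileFormat, and the real planner work. It also assumes the Parquet-only steps can safely run after the file formats are known, which would need to be verified against the real code.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for HdfsScanNode; only the control flow is modelled.
public class ScanInitSketch {
  // Hypothetical stand-in for HdfsFileFormat.
  enum FileFormat { PARQUET, TEXT, AVRO }

  // Returns the names of the planning steps that would run, in order.
  // Each name stands in for the corresponding call in the snippet above.
  static List<String> init(Set<FileFormat> fileFormats) {
    List<String> steps = new ArrayList<>();
    steps.add("orderConjunctsByCost");
    steps.add("checkForSupportedFileFormats");
    // The file formats must be known before the Parquet check below.
    steps.add("computeScanRangeLocations");

    if (fileFormats.contains(FileFormat.PARQUET)) {
      // Parquet-only work: skipped entirely for other scans, so they
      // pay no BE expr evaluation cost and collect no dictionary
      // predicates that could show up in the explain plan.
      steps.add("assignCollectionConjuncts");
      steps.add("computeDictionaryFilterConjuncts");
      steps.add("computeMinMaxTupleAndConjuncts");
    }
    return steps;
  }

  public static void main(String[] args) {
    System.out.println(init(EnumSet.of(FileFormat.TEXT)));
    System.out.println(init(EnumSet.of(FileFormat.PARQUET, FileFormat.TEXT)));
  }
}
```

With this shape, a scan over only text files runs just the three common steps, while a scan that includes at least one Parquet partition also runs the three Parquet-specific ones.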
Issue Links
- is blocked by: IMPALA-6617 Preconditions.checkState(val.getColValsSize() == 1); in EvalExprWithoutRow() (Resolved)