Details
Improvement
Status: Resolved
Minor
Resolution: Fixed
Impala 2.10.0
Description
I noticed a minor inefficiency in how the Parquet scanner handles NeedsConversion(). In cases where the result is not a runtime constant, such as dictionary-encoded strings and timestamps, we switch on it per value. This is probably only a few instructions, but in this part of the code that matters.
I did a quick benchmark and saw scan time improve from ~2.25s to ~2.11s on this query:
use tpch_parquet; set num_nodes=1; set mt_dop=1; select min(l_returnflag), min(l_linestatus) from biglineitem; summary;
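To illustrate the kind of per-value branch described above, here is a minimal C++ sketch of one common way to remove it: make the flag a template parameter and dispatch once per batch. This is not taken from the Impala source and may not match the actual fix. ColumnReader, needs_conversion, Convert() and the ReadBatch* functions are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a column reader; not Impala's actual class.
struct ColumnReader {
  // Fixed for the whole column, but only known at runtime.
  bool needs_conversion;

  // Placeholder conversion step (assumption: any cheap per-value transform).
  static int64_t Convert(int64_t v) { return v + 1; }

  // Before: the flag is tested once per value inside the loop.
  void ReadBatchPerValueBranch(const std::vector<int64_t>& in,
                               std::vector<int64_t>* out) const {
    out->resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
      (*out)[i] = needs_conversion ? Convert(in[i]) : in[i];
    }
  }

  // After: the flag is a template parameter, so each instantiation
  // contains a loop body with no flag test in it.
  template <bool NEEDS_CONVERSION>
  void ReadBatchImpl(const std::vector<int64_t>& in,
                     std::vector<int64_t>* out) const {
    out->resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
      if (NEEDS_CONVERSION) {
        (*out)[i] = Convert(in[i]);
      } else {
        (*out)[i] = in[i];
      }
    }
  }

  // The runtime flag is checked once per batch instead of once per value.
  void ReadBatchHoisted(const std::vector<int64_t>& in,
                        std::vector<int64_t>* out) const {
    if (needs_conversion) {
      ReadBatchImpl<true>(in, out);
    } else {
      ReadBatchImpl<false>(in, out);
    }
  }
};
```

Both variants produce the same output; the difference is only where the flag is tested. An optimizing compiler may hoist such a branch on its own, so any gain would need to be confirmed by measurement, as with the benchmark above.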