ORC-742 introduced lazy evaluation of the non-filter columns in the presence of filters. This change builds on that by converting Search Arguments (SArgs) into filters.
SArg to Filter converts the passed SArg into a filter. This enables automatic compatibility with both Spark and Hive, as they already push Search Arguments down to ORC.
The SArg is automatically converted into a Vector Filter, which is applied during the read process.
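
To illustrate what applying a filter over a vector during the read can look like, here is a minimal, self-contained sketch. It is not the ORC API: the class `VectorFilterSketch`, the methods `applyLeaf` and `selectRows`, and the predicate `x >= 10 AND y == 1` are all hypothetical names chosen for this example. It assumes a batch of plain `long[]` columns and a selection vector of row indices that each leaf predicate narrows in turn.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;

// Hypothetical sketch, not the ORC implementation.
final class VectorFilterSketch {

  // Keeps only the selected rows whose column value satisfies the leaf predicate.
  // `selected` is compacted in place and the new selection size is returned.
  static int applyLeaf(long[] column, int[] selected, int size, LongPredicate leaf) {
    int kept = 0;
    for (int i = 0; i < size; i++) {
      int row = selected[i];
      if (leaf.test(column[row])) {
        selected[kept++] = row;
      }
    }
    return kept;
  }

  // Evaluates the illustrative predicate `x >= 10 AND y == 1` over one batch.
  // An AND is evaluated by applying each leaf to the rows that survived the previous one.
  static int[] selectRows(long[] x, long[] y) {
    int[] selected = new int[x.length];
    for (int i = 0; i < selected.length; i++) {
      selected[i] = i; // start with every row selected
    }
    int size = applyLeaf(x, selected, selected.length, v -> v >= 10);
    size = applyLeaf(y, selected, size, v -> v == 1);
    return Arrays.copyOf(selected, size);
  }

  public static void main(String[] args) {
    long[] x = {5, 10, 20, 30};
    long[] y = {1, 1, 0, 1};
    // Rows 1 and 3 satisfy both leaves.
    System.out.println(Arrays.toString(selectRows(x, y))); // prints [1, 3]
  }
}
```

With a selection like this in hand, the non-filter columns only need to be materialized for the surviving rows, which is where the lazy evaluation from ORC-742 pays off.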
Normalization performs very poorly in the presence of multilevel predicates.
- fSize identifies the size of the OR clause that will be normalized.
- normalize identifies whether normalization was carried out on the Search Argument.
- Normalizing the search argument results in a significant performance penalty, given the explosion of the operator tree.
- In the case where an AND includes 8 ORs, the unnormalized version is faster by 97.32%.
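
The operator tree explosion can be made concrete with a small counting sketch. It assumes that normalization means conversion to conjunctive normal form (an AND of ORs) without simplification. The class `CnfGrowthSketch`, its helpers, and the tree shape (an OR over two-leaf ANDs) are hypothetical and chosen only for illustration; they are not the benchmark's actual predicates.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: counts the OR-clauses a predicate tree expands to in CNF.
final class CnfGrowthSketch {
  enum Kind { LEAF, AND, OR }

  static final class Expr {
    final Kind kind;
    final List<Expr> children;

    Expr(Kind kind, List<Expr> children) {
      this.kind = kind;
      this.children = children;
    }
  }

  static Expr leaf() { return new Expr(Kind.LEAF, Collections.<Expr>emptyList()); }
  static Expr and(Expr... c) { return new Expr(Kind.AND, Arrays.asList(c)); }
  static Expr or(Expr... c) { return new Expr(Kind.OR, Arrays.asList(c)); }

  // Number of clauses in the CNF of `e`, assuming no clauses are merged or dropped.
  // AND concatenates the clauses of its children (sum);
  // OR distributes over them, pairing every clause with every other (product).
  static long cnfClauses(Expr e) {
    switch (e.kind) {
      case LEAF:
        return 1;
      case AND: {
        long sum = 0;
        for (Expr c : e.children) sum += cnfClauses(c);
        return sum;
      }
      default: {
        long product = 1;
        for (Expr c : e.children) product *= cnfClauses(c);
        return product;
      }
    }
  }

  // Builds an OR over `size` two-leaf ANDs: (a1 AND b1) OR (a2 AND b2) OR ...
  static Expr orOfAnds(int size) {
    Expr[] terms = new Expr[size];
    for (int i = 0; i < size; i++) {
      terms[i] = and(leaf(), leaf());
    }
    return or(terms);
  }

  public static void main(String[] args) {
    for (int size : new int[] {2, 4, 8}) {
      System.out.println("size=" + size + " leaves=" + (2 * size)
          + " cnfClauses=" + cnfClauses(orOfAnds(size)));
    }
  }
}
```

Under these assumptions, an OR of 8 two-leaf ANDs has 16 leaves before normalization and expands to 256 clauses of 8 leaves each afterwards, so the filter has far more work to do per row.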