Description
ORC-742 introduced lazy evaluation of non-filter columns when filters are present. This issue builds on that work by converting search arguments (SArgs) into filters.
SArg to Filter
SArg to Filter converts the passed SArg into a filter. This enables automatic compatibility with both Spark and Hive, since both already push search arguments down to ORC.
The SArg is automatically converted into a vector filter, which is applied during the read process.
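The conversion described above can be pictured with a small, self-contained sketch. It is a toy model under stated assumptions, not the ORC implementation: `VectorFilter`, `leaf`, `and`, and `or` are hypothetical names, columns are plain `long[]` arrays, and the selected rows are an `int[]` of row indexes. It shows a predicate tree turned into a filter that narrows the selected rows of a batch.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.function.LongPredicate;

// Toy model only: none of these types are ORC or Hive classes.
final class SargToFilterSketch {

  // A filter narrows the selected row indexes of a batch of long columns.
  interface VectorFilter {
    int[] apply(long[][] columns, int[] selected);
  }

  // Leaf predicate: keep the selected rows whose value in `column` passes `test`.
  static VectorFilter leaf(int column, LongPredicate test) {
    return (columns, selected) -> {
      int[] out = new int[selected.length];
      int n = 0;
      for (int row : selected) {
        if (test.test(columns[column][row])) {
          out[n++] = row;
        }
      }
      return Arrays.copyOf(out, n);
    };
  }

  // AND: each child only sees the rows that survived the previous child.
  static VectorFilter and(VectorFilter... children) {
    return (columns, selected) -> {
      int[] current = selected;
      for (VectorFilter child : children) {
        current = child.apply(columns, current);
      }
      return current;
    };
  }

  // OR: a row is kept if any child keeps it; the input row order is preserved.
  static VectorFilter or(VectorFilter... children) {
    return (columns, selected) -> {
      BitSet keep = new BitSet();
      for (VectorFilter child : children) {
        for (int row : child.apply(columns, selected)) {
          keep.set(row);
        }
      }
      int[] out = new int[selected.length];
      int n = 0;
      for (int row : selected) {
        if (keep.get(row)) {
          out[n++] = row;
        }
      }
      return Arrays.copyOf(out, n);
    };
  }

  public static void main(String[] args) {
    long[][] columns = {{1, 5, 10, 20}, {7, 7, 3, 3}};
    // Equivalent to: col0 >= 5 AND (col1 = 7 OR col0 = 20)
    VectorFilter filter = and(
        leaf(0, v -> v >= 5),
        or(leaf(1, v -> v == 7), leaf(0, v -> v == 20)));
    System.out.println(Arrays.toString(filter.apply(columns, new int[] {0, 1, 2, 3})));
  }
}
```

In this sketch the AND node shrinks the selection as it goes, so later children evaluate fewer rows, and the tree is evaluated in the shape it was written in, without any normalization step.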
The search argument builder should allow skipping normalization during the build. This has already been proposed as part of HIVE-24458.
Normalization performs very poorly in the presence of multilevel predicates, as the following benchmark shows:
Benchmark | (fSize) | (fType) | (normalize) | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|---|---|---|
ComplexFilterBench.filter | 2 | vector | true | avgt | 20 | 74.321 | ± 0.156 | us/op |
ComplexFilterBench.filter | 2 | vector | false | avgt | 20 | 78.119 | ± 0.351 | us/op |
ComplexFilterBench.filter | 4 | vector | true | avgt | 20 | 267.405 | ± 1.202 | us/op |
ComplexFilterBench.filter | 4 | vector | false | avgt | 20 | 136.284 | ± 0.637 | us/op |
ComplexFilterBench.filter | 8 | vector | true | avgt | 20 | 9907.765 | ± 49.208 | us/op |
ComplexFilterBench.filter | 8 | vector | false | avgt | 20 | 247.714 | ± 0.651 | us/op |
Explanation:
- fSize is the size of the OR clause that will be normalized.
- normalize indicates whether the search argument was normalized.
Observations:
- Normalizing the search argument incurs a significant performance penalty because the operator tree explodes in size.
- When an AND includes 8 ORs, the unnormalized version takes about 97.5% less time (247.714 vs. 9907.765 us/op, roughly 40x faster).
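The growth of the operator tree can be illustrated with a rough count. This is a sketch under assumptions that the description does not spell out: the predicate consists of `groups` inner clauses with `leavesPerGroup` leaves each, and normalization distributes the outer operator over the inner ones. The class and method names are hypothetical, and the numbers are leaf counts, not timings.

```java
// Toy arithmetic only: this does not reproduce the Hive or ORC normalization code.
final class NormalizationGrowthSketch {

  // Leaves in the expression as written: one leaf per member of each group.
  static long unnormalizedLeaves(int groups, int leavesPerGroup) {
    return Math.multiplyExact((long) groups, (long) leavesPerGroup);
  }

  // After distribution there is one clause for every way of picking a single
  // leaf from each group, and each such clause holds `groups` leaves.
  static long normalizedLeaves(int groups, int leavesPerGroup) {
    long clauses = 1;
    for (int i = 0; i < groups; i++) {
      clauses = Math.multiplyExact(clauses, (long) leavesPerGroup);
    }
    return Math.multiplyExact(clauses, (long) groups);
  }

  public static void main(String[] args) {
    for (int groups : new int[] {2, 4, 8}) {
      System.out.printf("groups=%d unnormalized=%d normalized=%d%n",
          groups, unnormalizedLeaves(groups, 2), normalizedLeaves(groups, 2));
    }
  }
}
```

With two leaves per group, the leaf count goes from 4 to 8 for 2 groups, from 8 to 64 for 4 groups, and from 16 to 2048 for 8 groups, which is the kind of exponential growth the observations above describe.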
Issue Links
- causes
  - ORC-1382 Fix secondary config names `org.sarg.*` to `orc.sarg.*` (Resolved)
  - ORC-954 Fix Javadoc generation failure (Closed)
- depends upon
  - ORC-742 LazyIO of non-filter columns in the presence of filters (Closed)
  - HIVE-24458 Allow access to SArgs without converting to disjunctive normal form (Closed)