Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
Description
FilterStringColumnInList, StringColumnInList, etc. use CuckooSetBytes for lookup.
One way to optimize this would be to add a boundary check on "length", using the min/max length of the values stored in the set (ref: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CuckooSetBytes.java#L85). A lookup key whose length falls outside that range cannot be in the set, so it can be rejected before any hash is computed. This would significantly reduce the number of hash computations that need to happen, e.g. in TPCH-Q12.
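A minimal sketch of the idea, assuming a simplified set backed by java.util.HashSet rather than the actual cuckoo hashing in CuckooSetBytes. The class name LengthBoundedByteSet, its fields, and the hashedLookups counter are hypothetical and exist only for illustration; the real class may track lengths differently.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for CuckooSetBytes; not Hive's implementation. */
class LengthBoundedByteSet {
    private final Set<ByteBuffer> values = new HashSet<>();
    // Shortest and longest value lengths seen so far (assumed bookkeeping).
    private int minLen = Integer.MAX_VALUE;
    private int maxLen = 0;
    // Counts lookups that actually reached the hash computation.
    int hashedLookups = 0;

    void insert(byte[] value) {
        // Copy the bytes so later changes to the caller's array do not affect the set.
        values.add(ByteBuffer.wrap(value.clone()));
        minLen = Math.min(minLen, value.length);
        maxLen = Math.max(maxLen, value.length);
    }

    boolean lookup(byte[] bytes, int start, int len) {
        // Boundary check: a key shorter or longer than every stored value
        // cannot be a member, so return before hashing.
        if (len < minLen || len > maxLen) {
            return false;
        }
        hashedLookups++;
        // ByteBuffer hashes and compares only the wrapped [start, start + len) range.
        return values.contains(ByteBuffer.wrap(bytes, start, len));
    }
}
```

With an IN list such as ('MAIL', 'SHIP') (values chosen here for illustration), every stored value has length 4, so a row value like "TRUCK" or "AIR" would be rejected by the length check alone and only 4-byte values would be hashed.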
Attachments
Issue Links
- is related to: ORC-1610 Reduce the number of hash computation in CuckooSetBytes (Closed)
- links to