Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Affects Version/s: Impala 2.5.0
Fix Version/s: None
Description
We currently include all functions and cross-compiled IR in a single LLVM bitcode module that is parsed and loaded into memory for every query. We eliminate dead code early in the optimisation process, but we still pay the cost of building and walking the in-memory IR for functions that the query does not use. About 100ms of CodeGen time is PrepareTime spent on this module.
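To make that cost concrete, here is a minimal C++ sketch of the prune-after-load approach, assuming a toy model in which a module is nothing more than a call graph from function names to callees. The type `CallGraph`, the function `ReachableFunctions` and all function names are hypothetical; they are not Impala's or LLVM's API, and real dead-code elimination runs as LLVM passes over the parsed module. What the sketch shows is that reachability can only be computed after every function is already in memory, so the parsing cost is paid for the whole module.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of an IR module: each function name maps to
// the names of the functions it calls. Real LLVM modules are far richer.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Returns the functions reachable from the query's entry points (a worklist
// traversal of the call graph). Everything else is dead code that can be
// deleted, but only after the whole module has already been loaded.
std::set<std::string> ReachableFunctions(const CallGraph& module,
                                         const std::vector<std::string>& roots) {
  std::set<std::string> live;
  std::vector<std::string> worklist(roots.begin(), roots.end());
  while (!worklist.empty()) {
    std::string fn = worklist.back();
    worklist.pop_back();
    if (!live.insert(fn).second) continue;  // already visited
    auto it = module.find(fn);
    if (it == module.end()) continue;  // external declaration, no body here
    for (const auto& callee : it->second) worklist.push_back(callee);
  }
  return live;
}
```

For example, with a module holding `ScanEval`, `StringCompare`, `ReservoirSampleUpdate`, `Rand`, `TimestampAdd` and `boost_date`, calling `ReachableFunctions(module, {"ScanEval"})` keeps only `ScanEval` and `StringCompare`; the other four functions were parsed for nothing.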
Much of the module consists of infrequently used functions, as these line counts from the disassembled IR show:
tarmstrong@tarmstrong-box:~/Impala/Impala$ llvm-dis llvm-ir/impala-sse.bc
tarmstrong@tarmstrong-box:~/Impala/Impala$ grep 'Reservoir' llvm-ir/impala-sse.ll | wc -l
3705
tarmstrong@tarmstrong-box:~/Impala/Impala$ grep 'Timestamp' llvm-ir/impala-sse.ll | wc -l
3015
tarmstrong@tarmstrong-box:~/Impala/Impala$ grep 'boost' llvm-ir/impala-sse.ll | wc -l
13240
We already have a mechanism for loading LLVM bitcode on demand for UDFs when a query references them. We should split the built-in functions into multiple LLVM modules and load each one only when the query requires it.
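A minimal C++ sketch of the proposed split, assuming each built-in symbol is registered against the smaller bitcode file that would define it. `BuiltinModuleRegistry`, the file names and the injected `load` callback are hypothetical illustrations, not Impala's existing UDF loading code; in a real implementation the callback might wrap LLVM's `llvm::parseIRFile`.

```cpp
#include <functional>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry mapping each built-in symbol to the bitcode file of
// the smaller module that would define it after the split.
class BuiltinModuleRegistry {
 public:
  // `load` stands in for whatever actually parses a bitcode file; it is
  // injected so the sketch stays self-contained.
  explicit BuiltinModuleRegistry(std::function<void(const std::string&)> load)
      : load_(std::move(load)) {}

  void Register(const std::string& symbol, const std::string& module_file) {
    symbol_to_module_[symbol] = module_file;
  }

  // Loads only the modules that define the query's referenced symbols,
  // each module at most once. Throws for symbols that were never registered.
  void LoadForQuery(const std::vector<std::string>& referenced_symbols) {
    for (const auto& symbol : referenced_symbols) {
      auto it = symbol_to_module_.find(symbol);
      if (it == symbol_to_module_.end()) {
        throw std::runtime_error("unknown built-in: " + symbol);
      }
      if (loaded_.insert(it->second).second) load_(it->second);
    }
  }

  const std::set<std::string>& loaded() const { return loaded_; }

 private:
  std::function<void(const std::string&)> load_;
  std::map<std::string, std::string> symbol_to_module_;
  std::set<std::string> loaded_;
};
```

The sketch maps each symbol to exactly one module and ignores dependencies between modules. If split modules shared helpers (for example boost code used by both timestamp and string functions), those helpers would need their own module, or each module would need a dependency list that the loader follows.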