IMPALA-4164: Avoid overly aggressive inlining in LLVM IR
When generating IR functions during codegen, we used to
always tag the functions with the "AlwaysInline" attribute.
That potentially leads to excessive inlining, causing very
long optimization / compilation time with marginal performance
benefit at runtime. One of the reasons for doing it was that
the "target-cpu" and "target-features" attributes were
missing in the generated IR functions so the LLVM inliner
considers them incompatible with the cross-compiled functions.
As a result, the inliner will not inline the generated IR
functions into cross-compiled functions and vice versa unless
the "AlwaysInline" attributes exist.
This change fixes the problem above by setting the "target-cpu"
and "target-features" attributes of all IR functions to match
that of of the host's CPUs so both generated IR functions and
cross-compiled functions will have the same values for those
attributes. With these attributes set, we now rely on the
inliner of LLVM to determine whether a function is worth being
inlined. With this change, the codegen time of a query with very
long predicate went from 15s to 4s and the overall runtime went
from 19s to 8s.
Reviewed-by: Michael Ho <email@example.com>
Tested-by: Impala Public Jenkins