IMPALA-4164

Codegen does not generate target-specific machine code for cross-compiled functions

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend

      Description

      I discovered this while experimenting with different inlining settings. It turns out that clang's LLVM IR output carries "target-cpu" and "target-features" function attributes that force the functions to be compiled for generic x86-64 machines (since that is what we asked clang to compile for). Nothing overrides these attributes at any later point, so the cross-compiled functions are never compiled for the host CPU.

          Activity

          tarmstrong Tim Armstrong added a comment -

          This also prevents LLVM's inliner from being effective, since it cannot inline functions with different target attributes unless the AlwaysInline attribute is present.
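The rule described in this comment can be sketched as a small stand-alone model. This is not LLVM's implementation: `IrFunction`, `GetAttr`, and `InlinerMayInline` are hypothetical names, the "+sse2" feature string is only illustrative, and the check is simplified to exact string equality of the two attributes, which is how the comment states it.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an IR function: a name, its string-valued
// function attributes, and whether it is tagged AlwaysInline.
struct IrFunction {
  std::string name;
  std::map<std::string, std::string> attrs;  // e.g. "target-cpu" -> "x86-64"
  bool always_inline;
};

// Returns the attribute value, or "" when the attribute is absent.
std::string GetAttr(const IrFunction& fn, const std::string& key) {
  auto it = fn.attrs.find(key);
  return it == fn.attrs.end() ? "" : it->second;
}

// Simplified eligibility rule (an assumption of this sketch): the callee is
// a candidate for inlining only if it is AlwaysInline or if both target
// attributes match the caller's exactly.
bool InlinerMayInline(const IrFunction& caller, const IrFunction& callee) {
  if (callee.always_inline) return true;
  return GetAttr(caller, "target-cpu") == GetAttr(callee, "target-cpu") &&
         GetAttr(caller, "target-features") ==
             GetAttr(callee, "target-features");
}

// Usage example: a cross-compiled function carries clang's attributes,
// a function generated at runtime carries none.
void Example() {
  IrFunction cross_compiled{
      "CrossCompiledFn",
      {{"target-cpu", "x86-64"}, {"target-features", "+sse2"}},
      false};
  IrFunction generated{"GeneratedFn", {}, false};

  // Mismatched attributes: not eligible in either direction.
  assert(!InlinerMayInline(generated, cross_compiled));
  assert(!InlinerMayInline(cross_compiled, generated));

  // AlwaysInline bypasses the attribute check.
  generated.always_inline = true;
  assert(InlinerMayInline(cross_compiled, generated));
}
```

In this model, eligibility only says the inliner is allowed to consider the call site; whether inlining is actually worthwhile is a separate cost decision.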

          tarmstrong Tim Armstrong added a comment -

          Michael mentioned he was looking at this.

          kwho Michael Ho added a comment -

          IMPALA-4164: Avoid overly aggressive inlining in LLVM IR

          When generating IR functions during codegen, we used to
          always tag the functions with the "AlwaysInline" attribute.
          That potentially leads to excessive inlining, causing very
          long optimization / compilation time with marginal performance
          benefit at runtime. One of the reasons for doing it was that
          the "target-cpu" and "target-features" attributes were
          missing in the generated IR functions, so the LLVM inliner
          considered them incompatible with the cross-compiled functions.
          As a result, the inliner would not inline the generated IR
          functions into cross-compiled functions and vice versa unless
          the "AlwaysInline" attribute was present.

          This change fixes the problem above by setting the "target-cpu"
          and "target-features" attributes of all IR functions to match
          those of the host's CPU, so both generated IR functions and
          cross-compiled functions have the same values for those
          attributes. With these attributes set, we now rely on LLVM's
          inliner to determine whether a function is worth inlining.
          With this change, the codegen time of a query with a very
          long predicate went from 15s to 4s and the overall runtime went
          from 19s to 8s.

          Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce
          Reviewed-on: http://gerrit.cloudera.org:8080/6941
          Reviewed-by: Michael Ho <kwho@cloudera.com>
          Tested-by: Impala Public Jenkins
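The fix described in the commit message can be sketched as follows. This is a stand-alone model, not the Impala change itself: `IrFunction`, `BuildFeatureString`, and `SetHostTargetAttributes` are hypothetical names, and the host CPU name and feature set are passed in as plain values. In real LLVM code the host values would typically come from `llvm::sys::getHostCPUName()` and `llvm::sys::getHostCPUFeatures()`, and the attributes would be set with `llvm::Function::addFnAttr`. The "+name,-name" feature-string format follows LLVM's convention.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR function and its string attributes.
struct IrFunction {
  std::string name;
  std::map<std::string, std::string> attrs;
};

// Builds a comma-separated feature string such as "+avx2,-xop" from a
// feature -> enabled map. std::map iterates in sorted key order, so the
// result is deterministic.
std::string BuildFeatureString(const std::map<std::string, bool>& features) {
  std::string out;
  for (const auto& kv : features) {
    if (!out.empty()) out += ',';
    out += (kv.second ? '+' : '-');
    out += kv.first;
  }
  return out;
}

// Overwrites "target-cpu" and "target-features" on every function so that
// generated and cross-compiled functions end up with identical values.
void SetHostTargetAttributes(std::vector<IrFunction>& fns,
                             const std::string& host_cpu,
                             const std::map<std::string, bool>& host_features) {
  assert(!host_cpu.empty());
  const std::string feature_str = BuildFeatureString(host_features);
  for (IrFunction& fn : fns) {
    fn.attrs["target-cpu"] = host_cpu;
    fn.attrs["target-features"] = feature_str;
  }
}
```

For example, calling `SetHostTargetAttributes` with an illustrative host CPU of "haswell" on a module that mixes an attribute-less generated function with a cross-compiled one tagged "x86-64" leaves both with the same attribute pair, so an attribute-equality check no longer separates them and "AlwaysInline" is no longer needed to make them compatible.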


            People

            • Assignee:
              kwho Michael Ho
              Reporter:
              tarmstrong Tim Armstrong