IMPALA-3066

Break cross-compiler IR for built-in functions into multiple modules


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: Impala 2.5.0
    • Fix Version/s: None
    • Component/s: Backend

    Description

      We currently include all functions and cross-compiled IR in a single LLVM bitcode module that is parsed and loaded into memory for every query. We eliminate dead code early in the optimisation process, but we still pay the cost of building and walking the in-memory IR for functions that the query does not use. About 100ms of CodeGen time is PrepareTime for the module.

      A lot of the module is infrequently used functions:

      tarmstrong@tarmstrong-box:~/Impala/Impala$ llvm-dis llvm-ir/impala-sse.bc
      tarmstrong@tarmstrong-box:~/Impala/Impala$ grep 'Reservoir' llvm-ir/impala-sse.ll  | wc -l
      3705
      tarmstrong@tarmstrong-box:~/Impala/Impala$ grep 'Timestamp' llvm-ir/impala-sse.ll  | wc -l
      3015
      tarmstrong@tarmstrong-box:~/Impala/Impala$ grep 'boost' llvm-ir/impala-sse.ll  | wc -l
      13240
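
The three grep runs above can be folded into a single pass over the disassembled IR. The following is a minimal sketch in Python, not part of Impala; the helper name `count_matching_lines` is an assumption made for illustration. It counts lines containing each substring, which matches `grep PATTERN | wc -l` for plain patterns like the ones used here.

```python
from collections import Counter
from typing import Iterable


def count_matching_lines(lines: Iterable[str], patterns: Iterable[str]) -> Counter:
    """Count, per pattern, how many lines contain it as a substring.

    Hypothetical helper: a line that contains a pattern several times
    is still counted once for that pattern, as with `grep | wc -l`.
    """
    patterns = list(patterns)
    counts = Counter({p: 0 for p in patterns})
    for line in lines:
        for p in patterns:
            if p in line:
                counts[p] += 1
    return counts
```

To reproduce the measurement, open the output of `llvm-dis` (here `llvm-ir/impala-sse.ll`) and pass the file object together with `["Reservoir", "Timestamp", "boost"]`.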
      

      We already have a mechanism for loading LLVM bitcode on demand for UDFs when they are referenced by a query. We should split out built-in functions into multiple LLVM modules and only load them when required for the query.
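
The proposed on-demand loading could look roughly like the sketch below. It is an illustration in Python under stated assumptions, not Impala's C++ codegen code: the function name `load_modules_for_query`, the function-to-module mapping, and the `load_module` callback are all hypothetical stand-ins for whatever would parse a bitcode file.

```python
from typing import Callable, Dict, Iterable


def load_modules_for_query(
    referenced_functions: Iterable[str],
    function_to_module: Dict[str, str],
    load_module: Callable[[str], object],
) -> Dict[str, object]:
    """Load only the modules that define functions referenced by the query.

    function_to_module maps a built-in function name to the (hypothetical)
    module that contains its IR. load_module is a caller-supplied callback
    that parses one module. An unknown function name raises KeyError.
    """
    # Deduplicate so a module shared by several functions is loaded once.
    needed = sorted({function_to_module[f] for f in referenced_functions})
    return {name: load_module(name) for name in needed}
```

With a mapping that places timestamp functions in one module and sampling aggregates in another, a query that references only timestamp functions would trigger a single `load_module` call and never touch the aggregate module.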


          People

            Assignee: Tim Armstrong (tarmstrong)
            Reporter: Tim Armstrong (tarmstrong)
            Votes: 0
            Watchers: 3
