Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.6.0
    • Fix Version/s: None
    • Component/s: Backend

Description

LLVM supports on-demand materialization of the bitcode in a module. This helps reduce the preparation time, especially for short-running queries. Currently, prepare time is a fixed cost of 140+ ms on my dev box. Lazy materialization can drive this down to 20-50 ms. This also reduces the time spent in dead code elimination.
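
For reference, here is a minimal sketch of what on-demand materialization looks like at the LLVM API level (assuming recent LLVM headers; the exact signatures have shifted across releases, so this is illustrative rather than the calls Impala actually makes):

    #include <memory>
    #include <llvm/Bitcode/BitcodeReader.h>
    #include <llvm/IR/Function.h>
    #include <llvm/IR/LLVMContext.h>
    #include <llvm/IR/Module.h>
    #include <llvm/Support/Error.h>
    #include <llvm/Support/MemoryBuffer.h>

    // Parse only the module skeleton: llvm::Function declarations are created
    // for everything in the module, but their bodies stay in the bitcode
    // buffer until materialize() is called.
    llvm::Expected<std::unique_ptr<llvm::Module>> LoadModuleLazily(
        llvm::LLVMContext& context, llvm::StringRef bitcode_path) {
      auto buffer = llvm::MemoryBuffer::getFile(bitcode_path);
      if (!buffer) {
        return llvm::createStringError(buffer.getError(), "cannot read bitcode file");
      }
      return llvm::getLazyBitcodeModule((*buffer)->getMemBufferRef(), context);
    }

    // Deserialize a single function's body only when it is actually referenced.
    llvm::Error MaterializeOnDemand(llvm::Function* fn) {
      return fn->materialize();
    }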

Activity

        kwho Michael Ho added a comment -

        IMPALA-3674: Lazy materialization of LLVM module bitcode.

        Previously, each fragment using dynamic code generation would
        parse the bitcode module and populate the LLVM data structures
        for all the functions and their bodies in the bitcode module.
        This was wasteful, as we may only use a few of the functions
        parsed. We relied on dead code elimination to delete most of
        the unused functions so we wouldn't waste time compiling them.

        This change implements lazy materialization of the functions'
        bodies. On the initial parse of the bitcode module, we just
        create the Function objects for each function in the module.
        The functions' bodies will be materialized on demand from the
        bitcode module when they are actually referenced in the query.
        This ensures that the prepare time during codegen is proportional
        to the number of IR functions referenced by the query instead
        of being proportional to the total number of IR functions in
        the module.
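
        A rough sketch of this approach, using LLVM's materialization API
        (the helper below is hypothetical, not Impala's actual code):

            #include <llvm/IR/Function.h>
            #include <llvm/IR/Instructions.h>
            #include <llvm/Support/Casting.h>
            #include <llvm/Support/Error.h>

            // Hypothetical helper: materialize the body of 'fn' plus the bodies
            // of its direct callees, so prepare cost tracks the functions a query
            // actually references rather than every function in the module.
            llvm::Error MaterializeWithCallees(llvm::Function* fn) {
              // Already materialized (or already defined): nothing to do.
              if (!fn->isMaterializable()) return llvm::Error::success();
              if (llvm::Error err = fn->materialize()) return err;
              for (llvm::BasicBlock& block : *fn) {
                for (llvm::Instruction& instr : block) {
                  if (auto* call = llvm::dyn_cast<llvm::CallInst>(&instr)) {
                    if (llvm::Function* callee = call->getCalledFunction()) {
                      if (llvm::Error err = MaterializeWithCallees(callee)) return err;
                    }
                  }
                }
              }
              return llvm::Error::success();
            }

        Indirect calls would not be caught by such a walk and would need to be
        materialized separately when first referenced.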

        This change also stops cross-compiling BufferedTupleStream::GetTupleRow(),
        as there isn't much benefit in doing it. In addition, it moves the ctors
        and dtors of LikePredicate to the header file to avoid an unnecessary
        alias in the IR module.

        For TPCH-Q2, a fragment which codegens only 9 functions used to spend
        146ms in codegen. That now goes down to 35ms, a 76% reduction.

        Before:
        CodeGen:(Total: 146.041ms, non-child: 146.041ms, % non-child: 100.00%)

        • CodegenTime: 0.000ns
        • CompileTime: 2.003ms
        • LoadTime: 0.000ns
        • ModuleBitcodeSize: 2.12 MB (2225304)
        • NumFunctions: 9 (9)
        • NumInstructions: 129 (129)
        • OptimizationTime: 29.019ms
        • PrepareTime: 114.651ms

        After:
        CodeGen:(Total: 35.288ms, non-child: 35.288ms, % non-child: 100.00%)

        • CodegenTime: 0.000ns
        • CompileTime: 1.880ms
        • LoadTime: 0.000ns
        • ModuleBitcodeSize: 2.12 MB (2221276)
        • NumFunctions: 9 (9)
        • NumInstructions: 129 (129)
        • OptimizationTime: 5.101ms
        • PrepareTime: 28.044ms

        Change-Id: I6ed7862fc5e86005ecea83fa2ceb489e737d66b2
        Reviewed-on: http://gerrit.cloudera.org:8080/3220
        Reviewed-by: Michael Ho <kwho@cloudera.com>
        Tested-by: Internal Jenkins

People

    • Assignee: kwho Michael Ho
    • Reporter: kwho Michael Ho
    • Votes: 0
    • Watchers: 1

Dates

    • Created:
    • Updated:
    • Resolved:
