Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4231

Performance regression in TPC-H Q2 due to delay in filter arrival

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Backend
    • Labels:

      Description

      This commit: http://gerrit.cloudera.org:8080/3873 "IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder" caused a regression in TPC-H Q2 on some systems of up to 2x.

      I was able to reproduce a regression of ~2s to ~3s locally on TPC-H scale factor 20 with 3 Impala daemons in my minicluster. I spent some time looking at profiles and the key difference seems to be that runtime filters arrived later in the scans so were ineffective at reducing the size of the join. The arrival time went from slightly under 1s to slightly over 1s.

      The regression goes away if I set:

      set RUNTIME_FILTER_WAIT_TIME_MS=1500;

        Activity

        Hide
        mmokhtar Mostafa Mokhtar added a comment -

        Should check the

        1. Codegen time in the fragment producing the Runtime filters
        2. Hash table build time in the join producing the filter

        Very likely it is one of either.

        FYI TPC-H Q2 is notoriously sensitive to runtime filter arrival time, not the first time to see and issue with filters arriving late.

        Show
        mmokhtar Mostafa Mokhtar added a comment - Should check the Codegen time in the fragment producing the Runtime filters Hash table build time in the join producing the filter Very likely it is one of either. FYI TPC-H Q2 is notoriously sensitive to runtime filter arrival time, not the first time to see and issue with filters arriving late.
        Hide
        mmokhtar Mostafa Mokhtar added a comment -

        Regression appears to be originating from increase in code-gen time

        After

               CodeGen:(Total: 1s198ms, non-child: 1s198ms, % non-child: 100.00%)
                   - CodegenTime: 3.201ms
                   - CompileTime: 386.883ms
                   - LoadTime: 0.000ns
                   - ModuleBitcodeSize: 1.87 MB (1965584)
                   - NumFunctions: 152 (152)
                   - NumInstructions: 7.96K (7956)
                   - OptimizationTime: 762.432ms
                   - PrepareTime: 48.658ms
        

        Before

                CodeGen:(Total: 839.342ms, non-child: 839.342ms, % non-child: 100.00%)
                   - CodegenTime: 2.009ms
                   - CompileTime: 260.353ms
                   - LoadTime: 0.000ns
                   - ModuleBitcodeSize: 1.87 MB (1957792)
                   - NumFunctions: 139 (139)
                   - NumInstructions: 7.74K (7738)
                   - OptimizationTime: 532.394ms
                   - PrepareTime: 46.062ms
        
        Show
        mmokhtar Mostafa Mokhtar added a comment - Regression appears to be originating from increase in code-gen time After CodeGen:(Total: 1s198ms, non-child: 1s198ms, % non-child: 100.00%) - CodegenTime: 3.201ms - CompileTime: 386.883ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.87 MB (1965584) - NumFunctions: 152 (152) - NumInstructions: 7.96K (7956) - OptimizationTime: 762.432ms - PrepareTime: 48.658ms Before CodeGen:(Total: 839.342ms, non-child: 839.342ms, % non-child: 100.00%) - CodegenTime: 2.009ms - CompileTime: 260.353ms - LoadTime: 0.000ns - ModuleBitcodeSize: 1.87 MB (1957792) - NumFunctions: 139 (139) - NumInstructions: 7.74K (7738) - OptimizationTime: 532.394ms - PrepareTime: 46.062ms
        Hide
        tarmstrong Tim Armstrong added a comment -

        There are a few things that changed with the codegen'd code in the patch that may be relevant.

        • A couple of hash functions that were previously shared between the build and probe codegen are generated separately. The final IR should be the same since they're copied during inlining, but there may be some redundant optimisation done before inlining happens.
        • The function signature for AppendRow() changed so that it returned a status
        Show
        tarmstrong Tim Armstrong added a comment - There are a few things that changed with the codegen'd code in the patch that may be relevant. A couple of hash functions that were previously shared between the build and probe codegen are generated separately. The final IR should be the same since they're copied during inlining, but there may be some redundant optimisation done before inlining happens. The function signature for AppendRow() changed so that it returned a status
        Hide
        tarmstrong Tim Armstrong added a comment -

        IMPALA-4231: fix codegen time regression

        The commit "IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder"
        slightly increased codegen time, which caused TPC-H Q2 to sometimes
        regress significantly because of races in runtime filter arrival.

        This patch attempts to fix the regression by improving codegen time in a
        few places.

        • Revert to using the old bool/Status return pattern. The regular Status
          return pattern results in significantly more complex IR because it has
          to emit code to copy and free statuses. I spent some time trying to
          convince it to optimise the extra code out, but didn't have much success.
        • Remove some code that cannot be specialized from cross-compilation.
        • Add noexcept to some functions that are used from the IR to ensure
          exception-handling IR is not emitted. This is less important after the
          first change but still should help produce cleaner IR.

        Performance:
        I was able to reproduce a regression locally, which is fixed by this
        patch. I'm in the process of trying to verify the fix on a cluster.

        Change-Id: Idf0fdedabd488550b6db90167a30c582949d608d
        Reviewed-on: http://gerrit.cloudera.org:8080/4623
        Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
        Tested-by: Internal Jenkins

        Show
        tarmstrong Tim Armstrong added a comment - IMPALA-4231 : fix codegen time regression The commit " IMPALA-3567 Part 2, IMPALA-3899 : factor out PHJ builder" slightly increased codegen time, which caused TPC-H Q2 to sometimes regress significantly because of races in runtime filter arrival. This patch attempts to fix the regression by improving codegen time in a few places. Revert to using the old bool/Status return pattern. The regular Status return pattern results in significantly more complex IR because it has to emit code to copy and free statuses. I spent some time trying to convince it to optimise the extra code out, but didn't have much success. Remove some code that cannot be specialized from cross-compilation. Add noexcept to some functions that are used from the IR to ensure exception-handling IR is not emitted. This is less important after the first change but still should help produce cleaner IR. Performance: I was able to reproduce a regression locally, which is fixed by this patch. I'm in the process of trying to verify the fix on a cluster. Change-Id: Idf0fdedabd488550b6db90167a30c582949d608d Reviewed-on: http://gerrit.cloudera.org:8080/4623 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins —

          People

          • Assignee:
            tarmstrong Tim Armstrong
            Reporter:
            tarmstrong Tim Armstrong
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development