Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3638

Remove lazy creation of LLVM codegen module

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.2
    • Fix Version/s: Impala 2.8.0
    • Component/s: Backend
    • Labels:
    • Docs Text:
      This change improves performance (2x improvement in benchmark) by enabling codegen of expressions for queries with plan fragments that contain only non-codegen enabled operators such as AnalyticEvalNode.
    • Target Version:

      Description

      Query

      select *
      FROM (
        SELECT Rank() OVER (
            ORDER BY l_extendedprice
              ,l_quantity
              ,l_discount
              ,l_tax
            ) AS rank
        FROM lineitem
        WHERE l_shipdate < '1992-05-09'
        ) a
      WHERE rank < 10
      

      Plan

      03:SELECT
      |  predicates: rank() < 10
      |  hosts=20 per-host-mem=unavailable
      |  tuple-ids=6,5 row-size=66B cardinality=59999897
      |
      02:ANALYTIC
      |  functions: rank()
      |  order by: l_extendedprice ASC, l_quantity ASC, l_discount ASC, l_tax ASC
      |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
      |  hosts=20 per-host-mem=unavailable
      |  tuple-ids=6,5 row-size=66B cardinality=599998971
      |
      04:MERGING-EXCHANGE [UNPARTITIONED]
      |  order by: l_extendedprice ASC, l_quantity ASC, l_discount ASC, l_tax ASC
      |  hosts=20 per-host-mem=unavailable
      |  tuple-ids=6 row-size=58B cardinality=599998971
      |
      01:SORT
      |  order by: l_extendedprice ASC, l_quantity ASC, l_discount ASC, l_tax ASC
      |  hosts=20 per-host-mem=736.00MB
      |  tuple-ids=6 row-size=58B cardinality=599998971
      |
      00:SCAN HDFS [tpch_1000_decimal_parquet.lineitem, RANDOM]
         partitions=1/1 files=880 size=216.61GB
         predicates: l_shipdate < '1992-05-09'
         table stats: 5999989709 rows total
         column stats: all
         hosts=20 per-host-mem=440.00MB
         tuple-ids=0 row-size=58B cardinality=599998971
      

      Fragment not getting codegened

        Fragment start latencies: Count: 20, 25th %-ile: 1ms, 50th %-ile: 2ms, 75th %-ile: 2ms, 90th %-ile: 2ms, 95th %-ile: 3ms, 99.9th %-ile: 3ms
          Per Node Peak Memory Usage: d2406.halxg.cloudera.com:22000(1.38 GB) d2413.halxg.cloudera.com:22000(1.52 GB) d2405.halxg.cloudera.com:22000(1.34 GB) d2414.halxg.cloudera.com:22000(1.51 GB) d2416.halxg.cloudera.com:22000(1.48 GB) d2404.halxg.cloudera.com:22000(1.37 GB) d2420.halxg.cloudera.com:22000(1.40 GB) d2410.halxg.cloudera.com:22000(1.61 GB) d2412.halxg.cloudera.com:22000(1.22 GB) d2419.halxg.cloudera.com:22000(1.38 GB) d2409.halxg.cloudera.com:22000(1.41 GB) d2407.halxg.cloudera.com:22000(1.10 GB) d2411.halxg.cloudera.com:22000(1.34 GB) d2418.halxg.cloudera.com:22000(1.27 GB) d2408.halxg.cloudera.com:22000(1.56 GB) d2421.halxg.cloudera.com:22000(1.35 GB) d2403.halxg.cloudera.com:22000(1.28 GB) d2415.halxg.cloudera.com:22000(1.53 GB) d2417.halxg.cloudera.com:22000(1.31 GB) d2402.halxg.cloudera.com:22000(1.25 GB) 
           - FiltersReceived: 0 (0)
           - FinalizationTimer: 0.000ns
          Coordinator Fragment F01:(Total: 8m48s, non-child: 3.044ms, % non-child: 0.00%)
            MemoryUsage(16s000ms): 16.00 KB, 33.59 MB, 47.13 MB, 47.41 MB, 46.89 MB, 46.71 MB, 46.93 MB, 46.69 MB, 46.52 MB, 46.67 MB, 46.34 MB, 46.76 MB, 46.63 MB, 46.58 MB, 46.44 MB, 46.64 MB, 46.91 MB, 47.02 MB, 46.60 MB, 46.62 MB, 46.65 MB, 46.60 MB, 46.80 MB, 46.88 MB, 46.69 MB, 46.61 MB, 46.88 MB, 46.58 MB, 46.71 MB, 46.76 MB, 46.49 MB, 46.36 MB, 46.07 MB
             - AverageThreadTokens: 0.00 
             - BloomFilterBytes: 0
             - PeakMemoryUsage: 59.87 MB (62781648)
             - PerHostPeakMemUsage: 0
             - PrepareTime: 138.114us
             - RowsProduced: 9 (9)
             - TotalCpuTime: 8m49s
             - TotalNetworkReceiveTime: 0.000ns
             - TotalNetworkSendTime: 0.000ns
             - TotalStorageWaitTime: 0.000ns
            BlockMgr:
               - BlockWritesOutstanding: 0 (0)
               - BlocksCreated: 71 (71)
               - BlocksRecycled: 1.33K (1333)
               - BufferedPins: 0 (0)
               - BytesWritten: 0
               - MaxBlockSize: 8.00 MB (8388608)
               - MemoryLimit: 242.23 GB (260091396096)
               - PeakMemoryUsage: 736.00 MB (771751936)
               - TotalBufferWaitTime: 0.000ns
               - TotalEncryptionTime: 0.000ns
               - TotalIntegrityCheckTime: 0.000ns
               - TotalReadBlockTime: 0.000ns
            SELECT_NODE (id=3):(Total: 8m48s, non-child: 8s327ms, % non-child: 1.57%)
               - PeakMemoryUsage: 9.01 MB (9449472)
               - RowsReturned: 9 (9)
               - RowsReturnedRate: 0
            ANALYTIC_EVAL_NODE (id=2):(Total: 8m40s, non-child: 5m42s, % non-child: 65.78%)
               - EvaluationTime: 5m42s
               - GetNewBlockTime: 5.963ms
               - PeakMemoryUsage: 26.34 MB (27621376)
               - PinTime: 0.000ns
               - RowsReturned: 169.58M (169575660)
               - RowsReturnedRate: 325.75 K/sec
               - UnpinTime: 3.156ms
            EXCHANGE_NODE (id=4):(Total: 2m58s, non-child: 2m35s, % non-child: 87.26%)
              BytesReceived(16s000ms): 0, 26.87 MB, 135.18 MB, 253.93 MB, 373.50 MB, 493.48 MB, 613.34 MB, 733.28 MB, 852.89 MB, 972.60 MB, 1.07 GB, 1.18 GB, 1.30 GB, 1.42 GB, 1.53 GB, 1.65 GB, 1.77 GB, 1.89 GB, 2.00 GB, 2.12 GB, 2.24 GB, 2.35 GB, 2.47 GB, 2.59 GB, 2.70 GB, 2.82 GB, 2.93 GB, 3.05 GB, 3.17 GB, 3.28 GB, 3.40 GB, 3.52 GB, 3.64 GB
               - BytesReceived: 3.70 GB (3972098336)
               - ConvertRowBatchTime: 0.000ns
               - DeserializeRowBatchTimer: 15s052ms
               - FirstBatchArrivalWaitTime: 22s690ms
               - MergeGetNext: 2m35s
               - MergeGetNextBatch: 814.096ms
               - PeakMemoryUsage: 0
               - RowsReturned: 169.58M (169575660)
               - RowsReturnedRate: 952.03 K/sec
               - SendersBlockedTimer: 8m24s
               - SendersBlockedTotalTimer(*): 2h47m
      

      Wrapping the query in a count is enough to codegen the fragment with the analytic function

      select count(*) from (select *
      FROM (
        SELECT Rank() OVER (
            ORDER BY l_extendedprice
              ,l_quantity
              ,l_discount
              ,l_tax
            ) AS rank
        FROM lineitem
        WHERE l_shipdate < '1992-05-09'
        ) a
      WHERE rank < 10)a
      

      Fix should speedup the query by 2x

        Issue Links

          Activity

          Hide
          dhecht Dan Hecht added a comment -

          Tim Armstrong pointed out that we don't generally codegen the coordinator fragment, for some reason, and this is probably an example of that.

          Show
          dhecht Dan Hecht added a comment - Tim Armstrong pointed out that we don't generally codegen the coordinator fragment, for some reason, and this is probably an example of that.
          Hide
          tarmstrong Tim Armstrong added a comment -

          I was curious about this so had a quick look at the code. I think that more precisely this affects fragments that have no HDFS scans and no codegen'd operators, only exprs. By default, codegen is off. If an operator calls GetCodegen(), that enables full codegen, while if a HDFS scan node exists in the fragment, SetCodegenExpr() is called, which enables expr codegen based on a planner decision.

          Show
          tarmstrong Tim Armstrong added a comment - I was curious about this so had a quick look at the code. I think that more precisely this affects fragments that have no HDFS scans and no codegen'd operators, only exprs. By default, codegen is off. If an operator calls GetCodegen(), that enables full codegen, while if a HDFS scan node exists in the fragment, SetCodegenExpr() is called, which enables expr codegen based on a planner decision.
          Hide
          kwho Michael Ho added a comment -

          With the lazy materialization of LLVM bitcode, may be it's no longer worth the trouble of lazily creating the LLVM module which leads to problem like this bug and IMPALA-1755.

          Show
          kwho Michael Ho added a comment - With the lazy materialization of LLVM bitcode, may be it's no longer worth the trouble of lazily creating the LLVM module which leads to problem like this bug and IMPALA-1755 .
          Hide
          kwho Michael Ho added a comment -

          And looking at the original plan Mostafa filed,we should codegen EvalConjuncts() in the SelectNode too.

          Show
          kwho Michael Ho added a comment - And looking at the original plan Mostafa filed,we should codegen EvalConjuncts() in the SelectNode too.
          Hide
          tarmstrong Tim Armstrong added a comment -

          Also note that the original intention of the lazy codegen creation, which is to avoid codegening conjuncts for small parquet scans, doesn't really apply since we're doing codegen for parquet anyway.

          Show
          tarmstrong Tim Armstrong added a comment - Also note that the original intention of the lazy codegen creation, which is to avoid codegening conjuncts for small parquet scans, doesn't really apply since we're doing codegen for parquet anyway.
          Hide
          kwho Michael Ho added a comment -

          https://github.com/apache/incubator-impala/commit/b15d992abe09bc841f6e2112d47099eb15f8454f

          IMPALA-4080, IMPALA-3638: Introduce ExecNode::Codegen()
          This patch is mostly mechanical move of codegen related logic
          from each exec node's Prepare() to its Codegen() function.
          After this change, code generation will no longer happen in
          Prepare(). Instead, it will happen after Prepare() completes in
          PlanFragmentExecutor. This is an intermediate step towards the
          final goal of sharing compiled code among fragment instances in
          multi-threading.

          As part of the clean up, this change also removes the logic for
          lazy codegen object creation. In other words, if codegen is enabled,
          the codegen object will always be created. This simplifies some
          of the logic in ScalarFnCall::Prepare() and various Codegen()
          functions by reducing error checking needed. This change also
          removes the logic added for tackling IMPALA-1755 as it's not
          needed anymore after the clean up.

          The clean up also rectifies a not so well documented situation.
          Previously, even if a user explicitly sets DISABLE_CODEGEN to true,
          we may still codegen a UDF if it was written in LLVM IR or if it
          has more than 8 arguments. This patch enforces the query option
          by failing the query in both cases. To run the query, the user
          must enable codegen. This change also extends the number of
          arguments supported in the interpretation path of ScalarFn to 20.

          Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651
          Reviewed-on: http://gerrit.cloudera.org:8080/4651
          Reviewed-by: Michael Ho <kwho@cloudera.com>
          Tested-by: Internal Jenkins

          Show
          kwho Michael Ho added a comment - https://github.com/apache/incubator-impala/commit/b15d992abe09bc841f6e2112d47099eb15f8454f IMPALA-4080 , IMPALA-3638 : Introduce ExecNode::Codegen() This patch is mostly mechanical move of codegen related logic from each exec node's Prepare() to its Codegen() function. After this change, code generation will no longer happen in Prepare(). Instead, it will happen after Prepare() completes in PlanFragmentExecutor. This is an intermediate step towards the final goal of sharing compiled code among fragment instances in multi-threading. As part of the clean up, this change also removes the logic for lazy codegen object creation. In other words, if codegen is enabled, the codegen object will always be created. This simplifies some of the logic in ScalarFnCall::Prepare() and various Codegen() functions by reducing error checking needed. This change also removes the logic added for tackling IMPALA-1755 as it's not needed anymore after the clean up. The clean up also rectifies a not so well documented situation. Previously, even if a user explicitly sets DISABLE_CODEGEN to true, we may still codegen a UDF if it was written in LLVM IR or if it has more than 8 arguments. This patch enforces the query option by failing the query in both cases. To run the query, the user must enable codegen. This change also extends the number of arguments supported in the interpretation path of ScalarFn to 20. Change-Id: I207566bc9f4c6a159271ecdbc4bbdba3d78c6651 Reviewed-on: http://gerrit.cloudera.org:8080/4651 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Internal Jenkins

            People

            • Assignee:
              kwho Michael Ho
              Reporter:
              mmokhtar Mostafa Mokhtar
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development