IMPALA-7655

Codegen output for conditional functions (if, isnull, coalesce) is very suboptimal

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Backend

      Description

      https://gerrit.cloudera.org/#/c/11565/ provided a clue that an aggregation involving an if() function is very slow. In the runs below, the 01:AGGREGATE step takes roughly 10x longer with if() than with the equivalent CASE expression (422.07ms vs. 44.03ms):

      [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(case when l_orderkey is NULL then 1 else NULL end) from tpch10_parquet.lineitem;summary;
      NUM_NODES set to 1
      MT_DOP set to 1
      Query: select count(case when l_orderkey is NULL then 1 else NULL end) from tpch10_parquet.lineitem
      Query submitted at: 2018-10-04 11:17:31 (Coordinator: http://tarmstrong-box:25000)
      Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=274b2a6f35cefe31:95a1964200000000
      +----------------------------------------------------------+
      | count(case when l_orderkey is null then 1 else null end) |
      +----------------------------------------------------------+
      | 0                                                        |
      +----------------------------------------------------------+
      Fetched 1 row(s) in 0.51s
      +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
      | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                  |
      +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
      | 01:AGGREGATE | 1      | 44.03ms  | 44.03ms  | 1      | 1          | 25.00 KB | 10.00 MB      | FINALIZE                |
      | 00:SCAN HDFS | 1      | 411.57ms | 411.57ms | 59.99M | -1         | 16.61 MB | 88.00 MB      | tpch10_parquet.lineitem |
      +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
      [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem;summary;
      NUM_NODES set to 1
      MT_DOP set to 1
      Query: select count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem
      Query submitted at: 2018-10-04 11:23:07 (Coordinator: http://tarmstrong-box:25000)
      Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=8e46ab1b84c4dbff:2786ca2600000000
      +----------------------------------------+
      | count(if(l_orderkey is null, 1, null)) |
      +----------------------------------------+
      | 0                                      |
      +----------------------------------------+
      Fetched 1 row(s) in 1.01s
      +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
      | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                  |
      +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
      | 01:AGGREGATE | 1      | 422.07ms | 422.07ms | 1      | 1          | 25.00 KB | 10.00 MB      | FINALIZE                |
      | 00:SCAN HDFS | 1      | 511.13ms | 511.13ms | 59.99M | -1         | 16.61 MB | 88.00 MB      | tpch10_parquet.lineitem |
      +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
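
To put the two summaries side by side, the following small C++ sketch works out the slowdown and the approximate per-row cost of the 01:AGGREGATE step. The constants are copied from the single runs shown above (the row count is the rounded 59.99M from the #Rows column), so the results are only indicative; the function names are made up for this illustration.

```cpp
#include <cassert>
#include <cstdio>

// Figures copied from the two summaries above (one run each).
constexpr double kRowsMillions = 59.99;  // #Rows produced by 00:SCAN HDFS (rounded)
constexpr double kCaseAggMs = 44.03;     // 01:AGGREGATE time with CASE
constexpr double kIfAggMs = 422.07;      // 01:AGGREGATE time with if()

// Approximate nanoseconds spent in the aggregation per input row.
inline double NsPerRow(double agg_ms, double rows_millions) {
  assert(rows_millions > 0);
  return (agg_ms * 1e6) / (rows_millions * 1e6);
}

// How many times slower the slow variant is than the fast one.
inline double Slowdown(double slow_ms, double fast_ms) {
  assert(fast_ms > 0);
  return slow_ms / fast_ms;
}

// Prints the derived numbers for both variants.
inline void PrintSummary() {
  std::printf("CASE: %.2f ns/row\n", NsPerRow(kCaseAggMs, kRowsMillions));
  std::printf("if(): %.2f ns/row\n", NsPerRow(kIfAggMs, kRowsMillions));
  std::printf("slowdown: %.1fx\n", Slowdown(kIfAggMs, kCaseAggMs));
}
```

Calling PrintSummary() should show the if() variant at around 7 ns per row against well under 1 ns per row for CASE, which is where the "10x" figure comes from.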
      

      It turns out that this is because we don't have good codegen support for ConditionalFunction, and just fall back to emitting a call to the interpreted path: https://github.com/apache/impala/blob/master/be/src/exprs/conditional-functions.cc#L28

      See CaseExpr for an example of much better codegen support: https://github.com/apache/impala/blob/master/be/src/exprs/case-expr.cc#L178
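
As a rough illustration of why falling back to the interpreted path is costly, here is a minimal, self-contained C++ sketch. It is not Impala code: Expr, ColumnRef, Literal, IsNull, IfExpr, BuildIfIsNull, CountInterpreted and CountSpecialized are all hypothetical names, and the "specialized" loop is only a hand-written stand-in for the kind of code that codegen could emit once the conditional is inlined. Both paths compute count(if(col IS NULL, 1, NULL)) over a nullable BIGINT column.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// A nullable BIGINT; std::nullopt models SQL NULL.
using Val = std::optional<int64_t>;

// Hypothetical interpreted expression node: one virtual call per node per row.
struct Expr {
  virtual ~Expr() = default;
  virtual Val Eval(const Val& row) const = 0;
};

// Returns the (single) input column value unchanged.
struct ColumnRef : Expr {
  Val Eval(const Val& row) const override { return row; }
};

// Returns a constant, possibly NULL.
struct Literal : Expr {
  explicit Literal(Val v) : v_(v) {}
  Val Eval(const Val&) const override { return v_; }
  Val v_;
};

// child IS NULL: yields 1 or 0, never NULL.
struct IsNull : Expr {
  explicit IsNull(std::unique_ptr<Expr> c) : child_(std::move(c)) {}
  Val Eval(const Val& row) const override {
    return Val(child_->Eval(row).has_value() ? 0 : 1);
  }
  std::unique_ptr<Expr> child_;
};

// if(cond, then, else): a NULL or zero condition selects the else branch.
struct IfExpr : Expr {
  IfExpr(std::unique_ptr<Expr> c, std::unique_ptr<Expr> t, std::unique_ptr<Expr> e)
      : cond_(std::move(c)), then_(std::move(t)), else_(std::move(e)) {}
  Val Eval(const Val& row) const override {
    Val c = cond_->Eval(row);
    return (c.has_value() && *c != 0) ? then_->Eval(row) : else_->Eval(row);
  }
  std::unique_ptr<Expr> cond_, then_, else_;
};

// Builds the tree for if(col IS NULL, 1, NULL).
inline std::unique_ptr<Expr> BuildIfIsNull() {
  return std::make_unique<IfExpr>(
      std::make_unique<IsNull>(std::make_unique<ColumnRef>()),
      std::make_unique<Literal>(Val(1)),
      std::make_unique<Literal>(Val(std::nullopt)));
}

// Interpreted path: walks the expression tree for every row.
inline int64_t CountInterpreted(const Expr& e, const std::vector<Val>& rows) {
  int64_t count = 0;
  for (const Val& row : rows) {
    if (e.Eval(row).has_value()) ++count;  // count() skips NULL results
  }
  return count;
}

// Specialized path: the same logic fused into the loop, with no virtual calls.
inline int64_t CountSpecialized(const std::vector<Val>& rows) {
  int64_t count = 0;
  for (const Val& row : rows) {
    if (!row.has_value()) ++count;
  }
  return count;
}

// Sanity check that both paths agree on the given rows.
inline void CheckEquivalent(const std::vector<Val>& rows) {
  assert(CountInterpreted(*BuildIfIsNull(), rows) == CountSpecialized(rows));
}
```

The interpreted version makes several indirect calls and materializes intermediate values for each row, while the specialized loop reduces to a single null check. Proper codegen for conditional functions would aim for something closer to the latter, in the way CaseExpr already does.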

              People

              • Assignee: Daniel Becker (daniel.becker)
              • Reporter: Tim Armstrong (tarmstrong)
