Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
https://gerrit.cloudera.org/#/c/11565/ provided a clue that an aggregation involving an if() function is very slow: the AGGREGATE node takes roughly 10x longer (422.07ms vs. 44.03ms in the summaries below) than in the equivalent version using a CASE expression:
[localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(case when l_orderkey is NULL then 1 else NULL end) from tpch10_parquet.lineitem;summary;
NUM_NODES set to 1
MT_DOP set to 1
Query: select count(case when l_orderkey is NULL then 1 else NULL end) from tpch10_parquet.lineitem
Query submitted at: 2018-10-04 11:17:31 (Coordinator: http://tarmstrong-box:25000)
Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=274b2a6f35cefe31:95a1964200000000
+----------------------------------------------------------+
| count(case when l_orderkey is null then 1 else null end) |
+----------------------------------------------------------+
| 0                                                        |
+----------------------------------------------------------+
Fetched 1 row(s) in 0.51s
+--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
| Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                  |
+--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
| 01:AGGREGATE | 1      | 44.03ms  | 44.03ms  | 1      | 1          | 25.00 KB | 10.00 MB      | FINALIZE                |
| 00:SCAN HDFS | 1      | 411.57ms | 411.57ms | 59.99M | -1         | 16.61 MB | 88.00 MB      | tpch10_parquet.lineitem |
+--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+

[localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem;summary;
NUM_NODES set to 1
MT_DOP set to 1
Query: select count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem
Query submitted at: 2018-10-04 11:23:07 (Coordinator: http://tarmstrong-box:25000)
Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=8e46ab1b84c4dbff:2786ca2600000000
+----------------------------------------+
| count(if(l_orderkey is null, 1, null)) |
+----------------------------------------+
| 0                                      |
+----------------------------------------+
Fetched 1 row(s) in 1.01s
+--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
| Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                  |
+--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
| 01:AGGREGATE | 1      | 422.07ms | 422.07ms | 1      | 1          | 25.00 KB | 10.00 MB      | FINALIZE                |
| 00:SCAN HDFS | 1      | 511.13ms | 511.13ms | 59.99M | -1         | 16.61 MB | 88.00 MB      | tpch10_parquet.lineitem |
+--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
It turns out that this is because we don't have good codegen support for ConditionalFunction, and just fall back to emitting a call to the interpreted path: https://github.com/apache/impala/blob/master/be/src/exprs/conditional-functions.cc#L28
See CaseExpr for an example of much better codegen support: https://github.com/apache/impala/blob/master/be/src/exprs/case-expr.cc#L178
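As a rough illustration of why the two queries should be interchangeable, here is a minimal C++ sketch of the row-level semantics a codegen'd if() would need to reproduce. It is not Impala code: NullableBool, NullableInt, SqlIf, SqlCaseWhen, CountNonNull, CountWithIf and CountWithCase are all hypothetical names made up for this example, and std::optional stands in for Impala's nullable value types. The point is only that if(c, a, b) reduces to the same single branch as CASE WHEN c THEN a ELSE b END, so nothing in the semantics forces a per-row call into the interpreted path.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins for nullable SQL values; these are NOT Impala's
// BooleanVal/BigIntVal types, just std::optional for illustration.
using NullableBool = std::optional<bool>;
using NullableInt = std::optional<int64_t>;

// if(cond, then_val, else_val): only a non-NULL, true condition selects
// then_val; a false or NULL condition selects else_val.
inline NullableInt SqlIf(NullableBool cond, NullableInt then_val,
                         NullableInt else_val) {
  return (cond.has_value() && *cond) ? then_val : else_val;
}

// CASE WHEN cond THEN then_val ELSE else_val END, written as the single
// branch that a one-WHEN CASE reduces to.
inline NullableInt SqlCaseWhen(NullableBool cond, NullableInt then_val,
                               NullableInt else_val) {
  if (cond.has_value() && *cond) return then_val;
  return else_val;
}

// count(expr): counts only the rows for which expr evaluates to non-NULL.
template <typename Fn>
int64_t CountNonNull(const std::vector<NullableInt>& keys, Fn expr) {
  int64_t n = 0;
  for (const NullableInt& key : keys) {
    if (expr(key).has_value()) ++n;
  }
  return n;
}

// select count(if(l_orderkey is NULL, 1, NULL)) ...
inline int64_t CountWithIf(const std::vector<NullableInt>& keys) {
  return CountNonNull(keys, [](const NullableInt& k) {
    return SqlIf(!k.has_value(), NullableInt(1), std::nullopt);
  });
}

// select count(case when l_orderkey is NULL then 1 else NULL end) ...
inline int64_t CountWithCase(const std::vector<NullableInt>& keys) {
  return CountNonNull(keys, [](const NullableInt& k) {
    return SqlCaseWhen(!k.has_value(), NullableInt(1), std::nullopt);
  });
}
```

Both counting functions return the number of NULL keys in the input, so on a column with no NULLs (as in the lineitem runs above) both return 0. Any difference between the two queries is therefore purely in how the expression is evaluated, not in what it computes.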
Issue Links
- is part of: IMPALA-7747 Clean up the Expression Rewriter (Open)
- is related to: IMPALA-7659 Collect count of nulls when collecting stats (Resolved)