Details
-
Improvement
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
None
-
None
-
None
-
ghx-label-4
Description
Some BI tools generate compatible sqls to cast string to timestamp:
cast(unix_timestamp('timestr', 'fmt') as timestamp) or cast(unix_timestamp('timestr') as timestamp)
The internal type change in impala: StringVal->TimestampValue->BigIntVal→TimestampValue→TimestampVal
In earlier 2.x versions, casting TimestampValue to BigIntVal using libc functions which is more expensive than current design with CCTZ.
However, the cast to BIgIntVal seems to be redundant. We can simplify cast(unix_timestamp('timestr', 'fmt') as timestamp) to to_timestamp('timestr', 'fmt'), simplify cast(unix_timestamp('timestr') as timestamp) to cast('timestr' as timestamp).
I managed to modify the fe to support such expr rewriting.
Benefiting from the rewriting, the query time cost is reduced from 2.52s to 2.02s.
Here is the sql:
SELECT `t2`.`phy_category1_name` `d0`, SUM(`t1`.`apply_count`) `m0` FROM `tr_orders` `t1` LEFT JOIN `dim_sku` `t2` ON (`t1`.`sku_id` = `t2`.`sku_id`) WHERE (((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) >= cast('2018-08-26' as timestamp)) AND ((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) < cast('2019-09-02' as timestamp))) GROUP BY `t2`.`phy_category1_name` order by d0;
Query: SELECT `t2`.`phy_category1_name` `d0`, SUM(`t1`.`apply_count`) `m0` FROM `tr_orders` `t1` LEFT JOIN `dim_sku` `t2` ON (`t1`.`sku_id` = `t2`.`sku_id`) WHERE (((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) >= cast('2018-08-26' as timestamp)) AND ((CAST(UNIX_TIMESTAMP(`t1`.`date_id`) AS TIMESTAMP)) < cast('2019-09-02' as timestamp))) GROUP BY `t2`.`phy_category1_name` order by d0