[IMPALA-5437] Codegen for Trunc() of timestamp takes far too long - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
Fix Version/s: Impala 2.10.0
Component/s: Backend
Labels:
- codegen

Target Version:

Impala 2.10.0
Epic Color:
ghx-label-6

Description

The following query takes longer to run than it should because of codegen time.

use functional_parquet;
   SELECT
   t_9.c_5,
   MIN(t_9.c_6),
   MIN(t_9.c_6),
   MIN(t_9.c_6)
FROM
(
      SELECT
         at1.string_col c_5,
         SUM(at1.int_col) c_6
      FROM
         alltypes at1
      JOIN
      (
         SELECT
            t_4.c_1 cot1,
            MIN(t_4.c_2) imp2
         FROM
         (
               SELECT
                  at2.string_col c_1,
                  SUM(at2.bigint_col) c_2
               FROM
                  alltypesagg at2
               GROUP BY
                  1
         ) t_4
         GROUP BY
            1
         ORDER BY
            2 DESC NULLS LAST
         LIMIT 100
      ) t_8
      ON
         t_8.cot1 = at1.string_col
      WHERE
         CASE WHEN    (DATEDIFF(TO_DATE(TRUNC(at1.timestamp_col, 'MM')), TO_DATE(TRUNC(TRUNC(at1.timestamp_col, 'MM'), 'YY'))) + 1) >= 152
AND
   YEAR(at1.timestamp_col) >= 2017 THEN 0 ELSE 1 END = 1
      AND
         at1.timestamp_col >= CAST('2014-01-01 00:00:00' AS TIMESTAMP)
      AND
         CASE WHEN TRUNC(at1.timestamp_col, 'MM') = CAST(CAST('2014-01-01' AS TIMESTAMP) AS TIMESTAMP) THEN at1.timestamp_col ELSE CAST(NULL AS TIMESTAMP) END >= CAST('2014-01-01 00:00:00' AS TIMESTAMP)
      GROUP BY
         1
) t_9
GROUP BY 1;

The size of the codegen module is excessive compared to the complexity of the query:

        CodeGen:(Total: 1s402ms, non-child: 1s402ms, % non-child: 100.00%)
           - CodegenTime: 6.117ms
           - CompileTime: 465.399ms
           - LoadTime: 0.000ns
           - ModuleBitcodeSize: 1.98 MB (2077396)
           - NumFunctions: 194 (194)
           - NumInstructions: 4.01K (4011)
           - OptimizationTime: 890.290ms
           - PeakMemoryUsage: 1.96 MB (2053632)
           - PrepareTime: 21.772ms

I dug into what was happening and the module is full of boost timestamp functions. It looks like most of this is coming from our implementation of Trunc(), which switches between a large number of different implementations.

I moved Trunc() out of the cross-compiled IR and codegen time was more sensible:

        CodeGen:(Total: 357.087ms, non-child: 357.087ms, % non-child: 100.00%)
           - CodegenTime: 2.748ms
           - CompileTime: 83.765ms
           - LoadTime: 0.000ns
           - ModuleBitcodeSize: 1.97 MB (2066128)
           - NumFunctions: 149 (149)
           - NumInstructions: 2.22K (2218)
           - OptimizationTime: 245.623ms
           - PeakMemoryUsage: 1.08 MB (1135616)
           - PrepareTime: 20.145ms

Attachments

Activity

People

Assignee:: Tim Armstrong

Reporter:: Tim Armstrong

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 05/Jun/17 18:07

Updated:: 06/Jun/17 02:49

Resolved:: 06/Jun/17 02:49