Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-1430

Codegen all aggregate functions, including UDAs

    XMLWordPrintableJSON

Details

    Description

      Currently codegen is disabled for the entire aggregation operator if a single aggregate function can't be codegen'd. We should address this by making it so all aggregate functions can be codegen'd, including UDAs. For UDAs in .so's, the codegen'd function will call into the UDA library. This also affects aggregation operator on timestamp.

      This perf hit can be especially bad for COMPUTE STATS which is heavily CPU bound on the aggregation and because there is no easy way to exclude the TIMESTAMP columns when computing the column stats (i.e., there is no simple workaround).

      Even if the portions involving TIMESTAMP cannot be codegen'd it would still be worthwhile to come up with a workaround where codegen for the other types is still enabled.

      Workaround
      If you are experiencing very slow COMPUTE STATS due to this issue, then you may be able to temporarily ALTER the TIMESTAMP columns to STRING or INT type before running COMPUTE STATS. After the command completed, the columns can be altered back to TIMESTAMP.
      Note the workaround is only apply to text data, not parquet data. parquet require compatibles data type. TIMESTAMP is INT96, it's not compatible with STRING or BIGINT.

      Attachments

        Issue Links

          Activity

            People

              tarmstrong Tim Armstrong
              skye Skye Wanderman-Milne
              Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: