Currently, column statistics are computed with the compute_stats UDAF. For instance, for a given table tbl, the query that computes column statistics is internally translated into:
compute_stats returns a struct containing the statistics available for each column type, e.g., struct<"max":long,"min":long,"countnulls":long,...>.
This issue proposes generating a query that relies only on built-in SQL functions instead:
This will allow us to deprecate the compute_stats UDAF, since it mostly duplicates functionality already provided by those functions. Additionally, many of those functions already have vectorized implementations, so this approach could also improve the performance of column stats collection.
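To illustrate the duplication argument, here is a minimal Python sketch. It is not Hive code and does not reproduce the actual rewritten query. It compares a single composite aggregator that returns a struct like the one above with the same fields derived from independent aggregates. Those aggregates mimic SQL MAX, MIN and COUNT(*) - COUNT(col). All helper names are hypothetical, and only the max/min/countnulls fields of a long column are modelled.

```python
from typing import Optional, Sequence, TypedDict


class LongColumnStats(TypedDict):
    max: Optional[int]
    min: Optional[int]
    countnulls: int


def compute_stats_like(values: Sequence[Optional[int]]) -> LongColumnStats:
    """Hypothetical single-pass composite aggregate returning a stats struct."""
    mx: Optional[int] = None
    mn: Optional[int] = None
    nulls = 0
    for v in values:
        if v is None:
            nulls += 1
            continue
        mx = v if mx is None or v > mx else mx
        mn = v if mn is None or v < mn else mn
    return {"max": mx, "min": mn, "countnulls": nulls}


# Hypothetical stand-ins for individual SQL aggregates.
# Like SQL MAX/MIN, they ignore NULLs (None) and return NULL on no input.
def sql_max(values: Sequence[Optional[int]]) -> Optional[int]:
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


def sql_min(values: Sequence[Optional[int]]) -> Optional[int]:
    non_null = [v for v in values if v is not None]
    return min(non_null) if non_null else None


def sql_count_star(values: Sequence[Optional[int]]) -> int:
    return len(values)  # COUNT(*) counts every row


def sql_count(values: Sequence[Optional[int]]) -> int:
    return sum(1 for v in values if v is not None)  # COUNT(col) skips NULLs


def stats_from_sql_functions(values: Sequence[Optional[int]]) -> LongColumnStats:
    """Build the same struct purely from the independent aggregates."""
    return {
        "max": sql_max(values),
        "min": sql_min(values),
        "countnulls": sql_count_star(values) - sql_count(values),
    }


column = [3, None, 7, -2, None]
assert compute_stats_like(column) == stats_from_sql_functions(column)
print(stats_from_sql_functions(column))  # {'max': 7, 'min': -2, 'countnulls': 2}
```

Because each field can be computed by a separate standard aggregate, an engine that already vectorizes those aggregates can evaluate them directly, without a dedicated composite UDAF.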