Impala already has a special code path for fast Parquet scans when no columns are scanned and materialized, but the performance can be significantly improved with a plan+execution change, as follows:
Instead of returning empty batches until num_rows rows have been returned, the Parquet scanner can populate a single slot with the num_rows value taken from the Parquet row group metadata.
The count local aggregation needs to be changed to a sum(num_rows_slot) aggregation.
The final distributed plan will be:
scan -> local agg with sum(num_rows_slot) -> merge agg sum(sum(num_rows_slot))
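The plan above can be sketched as a small simulation. This is only an illustration, not Impala code: the function names are hypothetical, each node's scan range is modeled as a plain list of row-group num_rows values, and it assumes the scanner emits one single-slot row per row group.

```python
from typing import Iterable, List


def scan_num_rows(row_group_num_rows: Iterable[int]) -> List[int]:
    """Scanner (hypothetical): emit one single-slot row per row group.

    The slot holds the row group's num_rows from the file metadata;
    no column data is read or materialized.
    """
    return list(row_group_num_rows)


def local_agg(num_rows_slots: Iterable[int]) -> int:
    """Per-node aggregation: sum(num_rows_slot) replaces count(*)."""
    return sum(num_rows_slots)


def merge_agg(partial_sums: Iterable[int]) -> int:
    """Final aggregation: sum(sum(num_rows_slot)) across all nodes."""
    return sum(partial_sums)


def count_star(nodes: Iterable[Iterable[int]]) -> int:
    """Run scan -> local agg -> merge agg over every node's row groups."""
    return merge_agg(local_agg(scan_num_rows(rgs)) for rgs in nodes)
```

For example, with three nodes holding row groups of [100, 250], [50], and no row groups at all, count_star returns 400 while each node emits only as many rows as it has row groups.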
This optimization is applicable when the query contains only a count and there are no scan predicates.
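The applicability rule can be written as a simple check. The sketch below is an assumption-laden illustration: QueryShape and its fields are hypothetical and do not correspond to Impala planner classes, and the count is assumed to be written as count(*). The predicate restriction follows from the fact that num_rows in the metadata counts every row in a row group, so any filter would require reading column data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryShape:
    """Hypothetical, simplified description of a single-table query block."""
    select_exprs: List[str] = field(default_factory=list)
    scan_predicates: List[str] = field(default_factory=list)


def can_use_num_rows_count(query: QueryShape) -> bool:
    """True if the only output is a count and the scan has no predicates."""
    normalized = [e.replace(" ", "").lower() for e in query.select_exprs]
    only_count = normalized == ["count(*)"]
    return only_count and not query.scan_predicates
```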