Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Duplicate
Description
IMPALA-5036 introduced an optimization to use the data stored in the Parquet RowGroup.num_rows field for count queries.
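The estimated cardinality for the scan is still the number of rows in the base table, as opposed to the number of files or row groups. In the plan below, the scan is estimated at 179999978268 rows, while the runtime summary shows it returning only 28.98K rows, in line with files=28976.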
+-------------------------------------------------------------------------------+
| Explain String |
+-------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B |
| Per-Host Resource Estimates: Memory=108.00MB |
| |
| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
| | Per-Host Resources: mem-estimate=10.00MB mem-reservation=0B |
| PLAN-ROOT SINK |
| | mem-estimate=0B mem-reservation=0B |
| | |
| 03:AGGREGATE [FINALIZE] |
| | output: count:merge(*) |
| | mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB |
| | tuple-ids=1 row-size=8B cardinality=1 |
| | |
| 02:EXCHANGE [UNPARTITIONED] |
| | mem-estimate=0B mem-reservation=0B |
| | tuple-ids=1 row-size=8B cardinality=1 |
| | |
| F00:PLAN FRAGMENT [RANDOM] hosts=130 instances=130 |
| Per-Host Resources: mem-estimate=98.00MB mem-reservation=0B |
| 01:AGGREGATE |
| | output: sum_init_zero(tpch_30000_parquet.lineitem.parquet-stats: num_rows) |
| | mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB |
| | tuple-ids=1 row-size=8B cardinality=1 |
| | |
| 00:SCAN HDFS [tpch_30000_parquet.lineitem, RANDOM] |
| partitions=2526/2526 files=28976 size=6.89TB |
| stats-rows=179999978268 extrapolated-rows=disabled |
| table stats: rows=179999978268 size=unavailable |
| column stats: all |
| mem-estimate=88.00MB mem-reservation=0B |
| tuple-ids=0 row-size=8B cardinality=179999978268 |
+-------------------------------------------------------------------------------+
+--------------+--------+----------+----------+--------+------------+-----------+---------------+-----------------------------+
| Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                      |
+--------------+--------+----------+----------+--------+------------+-----------+---------------+-----------------------------+
| 03:AGGREGATE | 1      | 1.28ms   | 1.28ms   | 1      | 1          | 532.00 KB | 10.00 MB      | FINALIZE                    |
| 02:EXCHANGE  | 1      | 2.56s    | 2.56s    | 129    | 1          | 0 B       | 0 B           | UNPARTITIONED               |
| 01:AGGREGATE | 129    | 4.89ms   | 62.84ms  | 129    | 1          | 20.00 KB  | 10.00 MB      |                             |
| 00:SCAN HDFS | 129    | 62.44ms  | 341.03ms | 28.98K | 180.00B    | 1.75 MB   | 88.00 MB      | tpch_30000_parquet.lineitem |
+--------------+--------+----------+----------+--------+------------+-----------+---------------+-----------------------------+
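
To put a number on the gap, here is a minimal arithmetic sketch in Python. It only reuses figures from the plan and summary above; the helper name `overestimate_factor` is made up for illustration and is not part of Impala.

```python
def overestimate_factor(estimated_rows: int, actual_rows: int) -> float:
    """Return how many times larger the estimate is than the observed row count.

    Hypothetical helper for illustration only; not an Impala API.
    """
    if actual_rows <= 0:
        raise ValueError("actual_rows must be positive")
    return estimated_rows / actual_rows


# Figures copied from the 00:SCAN HDFS node in the plan above.
ESTIMATED_SCAN_ROWS = 179_999_978_268  # cardinality=179999978268
NUM_FILES = 28_976                     # files=28976

factor = overestimate_factor(ESTIMATED_SCAN_ROWS, NUM_FILES)
print(f"scan estimate is roughly {factor:,.0f}x the number of files")
```

The comparison uses the file count rather than the rounded 28.98K from the summary, since the plan reports the file count exactly.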