Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Invalid
Description
Currently, as far as I can tell, when you run `select count(*) from dataset` in DataFusion against a Parquet dataset, it is implemented by scanning column 0 and counting up all of the rows (specifically, I think it sums the number of rows in each batch).
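To make the cost of that concrete, here is a toy sketch of a scan-based count next to a footer lookup. It is not DataFusion code and not the real Parquet layout: `ToyParquetFile`, `write_toy_file`, `read_batches`, `count_by_scan`, and `count_from_footer` are all hypothetical names, and the "file" is just an in-memory list with a row count stored alongside it.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class ToyParquetFile:
    """Hypothetical stand-in for a Parquet file: column data plus a footer."""
    column0: List[int]     # values of the first column
    footer_num_rows: int   # row count recorded when the file was written


def write_toy_file(values: List[int]) -> ToyParquetFile:
    # The writer knows how many rows it wrote, so it can record that number.
    return ToyParquetFile(column0=list(values), footer_num_rows=len(values))


def read_batches(file: ToyParquetFile, batch_size: int) -> Iterator[List[int]]:
    # Yield column 0 in fixed-size batches, the way a scan would.
    for start in range(0, len(file.column0), batch_size):
        yield file.column0[start:start + batch_size]


def count_by_scan(file: ToyParquetFile, batch_size: int = 1024) -> int:
    # Touches every batch: work grows with the number of rows.
    return sum(len(batch) for batch in read_batches(file, batch_size))


def count_from_footer(file: ToyParquetFile) -> int:
    # Reads one stored number: work does not depend on the number of rows.
    return file.footer_num_rows
```

Both functions return the same number for an unfiltered count, but only the first one has to read column data. In the real format, the file metadata in the footer carries a `num_rows` field that plays the role of `footer_num_rows` here.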
However, for the specific case of just counting everything in a Parquet file, you can just read the row count from the footer metadata, so it's O(1) instead of O