Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
None
Description
To start building a cost-based optimizer, we need some statistics about data sources. The most basic statistic is the number of rows.
I propose that we add a Statistics struct that initially exposes only a total row count, and that we can later extend to support more advanced statistics.
struct Statistics { row_count: Option<usize> }
We can then add a method to TableProvider:
trait TableProvider { fn statistics(&self) -> Option<Statistics>; }
Statistics should be optional because not all data sources can provide statistics.
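As a rough illustration of the proposal, the sketch below shows how a provider might report a row count and how an optimizer rule could consume it. It is not the DataFusion implementation: the trait is reduced to the single proposed method (taking `&self` so it can be called on a trait object), and `MemSource`, `StreamSource`, `known_row_count`, and `left_is_build_side` are hypothetical names introduced only for this example.

```rust
/// Statistics for a data source; only the total row count for now.
#[derive(Debug, Clone, PartialEq)]
struct Statistics {
    /// Total number of rows, if known.
    row_count: Option<usize>,
}

/// Reduced stand-in for the TableProvider trait (assumption: the real
/// trait has other methods, omitted here).
trait TableProvider {
    /// Returns None when the source cannot provide statistics.
    fn statistics(&self) -> Option<Statistics>;
}

/// Hypothetical in-memory source that knows its exact row count.
struct MemSource {
    rows: Vec<i64>,
}

impl TableProvider for MemSource {
    fn statistics(&self) -> Option<Statistics> {
        Some(Statistics {
            row_count: Some(self.rows.len()),
        })
    }
}

/// Hypothetical streaming source that cannot provide statistics.
struct StreamSource;

impl TableProvider for StreamSource {
    fn statistics(&self) -> Option<Statistics> {
        None
    }
}

/// Flattens the two levels of optionality into a single row count.
fn known_row_count(table: &dyn TableProvider) -> Option<usize> {
    table.statistics().and_then(|s| s.row_count)
}

/// Returns true if the left input should be the hash join build side.
/// The right side is chosen only when both counts are known and the
/// right side is strictly smaller; otherwise the left side is kept.
fn left_is_build_side(left: &dyn TableProvider, right: &dyn TableProvider) -> bool {
    match (known_row_count(left), known_row_count(right)) {
        (Some(l), Some(r)) => l <= r,
        _ => true,
    }
}
```

With this shape, a caller that only cares about row counts can treat "no statistics" and "statistics without a row count" the same way, and a rule such as the build-side choice falls back to the existing plan whenever either count is unknown.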
Issue Links
- blocks: ARROW-10782 [Rust] [DataFusion] Optimize hash join to use smaller relation as build side (Closed)
- links to