Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Duplicate
Description
When performing an inner join using the hash join algorithm, it is more efficient to load the smaller table into memory and then stream the larger table.
We should use the statistics made available in https://issues.apache.org/jira/browse/ARROW-10781 to build an optimizer rule that determines the smaller side of a join and uses it as the build (hash) side.
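As a rough illustration of the idea, here is a minimal, self-contained sketch. It is not DataFusion's actual API: `Statistics`, `BuildSide`, `choose_build_side`, and `inner_hash_join` are hypothetical names invented for this example, and the fallback to the left side when a row count is unknown is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical per-input statistics; `None` means the row count is unknown.
#[derive(Debug, Clone, Copy)]
pub struct Statistics {
    pub num_rows: Option<usize>,
}

/// Which join input is loaded into the in-memory hash table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BuildSide {
    Left,
    Right,
}

/// Pick the input with fewer rows as the build side.
/// Assumption: keep the left side when either row count is unknown or on a tie.
pub fn choose_build_side(left: &Statistics, right: &Statistics) -> BuildSide {
    match (left.num_rows, right.num_rows) {
        (Some(l), Some(r)) if r < l => BuildSide::Right,
        _ => BuildSide::Left,
    }
}

/// Inner hash join on the `i64` key of `(key, value)` rows.
/// Output rows are always `(key, left_value, right_value)`,
/// regardless of which side was used to build the hash table.
pub fn inner_hash_join<'a>(
    left: &[(i64, &'a str)],
    right: &[(i64, &'a str)],
) -> Vec<(i64, &'a str, &'a str)> {
    // In this toy example the "statistics" are simply the slice lengths.
    let side = choose_build_side(
        &Statistics { num_rows: Some(left.len()) },
        &Statistics { num_rows: Some(right.len()) },
    );
    let (build, probe) = match side {
        BuildSide::Left => (left, right),
        BuildSide::Right => (right, left),
    };

    // Build phase: load the smaller input into memory.
    let mut table: HashMap<i64, Vec<&'a str>> = HashMap::new();
    for &(key, value) in build {
        table.entry(key).or_default().push(value);
    }

    // Probe phase: stream the larger input and look up matches.
    let mut out = Vec::new();
    for &(key, probe_value) in probe {
        if let Some(matches) = table.get(&key) {
            for &build_value in matches {
                out.push(match side {
                    BuildSide::Left => (key, build_value, probe_value),
                    BuildSide::Right => (key, probe_value, build_value),
                });
            }
        }
    }
    out
}
```

A real optimizer rule would read the row counts from the table providers' statistics rather than from in-memory slices, and would rewrite the plan instead of executing the join directly.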
Issue Links
- Blocked: ARROW-10783 [Rust] [DataFusion] Implement row count statistics for Parquet TableProvider (Resolved)
- is blocked by: ARROW-10781 [Rust] [DataFusion] TableProvider should provide row count statistics (Resolved)