[ARROW-10782] [Rust] [DataFusion] Optimize hash join to use smaller relation as build side - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: Rust - DataFusion
Labels: None

External issue URL:
https://github.com/apache/arrow/issues/26723

Description

When performing an inner join using the hash join algorithm, it is more efficient to load the smaller table into memory and then stream the larger table.

We should use the statistics made available in https://issues.apache.org/jira/browse/ARROW-10781 to build an optimizer rule that determines the smaller side of a join and uses it as the build/hash side.
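A minimal sketch of what such a rule could look like, assuming each join input exposes an optional row count. The `Statistics` and `HashJoin` types and the `should_swap` and `optimize_build_side` functions below are hypothetical stand-ins for illustration, not DataFusion's actual API. The left input is treated as the build side, and the inputs are swapped only when both row counts are known and the right input is strictly smaller.

```rust
/// Hypothetical stand-in for the row count statistics proposed in ARROW-10781.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Statistics {
    /// Estimated number of rows, or `None` if the provider cannot report it.
    pub num_rows: Option<usize>,
}

/// Hypothetical, simplified inner hash join node.
/// By convention in this sketch, `left` is the build (hashed) side
/// and `right` is the probe (streamed) side.
#[derive(Debug, Clone, PartialEq)]
pub struct HashJoin {
    pub left: String,
    pub right: String,
    pub left_stats: Statistics,
    pub right_stats: Statistics,
    /// Equi-join keys as (left column, right column) pairs.
    pub on: Vec<(String, String)>,
}

/// Returns true only when both row counts are known and the right
/// input is strictly smaller than the left one.
pub fn should_swap(left: &Statistics, right: &Statistics) -> bool {
    match (left.num_rows, right.num_rows) {
        (Some(l), Some(r)) => r < l,
        // Without statistics on both sides, keep the plan unchanged.
        _ => false,
    }
}

/// Rewrites the join so that the smaller input becomes the build side.
pub fn optimize_build_side(join: HashJoin) -> HashJoin {
    if !should_swap(&join.left_stats, &join.right_stats) {
        return join;
    }
    HashJoin {
        left: join.right,
        right: join.left,
        left_stats: join.right_stats,
        right_stats: join.left_stats,
        // The key pairs must be flipped along with the inputs.
        on: join.on.into_iter().map(|(l, r)| (r, l)).collect(),
    }
}
```

Swapping the inputs also changes the order of the output columns, so a real rule would likely need a projection on top of the join to restore the original schema. The sketch covers inner joins only, where the swap does not change the result set.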

Issue Links

Blocked

ARROW-10783 [Rust] [DataFusion] Implement row count statistics for Parquet TableProvider (Resolved)

is blocked by

ARROW-10781 [Rust] [DataFusion] TableProvider should provide row count statistics (Resolved)

People

Assignee: Daniël Heres

Reporter: Andy Grove

Votes: 0

Watchers: 4

Dates

Created: 01/Dec/20 15:17

Updated: 11/Jan/23 08:15

Resolved: 21/Dec/20 22:38