Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
A major operation in analytics is the join. This issue concerns adding the join operation.
Given the complexity of this task, I propose starting with a subset of all joins: a hash join whose "ON" clause can only be a set of column names (i.e. no expressions).
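To make the proposed scope concrete, here is a minimal sketch of an inner hash equijoin whose "ON" is just a list of column names. It is not DataFusion code: `Row`, `row`, `key_of`, and `hash_join` are hypothetical names invented for illustration, rows are plain hash maps rather than Arrow RecordBatches, and all values are assumed to be `i64`.

```rust
use std::collections::HashMap;

/// Toy stand-in for one row of a RecordBatch: column name -> value.
/// (Hypothetical type for illustration only.)
type Row = HashMap<String, i64>;

/// Convenience constructor for a row from (column, value) pairs.
fn row(cols: &[(&str, i64)]) -> Row {
    cols.iter().map(|(k, v)| (k.to_string(), *v)).collect()
}

/// Extract the join key for `row`; returns None if any "ON" column is missing.
fn key_of(row: &Row, on: &[&str]) -> Option<Vec<i64>> {
    on.iter().map(|c| row.get(*c).copied()).collect()
}

/// Inner hash equijoin. `on` lists column names that exist on both sides;
/// no expressions are supported, matching the restricted scope above.
fn hash_join(left: &[Row], right: &[Row], on: &[&str]) -> Vec<Row> {
    // Build phase: index the left rows by their join-key values.
    let mut table: HashMap<Vec<i64>, Vec<&Row>> = HashMap::new();
    for l in left {
        if let Some(key) = key_of(l, on) {
            table.entry(key).or_default().push(l);
        }
    }

    // Probe phase: look up each right row and emit one output row per match.
    let mut out = Vec::new();
    for r in right {
        if let Some(key) = key_of(r, on) {
            if let Some(matches) = table.get(&key) {
                for &l in matches {
                    let mut joined: Row = l.clone();
                    // Join columns carry equal values on both sides; any other
                    // column name shared by both sides is overwritten by the right.
                    joined.extend(r.iter().map(|(k, v)| (k.clone(), *v)));
                    out.push(joined);
                }
            }
        }
    }
    out
}
```

For example, joining rows `{id: 1, a: 10}, {id: 2, a: 20}` with `{id: 2, b: 200}, {id: 3, b: 300}` on `["id"]` yields the single row `{id: 2, a: 20, b: 200}`. Restricting "ON" to column names keeps key extraction a simple lookup, which is why it is a convenient first step before supporting expressions.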
Suggested definition of done (DOD):
- physical plan to execute the join
- logical plan with the join
- SQL planner with the join
- tests on each of the above
One idea for performing this join in parallel is to join each RecordBatch on the left with the records on the right. Another is to first hash both sides by key and sort them, and then perform a "SortMergeJoin" on each of the resulting partitions. There may be better ways to achieve this, though.
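The second idea (hash by key, sort, then merge per partition) can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the approach DataFusion ended up with: `Rec`, `partition_by_key`, `sort_merge_join`, and `partitioned_join` are hypothetical names, records are `(key, payload)` tuples instead of RecordBatches, the partition count `n` must be greater than zero, and the partitions are processed sequentially where a real implementation would hand each pair to its own thread or task.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// (join key, payload); a hypothetical stand-in for a record.
type Rec = (i64, String);

/// Hash-partition rows by key into `n` buckets (n > 0), then sort each bucket by key.
fn partition_by_key(rows: Vec<Rec>, n: usize) -> Vec<Vec<Rec>> {
    let mut parts: Vec<Vec<Rec>> = (0..n).map(|_| Vec::new()).collect();
    for r in rows {
        let mut h = DefaultHasher::new();
        r.0.hash(&mut h);
        parts[(h.finish() as usize) % n].push(r);
    }
    for p in parts.iter_mut() {
        p.sort_by_key(|r| r.0);
    }
    parts
}

/// Inner sort-merge join of two slices that are already sorted by key.
fn sort_merge_join(left: &[Rec], right: &[Rec]) -> Vec<(i64, String, String)> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < left.len() && j < right.len() {
        if left[i].0 < right[j].0 {
            i += 1;
        } else if left[i].0 > right[j].0 {
            j += 1;
        } else {
            // Equal keys: find the run of that key on each side and emit all pairs.
            let key = left[i].0;
            let i_end = i + left[i..].iter().take_while(|r| r.0 == key).count();
            let j_end = j + right[j..].iter().take_while(|r| r.0 == key).count();
            for l in &left[i..i_end] {
                for r in &right[j..j_end] {
                    out.push((key, l.1.clone(), r.1.clone()));
                }
            }
            i = i_end;
            j = j_end;
        }
    }
    out
}

/// Partition both sides with the same hash function, then join matching partitions.
/// Equal keys always land in the same partition index, so each pair is independent.
fn partitioned_join(left: Vec<Rec>, right: Vec<Rec>, n: usize) -> Vec<(i64, String, String)> {
    let lp = partition_by_key(left, n);
    let rp = partition_by_key(right, n);
    lp.iter()
        .zip(rp.iter())
        .flat_map(|(l, r)| sort_merge_join(l, r))
        .collect()
}
```

Because both sides use the same hash function and partition count, rows with equal keys always meet in the same partition pair, so the per-partition joins share no state. The order of the combined output depends on how keys hash into partitions, so callers should not rely on it.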
Attachments
1. [Rust] [DataFusion] Add inner (hash) equijoin physical plan | Resolved | Jorge Leitão
2. [Rust] [DataFusion] Implement SQL join support using explicit JOIN ON syntax | Resolved | Andy Grove
3. [Rust] [DataFusion] Add join support to DataFrame and LogicalPlan | Resolved | Andy Grove
4. [Rust] [DataFusion] Add join support to query planner | Resolved | Andy Grove
5. [Rust] [DataFusion] Add SQL support for NATURAL JOIN | Closed | Unassigned
6. [Rust] [DataFusion] Optimizer rules should work with qualified column names | Closed | Unassigned