Details
-
Sub-task
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
Description
Here is an overview of how I think we should implement support for equijoins, at least for the initial implementation.
- Read all batches from the left-side of the join into a single Vec<RecordBatch>
- Create a map something like HashMap<Vec<ScalarValue>, Vec<(usize,usize)>> to map keys to batch/row indices
- Iterate over this Vec<RecordBatch> and create an entry in a hash map, mapping the join keys to the index of the batch and row in the Vec<RecordBatch>
- For each input partition on the right-side of the join, return an output partition that is an iterator/stream that:
- For each input row, evaluate the join keys
- Look up those join keys in the hash map
- If a match is found:
- For each (batch, row) index create an output row which has the values from both the left and right row and emit it
- If no match is found:
- Do not emit a row