Apache Arrow / ARROW-9423

[Rust][DataFusion] Add join

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: Rust - DataFusion

    Description

      The join is a major operation in analytics. This issue concerns adding the join operation to DataFusion.

      Given the complexity of this task, I propose starting with a subset of all joins: a hash join whose "ON" clause can only be a set of column names (i.e. no expressions).
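
To illustrate the proposed scope, here is a minimal sketch of an inner hash join whose "ON" is only a list of column names. It uses the standard library only: `Table`, `indices`, and `hash_join` are hypothetical names made up for this example, and the row-major `i64` table is a toy stand-in for a RecordBatch, not an Arrow or DataFusion type.

```rust
use std::collections::HashMap;

/// Toy in-memory table: column names plus row-major i64 values.
/// Hypothetical stand-in for a RecordBatch (assumption, not an Arrow type).
#[derive(Debug, Clone, PartialEq)]
pub struct Table {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<i64>>,
}

impl Table {
    /// Resolve column names to positions; None if any name is missing.
    fn indices(&self, names: &[&str]) -> Option<Vec<usize>> {
        names
            .iter()
            .map(|n| self.columns.iter().position(|c| c.as_str() == *n))
            .collect()
    }
}

/// Inner hash join where ON is a list of column names present on both
/// sides (no expressions). Returns None if a name is missing on either side.
pub fn hash_join(left: &Table, right: &Table, on: &[&str]) -> Option<Table> {
    let l_keys = left.indices(on)?;
    let r_keys = right.indices(on)?;

    // Build phase: map each left key tuple to the left rows that carry it.
    let mut build: HashMap<Vec<i64>, Vec<usize>> = HashMap::new();
    for (i, row) in left.rows.iter().enumerate() {
        let key: Vec<i64> = l_keys.iter().map(|&k| row[k]).collect();
        build.entry(key).or_default().push(i);
    }

    // Output schema: all left columns, then the right columns that are not keys.
    let r_rest: Vec<usize> = (0..right.columns.len())
        .filter(|i| !r_keys.contains(i))
        .collect();
    let mut columns = left.columns.clone();
    columns.extend(r_rest.iter().map(|&i| right.columns[i].clone()));

    // Probe phase: look up each right row's key and emit one row per match.
    let mut rows = Vec::new();
    for r_row in &right.rows {
        let key: Vec<i64> = r_keys.iter().map(|&k| r_row[k]).collect();
        if let Some(matches) = build.get(&key) {
            for &i in matches {
                let mut out = left.rows[i].clone();
                out.extend(r_rest.iter().map(|&j| r_row[j]));
                rows.push(out);
            }
        }
    }
    Some(Table { columns, rows })
}
```

Restricting "ON" to column names keeps key extraction to a plain index lookup on each side, which is what makes this a reasonable first step before supporting expressions.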

      Suggested definition of done (DoD):

      • physical plan to execute the join
      • logical plan with the join
      • SQL planner with the join
      • tests on each of the above

      One idea for performing this join in parallel is to join each RecordBatch on the left with a record on the right. Another is to first hash-partition both sides by key and sort them, and then perform a "SortMergeJoin" on each of the partitions. There may be better ways to achieve this, though.
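
The second idea (hash by key, sort, then merge per partition) can be sketched as follows. This is an illustration under stated assumptions, not DataFusion's implementation: rows are toy `(key, payload)` pairs, the names `Row`, `partition`, `sort_merge_join`, and `partitioned_join` are hypothetical, and each pair of partitions is joined on a plain OS thread from the standard library.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread;

/// A row is (join key, payload). Hypothetical toy representation.
type Row = (i64, i64);

/// Hash-partition rows by key so equal keys always land in the same partition.
fn partition(rows: &[Row], n: usize) -> Vec<Vec<Row>> {
    assert!(n > 0, "need at least one partition");
    let mut parts = vec![Vec::new(); n];
    for &row in rows {
        let mut h = DefaultHasher::new();
        row.0.hash(&mut h);
        parts[(h.finish() % n as u64) as usize].push(row);
    }
    parts
}

/// Sort-merge inner join of two partitions; sorts both inputs by key first.
fn sort_merge_join(mut left: Vec<Row>, mut right: Vec<Row>) -> Vec<(i64, i64, i64)> {
    left.sort();
    right.sort();
    let (mut i, mut j, mut out) = (0, 0, Vec::new());
    while i < left.len() && j < right.len() {
        let (lk, rk) = (left[i].0, right[j].0);
        if lk < rk {
            i += 1;
        } else if lk > rk {
            j += 1;
        } else {
            // Find the end of the run of equal keys on each side.
            let i_end = i + left[i..].iter().take_while(|r| r.0 == lk).count();
            let j_end = j + right[j..].iter().take_while(|r| r.0 == lk).count();
            // Emit the cross product of the two runs.
            for l in &left[i..i_end] {
                for r in &right[j..j_end] {
                    out.push((lk, l.1, r.1));
                }
            }
            i = i_end;
            j = j_end;
        }
    }
    out
}

/// Partition both sides, then join each pair of partitions on its own thread.
pub fn partitioned_join(left: &[Row], right: &[Row], n: usize) -> Vec<(i64, i64, i64)> {
    let pairs = partition(left, n).into_iter().zip(partition(right, n));
    // Spawn all workers before waiting on any of them.
    let handles: Vec<_> = pairs
        .map(|(l, r)| thread::spawn(move || sort_merge_join(l, r)))
        .collect();
    let mut out: Vec<_> = handles
        .into_iter()
        .flat_map(|h| h.join().expect("join worker panicked"))
        .collect();
    out.sort(); // deterministic output order for the caller
    out
}
```

Because both sides are partitioned with the same hash function, matching keys can only meet inside the same partition pair, so the per-partition joins are independent and their outputs can simply be concatenated.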

          People

            Assignee: Jorge Leitão (jorgecarleitao)
            Reporter: Jorge Leitão (jorgecarleitao)
            Votes: 0
            Watchers: 4

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 13h 20m