  1. Apache Arrow
  2. ARROW-10782

[Rust] [DataFusion] Optimize hash join to use smaller relation as build side


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Rust - DataFusion
    • Labels: None

    Description

      When performing an inner join using the hash join algorithm, it is more efficient to load the smaller table into memory and then stream the larger table.
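      To illustrate why the build side matters, here is a minimal sketch of an in-memory inner hash join. It is not DataFusion's implementation: the function name `hash_join`, the `(i64, &str)` row type, and the single integer key are assumptions made for this example only. The build side is fully materialized in a hash table, so memory use grows with its size, while the probe side is consumed one row at a time.

```rust
use std::collections::HashMap;

/// Illustrative inner hash join on an integer key (hypothetical helper,
/// not a DataFusion API). `build` is loaded fully into a hash table;
/// `probe` is streamed one row at a time.
fn hash_join<'a>(
    build: &'a [(i64, &'a str)],
    probe: impl Iterator<Item = (i64, &'a str)>,
) -> Vec<(i64, &'a str, &'a str)> {
    // Build phase: every build-side row is held in memory.
    let mut table: HashMap<i64, Vec<&'a str>> = HashMap::new();
    for &(key, value) in build {
        table.entry(key).or_default().push(value);
    }

    // Probe phase: each probe row is looked up once and then dropped.
    let mut out = Vec::new();
    for (key, probe_value) in probe {
        if let Some(matches) = table.get(&key) {
            for &build_value in matches {
                out.push((key, build_value, probe_value));
            }
        }
    }
    out
}
```

      Passing the smaller relation as `build` keeps the hash table small; the larger relation only needs to be iterated.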

      We should use the statistics made available in https://issues.apache.org/jira/browse/ARROW-10781 to build an optimizer rule that determines the smaller side of a join and uses it as the build/hash side.
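      A rough sketch of what such a rule could look like follows. Everything in it is hypothetical: the `Statistics` and `Join` types, the `num_rows` field, and the function `use_smaller_build_side` are stand-ins chosen for illustration and are not taken from DataFusion or from ARROW-10781. The sketch assumes the left input is the build side and swaps the inputs only when both row counts are known and the right side is strictly smaller.

```rust
/// Hypothetical table statistics; `None` means the row count is unknown.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Statistics {
    num_rows: Option<usize>,
}

/// Hypothetical inner join node. By assumption, `left` is the build side.
#[derive(Debug, Clone, PartialEq)]
struct Join {
    left: &'static str,
    right: &'static str,
    left_stats: Statistics,
    right_stats: Statistics,
}

/// Swap the inputs when the right side is known to be smaller.
/// A real rule would also have to swap the join keys and restore the
/// original column order; that is omitted here.
fn use_smaller_build_side(join: Join) -> Join {
    match (join.left_stats.num_rows, join.right_stats.num_rows) {
        (Some(left_rows), Some(right_rows)) if right_rows < left_rows => Join {
            left: join.right,
            right: join.left,
            left_stats: join.right_stats,
            right_stats: join.left_stats,
        },
        // Unknown statistics or an already-smaller build side: leave as is.
        _ => join,
    }
}
```

      Leaving the plan unchanged when either row count is unknown is a conservative choice for this sketch; a real rule might fall back to other estimates such as byte size.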


            People

              Dandandan Daniël Heres
              andygrove Andy Grove
              Votes: 0
              Watchers: 4
