  Apache Drill / DRILL-7141

Hash-Join (and Agg) should always spill the least-used partition to disk


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 1.15.0
    • None
    • None

    Description

      When the probe-side data for a hash join is skewed, it is preferable to keep the corresponding build-side partition in memory.

      Currently, with the spill-to-disk feature, the partition to spill is selected at random. As a result, highly skewed probe-side data may also have to be spilled, because the matching hash-table partition is no longer in memory.
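      A minimal sketch of the proposed victim-selection rule, written in Python for brevity (Drill itself is Java). It assumes a per-partition estimate of probe-side usage is available when a spill is needed. The issue does not say how that count would be obtained: for hash join the build side is typically spilled before probing starts, so it would presumably come from sampling or statistics, while for hash aggregation the number of incoming rows per partition can be counted as input arrives. All names (`Partition`, `probe_hits`, `pick_victim_least_used`, `spill_until_fits`) are hypothetical and do not correspond to Drill's code.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Partition:
    """Hypothetical per-partition bookkeeping; field names are illustrative only."""
    pid: int
    build_bytes: int       # memory held by this partition's build-side hash table
    probe_hits: int = 0    # estimated probe-side rows hashing to this partition (assumed known)
    in_memory: bool = True


def pick_victim_random(parts: List[Partition], rng: random.Random) -> Optional[Partition]:
    """Current behaviour described in the issue: any in-memory partition may be spilled."""
    candidates = [p for p in parts if p.in_memory]
    return rng.choice(candidates) if candidates else None


def pick_victim_least_used(parts: List[Partition]) -> Optional[Partition]:
    """Proposed behaviour: spill the in-memory partition with the fewest probe hits.

    Ties are broken by preferring the partition whose build side frees the most memory.
    """
    candidates = [p for p in parts if p.in_memory]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (p.probe_hits, -p.build_bytes))


def spill_until_fits(parts: List[Partition], budget: int,
                     choose: Callable[[List[Partition]], Optional[Partition]]) -> int:
    """Spill partitions picked by `choose` until the in-memory build side fits `budget`.

    Returns the number of probe rows whose build partition ended up on disk;
    those probe rows would have to be spilled as well.
    """
    used = sum(p.build_bytes for p in parts if p.in_memory)
    spilled_probe_rows = 0
    while used > budget:
        victim = choose(parts)
        if victim is None:
            break
        victim.in_memory = False
        used -= victim.build_bytes
        spilled_probe_rows += victim.probe_hits
    return spilled_probe_rows


if __name__ == "__main__":
    # Partition 0 receives most probe rows (skew); all build sides are the same size.
    hits = [9000, 250, 250, 250, 250]
    parts = [Partition(pid=i, build_bytes=100, probe_hits=h) for i, h in enumerate(hits)]
    print(spill_until_fits(parts, budget=300, choose=pick_victim_least_used))  # → 500
```

      With these skewed counts, least-used selection spills partitions 1 and 2 and only 500 probe rows follow them to disk, whereas a random choice that happens to pick partition 0 would push at least 9,250 probe rows to disk.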

            People

              Boaz Ben-Zvi (ben-zvi)
              Kunal Khatua (kkhatua)
              Votes: 0
              Watchers: 2