  1. Apache Drill
  2. DRILL-7141

Hash-Join (and Agg) should always spill to disk the least used partition


    Details

      Description

      When the probe-side data for a hash join is skewed, it is preferable to keep the corresponding build-side partition in memory.

      Currently, with the spill-to-disk feature, the partition to spill to disk is selected at random. This means that highly skewed probe-side data may also have to spill, because the corresponding hash-table partition is no longer in memory.
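
      The selection policy proposed here could be sketched roughly as follows. This is a minimal illustration only, not Drill's actual code: the class `SpillVictimSketch`, the method `pickLeastUsed`, and the `probeHits` counters are all hypothetical names, and the sketch assumes that some per-partition usage estimate is already available when a spill is needed (how to obtain it is outside the sketch).

```java
/** Hypothetical sketch of "spill the least used partition"; not Drill's actual code. */
final class SpillVictimSketch {

  /**
   * Returns the index of the in-memory partition with the fewest recorded
   * probe hits, or -1 if every partition has already been spilled.
   *
   * probeHits[i] is an assumed usage estimate for partition i;
   * spilled[i] is true if partition i is already on disk.
   */
  static int pickLeastUsed(long[] probeHits, boolean[] spilled) {
    int victim = -1;
    for (int i = 0; i < probeHits.length; i++) {
      if (spilled[i]) {
        continue; // already on disk, cannot be chosen again
      }
      if (victim == -1 || probeHits[i] < probeHits[victim]) {
        victim = i; // fewest hits seen so far among in-memory partitions
      }
    }
    return victim;
  }

  public static void main(String[] args) {
    // Skewed example: partition 2 receives almost all probe rows.
    long[] probeHits = {120, 3, 9800, 45};
    boolean[] spilled = new boolean[probeHits.length];

    int first = pickLeastUsed(probeHits, spilled);
    spilled[first] = true;
    int second = pickLeastUsed(probeHits, spilled);

    System.out.println("spill order: " + first + ", " + second);
  }
}
```

      With a policy like this, the heavily probed partition (index 2 in the example) is the last candidate for spilling, whereas a random choice could evict it first.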

              People

              • Assignee:
                ben-zvi Boaz Ben-Zvi
                Reporter:
                kkhatua Kunal Khatua
