Apache Drill: DRILL-5146

Unnecessary spilling to disk by sort when we only have 5000 rows with one column


Details

    Description

      git.commit.id.abbrev=cf2b7c7

      The query below spills to disk during the sort, even though the dataset is small: 5000 files, each containing a single record.

      select * from dfs.`/drill/testdata/resource-manager/5000files/text` order by columns[1];
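
For readers without the attached data.tgz, a dataset of the same shape (many files, one record each) can be approximated with the sketch below. It is a hypothetical stand-in, not the reporter's data: the function name, the .csv extension, the file names, and the two column values are all assumptions, chosen only so that columns[1] exists for the ORDER BY.

```python
import csv
import tempfile
from pathlib import Path


def write_single_record_files(out_dir, num_files=5000):
    """Write num_files CSV files into out_dir, each holding exactly one record.

    Hypothetical helper: the layout (two columns per record) is an assumption,
    not the contents of the attached data.tgz.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i in range(num_files):
        # One file per record; the second field is what columns[1] would read.
        with open(out / f"{i}.csv", "w", newline="") as f:
            csv.writer(f).writerow([i, f"value_{i}"])
    return out


if __name__ == "__main__":
    # Writes into a fresh temporary directory and prints its location.
    target = write_single_record_files(Path(tempfile.mkdtemp()) / "5000files" / "text")
    print(target)
```

Pointing the query at the printed directory through a dfs workspace should give an input of comparable shape, though whether it triggers the same spill depends on the build and memory settings.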
      

      Environment:

      DRILL_MAX_DIRECT_MEMORY="16G"
      DRILL_MAX_HEAP="4G"
      

      I attached the dataset, the logs, and the profile.

      Attachments

        1. 27a52efb-0ce6-f2ad-7216-aef007926649.sys.drill
          340 kB
          Rahul Kumar Challapalli
        2. data.tgz
          79 kB
          Rahul Kumar Challapalli
        3. spill.log
          10.88 MB
          Rahul Kumar Challapalli

            People

              paul-rogers Paul Rogers
              rkins Rahul Kumar Challapalli