Apache Drill / DRILL-5146

Unnecessary spilling to disk by sort when we only have 5000 rows with one column

    Details

      Description

      git.commit.id.abbrev=cf2b7c7

      The query below spills to disk during the sort, even though the dataset is tiny: it contains 5000 files, and each file holds a single record.

      select * from dfs.`/drill/testdata/resource-manager/5000files/text` order by columns[1];
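      The attached data.tgz is the authoritative dataset. For a quick local experiment, a dataset of the same shape (5000 files, one record each) can be approximated with the sketch below. Everything about its contents is an assumption: the file names (file_0000.csv and so on), the two comma-separated fields per record, the random 8-letter second field, and the output directory. Drill reads headerless text files into a single `columns` array, so each record is still "one column" from the query's point of view, while `columns[1]` refers to the second field.

```python
import os
import random
import tempfile

# Hypothetical approximation of the attached dataset: 5000 files, one record
# per file. The real data in data.tgz may differ in names and field contents.
NUM_FILES = 5000


def generate_dataset(out_dir: str, num_files: int = NUM_FILES, seed: int = 42) -> list:
    """Write num_files CSV files, each holding a single two-field record.

    Drill exposes the fields of a headerless text file as columns[0],
    columns[1], ...; the second field is a random string so that
    ORDER BY columns[1] actually has to reorder rows.
    """
    rng = random.Random(seed)
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i in range(num_files):
        path = os.path.join(out_dir, f"file_{i:04d}.csv")
        value = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8))
        with open(path, "w", encoding="utf-8") as f:
            f.write(f"{i},{value}\n")  # exactly one record per file
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Assumed target directory; point a dfs workspace at it to run the query.
    target = os.path.join(tempfile.gettempdir(), "5000files", "text")
    written = generate_dataset(target)
    print(f"wrote {len(written)} files to {target}")
```

      Writing one record per file, rather than one large file, matters here: it mirrors the reported layout, where the sort receives input from 5000 separate files.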
      

      Environment:

      DRILL_MAX_DIRECT_MEMORY="16G"
      DRILL_MAX_HEAP="4G"
      

      I have attached the dataset, the logs, and the query profile.

        Attachments

        1. spill.log
          10.88 MB
          Rahul Challapalli
        2. data.tgz
          79 kB
          Rahul Challapalli
        3. 27a52efb-0ce6-f2ad-7216-aef007926649.sys.drill
          340 kB
          Rahul Challapalli


              People

              • Assignee:
                paul-rogers Paul Rogers
                Reporter:
                rkins Rahul Challapalli
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated:
                  Resolved: