  Apache Drill / DRILL-5284 Roll-up of final fixes for managed sort / DRILL-5267

Managed external sort spills too often with Parquet data

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.10.0
    • Fix Version/s: 1.10.0
    • Component/s: None
    • Labels: None

    Description

      DRILL-5266 describes how Parquet produces low-density record batches. Because the external sort sizes its spill files from the size of each batch rather than from the data the batch actually contains, these batches cause it to spill more frequently than it should. Since Parquet batches are 95% empty space, the spill files end up far too small.

      Adjust the spill calculations based on actual data content, not the size of the overall record batch.
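
      The difference between the two sizing approaches can be illustrated with a minimal sketch. This is not Drill's implementation: the `Batch` class, both functions, and all byte counts are hypothetical, and the sketch assumes that only a batch's data (not its empty space) ends up in the spill file.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    """Hypothetical stand-in for a record batch; not Drill's actual class."""
    allocated_bytes: int  # memory reserved for the batch
    data_bytes: int       # bytes of actual row data inside that memory


def batches_per_spill(batch: Batch, target_file_bytes: int, use_data_size: bool) -> int:
    """Number of batches to gather before writing one spill file."""
    per_batch = batch.data_bytes if use_data_size else batch.allocated_bytes
    # Guard against empty batches, and always spill at least one batch.
    return max(1, target_file_bytes // max(1, per_batch))


def spill_file_bytes(batch: Batch, count: int) -> int:
    """Assumption: only the data is written, so file size follows data_bytes."""
    return batch.data_bytes * count


# Illustrative numbers: a batch that is 95% empty space.
batch = Batch(allocated_bytes=20_000_000, data_bytes=1_000_000)
target = 200_000_000

by_batch_size = batches_per_spill(batch, target, use_data_size=False)
by_data_size = batches_per_spill(batch, target, use_data_size=True)

print(by_batch_size, spill_file_bytes(batch, by_batch_size))
print(by_data_size, spill_file_bytes(batch, by_data_size))
```

      With these made-up numbers, sizing by batch size gathers 10 batches and writes a file of 10,000,000 bytes, which is 5% of the target. Sizing by data content gathers 200 batches and reaches the full 200,000,000-byte target, so the sort writes far fewer spill files.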

          People

            Assignee: Paul Rogers
            Reporter: Paul Rogers
            Rahul Kumar Challapalli
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: