  1. Apache Drill
  2. DRILL-5211

Queries fail due to direct memory fragmentation

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved

    Description

      Consider a test of the external sort as follows:

      • Direct memory: 3GB
      • Input file: 18 GB, with one Varchar column of 8K width

      The sort runs, spilling to disk. Once all data arrives, the sort begins to merge the results. But, to do that, it must first do an intermediate merge. For example, in this sort, there are 190 spill files, but only 19 can be merged at a time. (Each merge file contains 128 MB batches, and only 19 can fit in memory, giving a total footprint of about 2.5 GB, well below the 3 GB limit.)
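      The merge arithmetic above can be checked with a short sketch. It assumes that "128 MB" means 128 MiB and that each intermediate merge holds exactly one batch per input file in memory; the function name merge_plan is hypothetical and is not part of Drill.

```python
import math

MB = 1024 * 1024  # assumption: "MB" in the description means MiB


def merge_plan(spill_files, batch_bytes, merge_width):
    """Return (number of first-pass merges, bytes held during one merge)."""
    # Assumption: one batch per input file is resident during a merge.
    footprint = merge_width * batch_bytes
    # Each first-pass merge consumes up to merge_width spill files.
    first_pass_merges = math.ceil(spill_files / merge_width)
    return first_pass_merges, footprint


merges, footprint = merge_plan(spill_files=190, batch_bytes=128 * MB, merge_width=19)
print(merges)     # 10
print(footprint)  # 2550136832, i.e. about 2.5 GB
```

      Under these assumptions the 190 spill files need 10 intermediate merges of 19 files each, and one merge holds about 2.55 billion bytes.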

      Yet, when loading batch xx, Drill fails with an OOM error. At that point, total available direct memory is 3,817,865,216. (Obtained from maxMemory in the Bits class in the JDK.)

      It appears that Drill wants to allocate 58,257,868 bytes, but the totalCapacity (again in Bits) is already 3,800,769,206, causing an OOM.
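      The numbers above explain the failure at the JVM level. The sketch below is a simplified model of a direct-memory reservation check (a request fits only if it does not exceed the limit minus what is already reserved); it is an assumption about the check's shape, not the JDK's actual code, and reservation_fits is a hypothetical name.

```python
MAX_MEMORY = 3_817_865_216      # maxMemory reported in Bits
TOTAL_CAPACITY = 3_800_769_206  # totalCapacity reported in Bits
REQUEST = 58_257_868            # bytes Drill tried to allocate


def reservation_fits(request, total_capacity, max_memory):
    """Simplified model: the request must fit in the unreserved remainder."""
    return request <= max_memory - total_capacity


remaining = MAX_MEMORY - TOTAL_CAPACITY
shortfall = REQUEST - remaining

print(remaining)  # 17096010
print(shortfall)  # 41161858
print(reservation_fits(REQUEST, TOTAL_CAPACITY, MAX_MEMORY))  # False
```

      Only about 17 MB of direct memory remains unreserved, so the 58 MB request falls short by about 41 MB.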

      The problem is that, at this point, the external sort should not need to ask the system for more memory. The allocator for the external sort is at just 1,192,350,366 bytes before the allocation request. Plenty of spare memory should be available, released when the in-memory batches were spilled to disk prior to merging. Indeed, earlier in the run, the sort had reached a peak memory usage of 2,710,716,416 bytes. This memory should be available for reuse during merging, and is more than sufficient to fill the particular request in question.
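      To put the allocator figures in proportion, the sketch below compares the memory the sort has given back since its peak with the size of the failing request. It is plain arithmetic on the numbers reported here; the variable names are illustrative only.

```python
SORT_PEAK = 2_710_716_416     # peak usage of the sort's allocator
SORT_CURRENT = 1_192_350_366  # allocator usage just before the request
REQUEST = 58_257_868          # size of the failing allocation

# Memory the sort released between its peak and the failing request.
released = SORT_PEAK - SORT_CURRENT

print(released)             # 1518366050
print(released >= REQUEST)  # True
print(released // REQUEST)  # 26
```

      The sort had released roughly 1.5 GB since its peak, about 26 times the size of the request that failed.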

      Attachments

        1. ApacheDrillMemoryFragmentationBackground.pdf
          233 kB
          Paul Rogers
        2. ApacheDrillVectorSizeLimits.pdf
          158 kB
          Paul Rogers
        3. BatchSizeControl-AsBuilt.pdf
          361 kB
          Paul Rogers
        4. EnhancedScanOperator.pdf
          161 kB
          Paul Rogers
        5. ScanSchemaManagement.pdf
          163 kB
          Paul Rogers

            People

              Paul.Rogers Paul Rogers
              paul-rogers Paul Rogers
