Apache Drill / DRILL-5273

CompliantTextReader exhausts 4 GB memory when reading 5000 small files

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.10.0
    • Fix Version/s: 1.10.0
    • None

    Description

      A test case was created that consists of 5000 text files, each with a single line with the file number: 1 to 5001. Each file has a single record, and at most 4 characters per record.

      Run the following query:

      SELECT * FROM `dfs.data`.`5000files/text`
      

      The query fails with an out-of-memory (OOM) error in the scan batch at around record 3700 on a Mac with 4 GB of direct memory.

      The code to read records in ScanBatch is complex. The following appears to occur:

      • Iterate over the record readers, one per file.
      • For each reader, call setup().

      The setup code for the text reader is:

        public void setup(OperatorContext context, OutputMutator outputMutator) throws ExecutionSetupException {
      
          oContext = context;
          readBuffer = context.getManagedBuffer(READ_BUFFER);
          whitespaceBuffer = context.getManagedBuffer(WHITE_SPACE_BUFFER);
      

      The two buffers are in direct memory. There is no code that releases the buffers.
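      The effect of never releasing the two buffers can be sketched in isolation. The following is a minimal model, not Drill code: TrackingAllocator and SketchTextReader are hypothetical stand-ins for the operator context and the text reader, and the buffer sizes are the reader's READ_BUFFER and WHITE_SPACE_BUFFER constants. Releasing the buffers in close() is shown as one possible remedy, not necessarily the fix that was applied.

```java
class BufferLeakSketch {

    // Hypothetical allocator that only tracks how many bytes are outstanding.
    static class TrackingAllocator {
        private long outstanding;
        private long peak;

        long allocate(int size) {
            outstanding += size;
            peak = Math.max(peak, outstanding);
            return size;
        }

        void release(long size) {
            outstanding -= size;
        }

        long outstanding() { return outstanding; }
        long peak() { return peak; }
    }

    // Hypothetical reader whose setup() allocates the same two buffer sizes.
    static class SketchTextReader implements AutoCloseable {
        static final int READ_BUFFER = 1024 * 1024;
        static final int WHITE_SPACE_BUFFER = 64 * 1024;

        private final TrackingAllocator allocator;
        private final boolean releaseOnClose;
        private long held;

        SketchTextReader(TrackingAllocator allocator, boolean releaseOnClose) {
            this.allocator = allocator;
            this.releaseOnClose = releaseOnClose;
        }

        void setup() {
            held += allocator.allocate(READ_BUFFER);
            held += allocator.allocate(WHITE_SPACE_BUFFER);
        }

        @Override
        public void close() {
            // Assumed remedy: hand the buffers back when the reader is done.
            if (releaseOnClose) {
                allocator.release(held);
                held = 0;
            }
        }
    }

    // Runs one reader per file, one after another, and returns the allocator.
    static TrackingAllocator scan(int files, boolean releaseOnClose) {
        TrackingAllocator allocator = new TrackingAllocator();
        for (int i = 0; i < files; i++) {
            try (SketchTextReader reader = new SketchTextReader(allocator, releaseOnClose)) {
                reader.setup();
            }
        }
        return allocator;
    }

    public static void main(String[] args) {
        TrackingAllocator leaky = scan(5000, false);
        TrackingAllocator released = scan(5000, true);
        System.out.println("No release:   " + leaky.outstanding() + " bytes outstanding");
        System.out.println("With release: " + released.outstanding()
                + " bytes outstanding, peak " + released.peak());
    }
}
```

      In this model the outstanding memory grows linearly with the number of files when close() does nothing, and stays at one reader's worth of buffers when close() releases them.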

      The sizes are:

        private static final int READ_BUFFER = 1024*1024;
        private static final int WHITE_SPACE_BUFFER = 64*1024;
      
      Total: 1,048,576 + 65,536 = 1,114,112 bytes
      

      This is exactly the amount of memory that accumulates each time ScanBatch.next() starts a new record reader, as the following trace shows:

      Ctor: 0  -- Initial memory in constructor
      Init setup: 1114112  -- After call to first record reader setup
      Entry Memory: 1114112  -- first next() call, returns one record
      Entry Memory: 1114112  -- second next(), eof and start second reader
      Entry Memory: 2228224 -- third next(), second reader returns EOF
      ...
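
      The trace above can be reproduced with a small model of the scan loop. This is a sketch under stated assumptions, not the ScanBatch implementation: it assumes the constructor sets up the first reader, the first next() returns that reader's single record, and every later next() hits EOF, sets up the following reader without releasing the previous buffers, and returns its record. The class and method names are hypothetical.

```java
class ScanTraceSketch {
    // Bytes allocated by one reader setup: 1 MiB read buffer + 64 KiB whitespace buffer.
    static final long PER_READER = 1024 * 1024 + 64 * 1024;

    // Returns the allocated memory observed on entry to each of the first `calls` next() calls.
    static long[] entryMemory(int calls) {
        long allocated = PER_READER;       // assumed: constructor sets up the first reader
        boolean firstRecordPending = true; // the first reader still holds its one record
        long[] onEntry = new long[calls];
        for (int i = 0; i < calls; i++) {
            onEntry[i] = allocated;
            if (firstRecordPending) {
                firstRecordPending = false; // first next(): return the record, no new reader
            } else {
                allocated += PER_READER;    // EOF: set up the next reader, nothing is released
            }
        }
        return onEntry;
    }

    public static void main(String[] args) {
        for (long bytes : entryMemory(3)) {
            System.out.println("Entry Memory: " + bytes);
        }
    }
}
```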
      

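      Each file leaks a little over 1 MB (1,114,112 bytes), so 5000 files would leak more than 5 GB (5,570,560,000 bytes). That exceeds the 4 GB of direct memory available and is consistent with the failure at around record 3700, since 3700 x 1,114,112 bytes is about 4.1 billion bytes.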
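
      The arithmetic can be checked with a few lines of Java. This is only a back-of-the-envelope sketch: the class and method names are hypothetical, "4 GB" is assumed to mean 4 GiB, and the model ignores every other consumer of direct memory, so the computed file count is an upper bound rather than a prediction of where the query actually fails.

```java
class LeakEstimate {
    static final long READ_BUFFER = 1024L * 1024L;       // 1,048,576 bytes
    static final long WHITE_SPACE_BUFFER = 64L * 1024L;  // 65,536 bytes
    static final long PER_FILE = READ_BUFFER + WHITE_SPACE_BUFFER;

    // Total bytes leaked after the given number of files.
    static long totalLeak(long files) {
        return files * PER_FILE;
    }

    // Number of whole files whose buffers fit within the given limit.
    static long filesUntilExhausted(long limitBytes) {
        return limitBytes / PER_FILE;
    }

    public static void main(String[] args) {
        long limit = 4L * 1024 * 1024 * 1024; // assumption: 4 GB means 4 GiB
        System.out.println("Per file:     " + PER_FILE + " bytes");
        System.out.println("5000 files:   " + totalLeak(5000) + " bytes");
        System.out.println("Files in 4 GiB: " + filesUntilExhausted(limit));
    }
}
```

      Under these assumptions a little under 3900 files fit before the limit is reached, which is in the same range as the observed failure at around record 3700.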

            People

              paul-rogers Paul Rogers
              paul-rogers Paul Rogers
              Kunal Khatua Kunal Khatua