Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 1.10.0
- None
Description
A test case was created consisting of 5000 text files, each containing a single line with the file number (1 to 5001). Each file therefore holds a single record of at most 4 characters.
Run the following query:
SELECT * FROM `dfs.data`.`5000files/text`
The query fails with an OOM in the scan batch at around record 3700 on a Mac with 4 GB of direct memory.
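For reference, a data set like this can be generated with a few lines of plain Java. The output directory below is a placeholder; point it at whatever location the dfs.data workspace maps to.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Writes one single-line text file per number, mirroring the test case above.
public class GenerateTestFiles {
  public static void main(String[] args) throws IOException {
    Path dir = Paths.get("/path/to/5000files/text");   // placeholder path
    Files.createDirectories(dir);
    for (int i = 1; i <= 5000; i++) {
      Path file = dir.resolve(String.format("file%04d.csv", i));
      Files.write(file, (i + "\n").getBytes(StandardCharsets.UTF_8));
    }
  }
}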
The code to read records in ScanBatch is complex. The following appears to occur:
- Iterate over the record readers, one per file.
- For each, call setup() (see the sketch after this list).
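A simplified, self-contained sketch of that pattern follows. The types and method names are stand-ins, not Drill's actual ScanBatch implementation; the point is that setup() runs once per file while the operator context that owns the buffers lives for the entire scan.

import java.util.Iterator;
import java.util.List;

// Illustrative only: a stand-in for the reader-advance loop described above.
public class ScanLoopSketch {
  interface Reader {
    void setup();    // in the real code, allocates ~1 MB of direct memory
    int next();      // returns 0 at end of file
    void cleanup();  // in the real code, does not release the setup() buffers
  }

  static void scan(List<Reader> readers) {
    Iterator<Reader> it = readers.iterator();
    if (!it.hasNext()) {
      return;
    }
    Reader current = it.next();
    current.setup();
    while (true) {
      if (current.next() > 0) {
        continue;              // pass the batch downstream, then read again
      }
      current.cleanup();
      if (!it.hasNext()) {
        return;                // end of scan
      }
      current = it.next();
      current.setup();         // another ~1 MB allocated, never freed
    }
  }
}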
The setup code is:
public void setup(OperatorContext context, OutputMutator outputMutator) throws ExecutionSetupException {
  oContext = context;
  readBuffer = context.getManagedBuffer(READ_BUFFER);
  whitespaceBuffer = context.getManagedBuffer(WHITE_SPACE_BUFFER);
The two buffers are in direct memory. There is no code that releases the buffers.
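One way to pair each allocation with a release would be for the reader to take its buffers straight from the operator allocator and drop them when it finishes its file. This is a rough sketch only, not the committed fix, and it assumes the reader is allowed to own and release these buffers itself.

// Rough sketch, not the actual fix: allocate directly from the operator
// allocator instead of using managed buffers, and release on close.
public void setup(OperatorContext context, OutputMutator outputMutator) throws ExecutionSetupException {
  oContext = context;
  readBuffer = oContext.getAllocator().buffer(READ_BUFFER);
  whitespaceBuffer = oContext.getAllocator().buffer(WHITE_SPACE_BUFFER);
}

// Called when this reader's file is exhausted (close() or cleanup(),
// depending on the Drill version).
public void close() {
  if (readBuffer != null) {
    readBuffer.release();
    readBuffer = null;
  }
  if (whitespaceBuffer != null) {
    whitespaceBuffer.release();
    whitespaceBuffer = null;
  }
}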
The sizes are:
private static final int READ_BUFFER = 1024*1024;
private static final int WHITE_SPACE_BUFFER = 64*1024;

1,048,576 + 65,536 = 1,114,112 bytes
This is exactly the amount of memory that accumulates per call to ScanBatch.next():

Ctor: 0               -- Initial memory in constructor
Init setup: 1114112   -- After call to first record reader setup
Entry Memory: 1114112 -- First next() call, returns one record
Entry Memory: 1114112 -- Second next(), EOF and start second reader
Entry Memory: 2228224 -- Third next(), second reader returns EOF
...
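A trace like the one above can be produced by logging the operator allocator's outstanding allocation at the top of ScanBatch.next(), along these lines (the logger itself is illustrative):

// Illustrative instrumentation: report direct memory held by this operator
// at the start of each next() call.
logger.debug("Entry Memory: {}", oContext.getAllocator().getAllocatedMemory());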
If we leak roughly 1 MB per file, then with 5000 files we leak about 5 GB of memory (5000 × 1,114,112 bytes ≈ 5.2 GiB), which explains the OOM when only 4 GB of direct memory is available.