Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 1.10.0
- None
Description
A test case was created consisting of 5000 text files, each containing a single line with the file number (1 to 5001). Each file therefore holds a single record of at most 4 characters.
Run the following query:
SELECT * FROM `dfs.data`.`5000files/text`
The query fails with an OOM in the scan batch at around record 3700 on a Mac with 4 GB of direct memory.
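For reference, a data set like this can be generated with a few lines of plain Java. The output directory below is a placeholder; point it at whatever location the dfs.data workspace maps to.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Writes one single-line text file per number, mirroring the test case above.
public class GenerateTestFiles {
  public static void main(String[] args) throws IOException {
    Path dir = Paths.get("/path/to/5000files/text");   // placeholder path
    Files.createDirectories(dir);
    for (int i = 1; i <= 5000; i++) {
      Path file = dir.resolve(String.format("file%04d.csv", i));
      Files.write(file, (i + "\n").getBytes(StandardCharsets.UTF_8));
    }
  }
}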
The code to read records in ScanBatch is complex. The following appears to occur:
- Iterate over the record readers, one per file.
- For each, call setup() (see the sketch after this list).
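A simplified, self-contained sketch of that pattern follows. The types and method names are stand-ins, not Drill's actual ScanBatch implementation; the point is that setup() runs once per file while the operator context that owns the buffers lives for the entire scan.

import java.util.Iterator;
import java.util.List;

// Illustrative only: a stand-in for the reader-advance loop described above.
public class ScanLoopSketch {
  interface Reader {
    void setup();    // in the real code, allocates ~1 MB of direct memory
    int next();      // returns 0 at end of file
    void cleanup();  // in the real code, does not release the setup() buffers
  }

  static void scan(List<Reader> readers) {
    Iterator<Reader> it = readers.iterator();
    if (!it.hasNext()) {
      return;
    }
    Reader current = it.next();
    current.setup();
    while (true) {
      if (current.next() > 0) {
        continue;              // pass the batch downstream, then read again
      }
      current.cleanup();
      if (!it.hasNext()) {
        return;                // end of scan
      }
      current = it.next();
      current.setup();         // another ~1 MB allocated, never freed
    }
  }
}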
The setup code is:
public void setup(OperatorContext context, OutputMutator outputMutator) throws ExecutionSetupException {
  oContext = context;
  readBuffer = context.getManagedBuffer(READ_BUFFER);
  whitespaceBuffer = context.getManagedBuffer(WHITE_SPACE_BUFFER);
The two buffers are in direct memory. There is no code that releases the buffers.
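One way to pair each allocation with a release would be for the reader to take its buffers straight from the operator allocator and drop them when it finishes its file. This is a rough sketch only, not the committed fix, and it assumes the reader is allowed to own and release these buffers itself.

// Rough sketch, not the actual fix: allocate directly from the operator
// allocator instead of using managed buffers, and release on close.
public void setup(OperatorContext context, OutputMutator outputMutator) throws ExecutionSetupException {
  oContext = context;
  readBuffer = oContext.getAllocator().buffer(READ_BUFFER);
  whitespaceBuffer = oContext.getAllocator().buffer(WHITE_SPACE_BUFFER);
}

// Called when this reader's file is exhausted (close() or cleanup(),
// depending on the Drill version).
public void close() {
  if (readBuffer != null) {
    readBuffer.release();
    readBuffer = null;
  }
  if (whitespaceBuffer != null) {
    whitespaceBuffer.release();
    whitespaceBuffer = null;
  }
}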
The sizes are:
private static final int READ_BUFFER = 1024*1024;
private static final int WHITE_SPACE_BUFFER = 64*1024;

1,048,576 + 65,536 = 1,114,112 bytes
This is exactly the amount of memory that accumulates per call to ScanBatch.next():

Ctor: 0               -- Initial memory in constructor
Init setup: 1114112   -- After call to first record reader setup
Entry Memory: 1114112 -- First next() call, returns one record
Entry Memory: 1114112 -- Second next(), EOF and start second reader
Entry Memory: 2228224 -- Third next(), second reader returns EOF
...
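A trace like the one above can be produced by logging the operator allocator's outstanding allocation at the top of ScanBatch.next(), along these lines (the logger itself is illustrative):

// Illustrative instrumentation: report direct memory held by this operator
// at the start of each next() call.
logger.debug("Entry Memory: {}", oContext.getAllocator().getAllocatedMemory());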
If we leak roughly 1 MB per file, then with 5000 files we leak about 5 GB of memory (5000 × 1,114,112 bytes ≈ 5.2 GiB), which explains the OOM when only 4 GB of direct memory is available.