Description
When reading a large file, the buffer used by FileStreamSourceTask can grow without bound. Even in the unit test org.apache.kafka.connect.file.FileStreamSourceTaskTest#testBatchSize the buffer grows from 1,024 to 524,288 bytes while reading just 10,000 copies of a line shorter than 100 characters.
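The problem is that the condition for growing the buffer is incorrect. The buffer is doubled whenever some bytes were read and the used space in the buffer equals the buffer length, which is the case whenever a read fills the buffer, even if the buffer already holds complete lines that are about to be extracted.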
The requirement to increase the buffer size should instead depend on whether extractLine() actually managed to extract any lines. It's only when no complete lines were extracted since the last call to read() that we need to increase the buffer size (to cope with a line longer than the buffer).
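A minimal sketch of the proposed growth condition, for illustration only. This is not the FileStreamSourceTask code: the class LineBufferSketch and its methods poll() and bufferSize() are hypothetical names, the sketch works on bytes from a plain InputStream, and it treats only '\n' as a line terminator. The point is the single check at the end of poll(): the buffer doubles only when it is full and the last read produced no complete line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the task's line-reading logic (not the real class).
final class LineBufferSketch {
    private final InputStream in;
    private byte[] buffer;
    private int offset = 0; // number of bytes currently held in the buffer

    LineBufferSketch(InputStream in, int initialSize) {
        this.in = in;
        this.buffer = new byte[initialSize];
    }

    int bufferSize() {
        return buffer.length;
    }

    // Performs one read() and returns the complete lines now available.
    // Returns null at end of stream, and an empty list if no line is complete yet.
    List<String> poll() throws IOException {
        int nread = in.read(buffer, offset, buffer.length - offset);
        if (nread < 0) {
            return null;
        }
        offset += nread;

        List<String> lines = new ArrayList<>();
        String line;
        while ((line = extractLine()) != null) {
            lines.add(line);
        }

        // Grow only when the buffer is full AND this read yielded no complete line,
        // i.e. the current line is longer than the buffer.
        if (lines.isEmpty() && offset == buffer.length) {
            buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
        return lines;
    }

    // Removes and returns the first '\n'-terminated line, or null if there is none.
    private String extractLine() {
        for (int i = 0; i < offset; i++) {
            if (buffer[i] == '\n') {
                String line = new String(buffer, 0, i, StandardCharsets.UTF_8);
                int consumed = i + 1;
                // Shift the remaining bytes to the front of the buffer.
                System.arraycopy(buffer, consumed, buffer, 0, offset - consumed);
                offset -= consumed;
                return line;
            }
        }
        return null;
    }
}
```

With this condition, an input of many short lines never triggers a resize, because every read that fills the buffer also yields at least one line and so frees space. The buffer grows only while a single line is still incomplete after the buffer has filled.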