[KAFKA-10846] FileStreamSourceTask buffer can grow without bound - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.8.0, 2.7.2
Component/s: connect
Labels: None

Description

When reading a large file, the buffer used by FileStreamSourceTask can grow without bound. Even in the unit test org.apache.kafka.connect.file.FileStreamSourceTaskTest#testBatchSize, the buffer grows from 1,024 to 524,288 bytes while reading just 10,000 copies of a line of fewer than 100 characters.

The problem is that the condition for growing the buffer is incorrect: the buffer is doubled whenever some bytes were read and the used space in the buffer equals the buffer length.
Growth should instead depend on whether extractLine() actually managed to extract any lines. Only when no complete line has been extracted since the last call to read() does the buffer need to grow (to cope with a line too large to fit).
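
The difference between the two growth conditions can be illustrated with a small standalone simulation. This is only a sketch under stated assumptions, not the FileStreamSourceTask code or the patch from the linked pull request: the class BufferGrowthSketch, its methods, and the flag growOnlyWhenNoLineFound are hypothetical names, the input comes from an in-memory StringReader, and the sketch uses a char[] buffer with '\n' as the only line terminator.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration only; not the actual Kafka Connect implementation.
final class BufferGrowthSketch {

    // Builds `count` lines, each `lineLength` characters long plus a trailing '\n'.
    static String repeatedLines(int count, int lineLength) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            for (int j = 0; j < lineLength; j++) {
                sb.append('x');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Consumes `input` through a buffer of `initialLength` chars and returns
    // the buffer length once the input is exhausted.
    //   growOnlyWhenNoLineFound == false: double whenever the buffer is full
    //                                     (the condition described as incorrect).
    //   growOnlyWhenNoLineFound == true:  double only if the buffer is full AND
    //                                     no complete line was extracted.
    static int finalBufferLength(String input, int initialLength,
                                 boolean growOnlyWhenNoLineFound) {
        char[] buffer = new char[initialLength];
        int offset = 0; // number of chars currently held in the buffer
        try (StringReader reader = new StringReader(input)) {
            while (true) {
                int nread = reader.read(buffer, offset, buffer.length - offset);
                if (nread <= 0) {
                    break; // end of input
                }
                offset += nread;
                boolean bufferFull = offset == buffer.length;

                // Find the end of the last complete line in the buffer.
                boolean foundLine = false;
                int consumed = 0;
                for (int i = 0; i < offset; i++) {
                    if (buffer[i] == '\n') {
                        foundLine = true;
                        consumed = i + 1;
                    }
                }
                // Drop the extracted lines and keep the partial line, if any.
                System.arraycopy(buffer, consumed, buffer, 0, offset - consumed);
                offset -= consumed;

                boolean grow = growOnlyWhenNoLineFound
                        ? bufferFull && !foundLine
                        : bufferFull;
                if (grow) {
                    buffer = Arrays.copyOf(buffer, buffer.length * 2);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.length;
    }
}
```

In this sketch, feeding 10,000 short lines through a 1,024-char buffer leaves the buffer at its initial length when growth is tied to line extraction, whereas the grow-when-full condition doubles it repeatedly. A single line longer than the buffer still triggers growth under either condition, which is the case the doubling exists to handle.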

Issue Links

links to

GitHub Pull Request #9735

People

Assignee: Tom Bentley

Reporter: Tom Bentley

Votes: 0

Watchers: 2

Dates

Created: 11/Dec/20 15:16

Updated: 25/May/21 10:56

Resolved: 18/Dec/20 04:01