Kafka / KAFKA-10846

FileStreamSourceTask buffer can grow without bound

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.2
    • Component/s: connect
    • Labels: None

    Description

      When reading a large file, the buffer used by FileStreamSourceTask can grow without bound. Even in the unit test org.apache.kafka.connect.file.FileStreamSourceTaskTest#testBatchSize, the buffer grows from 1,024 to 524,288 bytes while reading just 10,000 copies of a line of fewer than 100 characters.

      The problem is that the condition for growing the buffer is incorrect. The buffer is doubled whenever some bytes were read and the used space in the buffer equals the buffer length, even if complete lines are about to be extracted and their space freed.
      Instead, the decision to grow the buffer should depend on whether extractLine() actually extracted any lines. The buffer needs to grow only when no complete line was extracted since the last call to read(), which is the case where a single line is larger than the buffer.
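
      The difference between the two growth conditions can be illustrated with a small standalone sketch. This is not the FileStreamSourceTask code or the actual patch: the class BufferGrowthSketch, the method finalBufferSize, and the flag growOnlyWhenNoLine are hypothetical names chosen for illustration, and a StringReader stands in for the file. The sketch assumes the same starting size (1,024) and doubling strategy mentioned in the description.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not the Kafka Connect implementation.
public class BufferGrowthSketch {

    /**
     * Consumes all of input line by line and returns the final buffer length.
     * growOnlyWhenNoLine = false: double whenever the buffer is full after a read
     *                             (the condition described as incorrect).
     * growOnlyWhenNoLine = true:  double only if the buffer is full AND no complete
     *                             line could be extracted (the proposed condition).
     */
    static int finalBufferSize(String input, boolean growOnlyWhenNoLine) {
        char[] buffer = new char[1024];
        int offset = 0; // number of chars currently held in the buffer
        try (Reader reader = new StringReader(input)) {
            int nread;
            while ((nread = reader.read(buffer, offset, buffer.length - offset)) > 0) {
                offset += nread;
                boolean full = offset == buffer.length;

                // "Extract" every complete line: find the end of the last one.
                int consumed = 0;
                for (int i = 0; i < offset; i++) {
                    if (buffer[i] == '\n') {
                        consumed = i + 1;
                    }
                }
                boolean foundLine = consumed > 0;

                // Shift the remaining partial line to the front of the buffer.
                System.arraycopy(buffer, consumed, buffer, 0, offset - consumed);
                offset -= consumed;

                boolean grow = growOnlyWhenNoLine ? (full && !foundLine) : full;
                if (grow) {
                    char[] bigger = new char[buffer.length * 2];
                    System.arraycopy(buffer, 0, bigger, 0, offset);
                    buffer = bigger;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.length;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("a line of fewer than one hundred characters\n");
        }
        String manyShortLines = sb.toString();
        System.out.println("grow when full:          " + finalBufferSize(manyShortLines, false));
        System.out.println("grow only if no line:    " + finalBufferSize(manyShortLines, true));
    }
}
```

      With many short lines, the second policy should leave the buffer at its initial size, because every read yields at least one complete line and frees space. It still grows the buffer when a single line exceeds the current capacity, which is the only case where growth is needed.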

            People

              Assignee: Tom Bentley (tombentley)
              Reporter: Tom Bentley (tombentley)
              Votes: 0
              Watchers: 2