Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-9546

Make FileStreamSourceTask extendable with generic streams

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Won't Fix
    • None
    • None
    • connect

    Description

      Use case: I want to read a ZIP compressed text file with a file connector and send it to Kafka.

      Currently, we have FileStreamSourceConnector which reads a \n delimited text file. This connector always returns a task of type FileStreamSourceTask.

      The FileStreamSourceTask reads from stdio or opens a file InputStream. The issue with this approach is that the input needs to be a text file, otherwise it won't work. 

      The code should be modified so that users could change the default InputStream to eg. ZipInputStream, or any other format. The code is currently written in such a way that it's not possible to extend it, we cannot use a different input stream. 

      See example here where the code got copy-pasted just so it could read from a ZstdInputStream (which reads ZSTD compressed files): https://github.com/gcsaba2/kafka-zstd/tree/master/src/main/java/org/apache/kafka/connect/file

       

      I suggest 2 changes:

      1. FileStreamSourceConnector should be extendable to return tasks of different types. These types would be input by the user through the configuration map
      2. FileStreamSourceTask should be modified so it could be extended and child classes could define different input streams.

      Attachments

        Issue Links

          Activity

            People

              galyo Csaba Galyo
              galyo Csaba Galyo
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - 4h
                  4h
                  Remaining:
                  Remaining Estimate - 4h
                  4h
                  Logged:
                  Time Spent - Not Specified
                  Not Specified