Flink / FLINK-32562

FileSink Compactor Service should not use FileWriter from Sink for OutputStreamBasedFileCompactor

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 1.18.0
    • None
    • None

    Description

      The gzip format is designed to be concatenable, but the Compactor in FileSink breaks this property.

      This happens because the Compactor Service creates the new compacted file through a GzipOutputStream, which writes its own header bytes, so the final file starts with an extra header. (A gzip header is already present in every finished part file, so the compacted file does not need another one.) The root cause is that the Compactor Service uses the FileWriter specified in the FileSink to create the compacted output stream. I think we should use a simple byte output stream to concatenate the part files instead, or at least offer that as an option.

       

      Currently the ConcatFileCompactor only supports plain text files. Many compression codecs, such as gzip and bzip2, support concatenation. I think we should support this kind of concatenation; otherwise people must use RecordWiseFileCompactor, which is very inefficient.
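
      To illustrate the byte-level concatenation proposed above, here is a minimal sketch using only the JDK (no Flink classes). The class name GzipConcatSketch and its helper methods are hypothetical and exist only for this example. It assumes each finished part file is a complete gzip member, and it shows that appending the parts to a plain byte output stream yields a stream that GZIPInputStream decodes as the concatenation of the original contents.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; not part of Flink.
public class GzipConcatSketch {

    // Compress one "part file" into a complete gzip member (header, data, trailer).
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    // Append the finished parts byte for byte to a plain output stream.
    // No gzip writer wraps the output, so no extra header is written.
    public static byte[] concat(List<byte[]> parts) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part);
        }
        return out.toByteArray();
    }

    // Decompress a (possibly multi-member) gzip stream back into text.
    public static String gunzip(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] compacted = concat(List.of(gzip("part-1\n"), gzip("part-2\n")));
        // Prints the contents of both parts, in order.
        System.out.print(gunzip(compacted));
    }
}
```

      The sketch works on in-memory byte arrays for brevity; a compactor following the same idea would copy the part files into the raw output stream of the compacted file rather than into a stream opened by the sink's writer.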

          People

            Assignee: Unassigned
            Reporter: Shengnan YU (ysn2233)
            Votes: 0
            Watchers: 1