Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
1.18.0
None
None
Description
The gzip format is designed to be concatenable, but the Compactor in FileSink breaks this property.
This happens because the Compactor Service creates the new compacted file through a GzipOutputStream, which writes its own gzip header, so the final file ends up with extra bytes at its start. Every finished part file already contains a gzip header, so the compacted file does not need another one. The root cause is that the Compactor Service uses the FileWriter specified in the FileSink to create the compacted output stream. I think we should use a simple byte output stream to concatenate the part files instead, or at least offer that as an option.
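To make the proposal concrete, here is a minimal sketch of byte-level concatenation in plain Java. It is not Flink code: the class `RawConcat` and the method `concatFiles` are hypothetical names used only for illustration, and the sketch assumes the part files are on a local file system reachable through `java.nio.file`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of Flink: joins part files byte for byte.
class RawConcat {

    /** Copies each part file into the target in list order, adding no header or trailer. */
    static void concatFiles(List<Path> parts, Path target) throws IOException {
        // A plain output stream: no compression wrapper, so nothing extra is written.
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
    }
}
```

Because the target is opened as a plain stream, the output is exactly the bytes of the parts in order, which is what a concatenable format such as gzip expects.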
Currently, the ConcatFileCompactor only supports plain text files. Many compression codecs, such as gzip and bzip2, support concatenation. I think we should support this kind of concatenation; otherwise people must use RecordWiseFileCompactor, which is very inefficient.
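The concatenation property of gzip can be checked with the JDK alone. The following is a small sketch, not Flink code; `GzipConcatDemo` and its methods are hypothetical names chosen for this example. It compresses two strings into separate gzip members, joins the raw bytes, and reads the result back through a single `GZIPInputStream`, which is documented to handle concatenated members.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of Flink.
class GzipConcatDemo {

    /** Compresses the text into one complete gzip member (header, data, trailer). */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /** Appends the parts byte for byte, without re-compressing or wrapping them. */
    static byte[] concat(byte[]... parts) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part);
        }
        return out.toByteArray();
    }

    /** Decompresses all gzip members found in the data into one string. */
    static String gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        }
        return new String(result.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] joined = concat(gzip("hello "), gzip("world"));
        System.out.println(gunzip(joined));
    }
}
```

If the compactor joined finished part files this way, each part would keep its own header and trailer, and readers that understand multi-member gzip would see one continuous stream.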