  1. Hadoop HDFS
  2. HDFS-15617

zlib compression does not honor io.file.buffer.size

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0, 3.2.0, 3.3.0
    • Fix Version/s: None
    • Component/s: configuration
    • Labels: None

    Description

      Working with HDFS and zlib compression, I am trying to change the size of the buffer passed into the libz.so implementation.

      Our understanding is that this should be controlled by the parameter io.file.buffer.size. However, the buffer passed to libz.so is 64 KB by default, and it stays at 64 KB no matter how we change this parameter. At present, io.file.buffer.size seems to control only the CompressorStream buffer size; that buffer is then divided into 64 KB chunks, and only 64 KB at a time is sent for compression. We should allow the chunk size to be controlled by io.file.buffer.size, or else provide another parameter to control it.

      We found that the following constructor in ZlibCompressor.java was being called:

       

        public ZlibCompressor(Configuration conf) {
          this(ZlibFactory.getCompressionLevel(conf),
               ZlibFactory.getCompressionStrategy(conf),
               CompressionHeader.DEFAULT_HEADER,
               DEFAULT_DIRECT_BUFFER_SIZE);
        }

       

      DEFAULT_DIRECT_BUFFER_SIZE is set to 64 * 1024. When we changed this constant, the buffer size passed to libz.so changed accordingly.
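To make the observed behavior concrete, here is a rough sketch of the arithmetic, assuming the stream buffer really is cut into slices of at most DEFAULT_DIRECT_BUFFER_SIZE bytes as described in this report. The class and method names are made up for illustration and are not part of Hadoop.

```java
class ChunkCountSketch {
    // Mirrors the 64 * 1024 constant quoted from ZlibCompressor.java in this report.
    static final int DEFAULT_DIRECT_BUFFER_SIZE = 64 * 1024;

    /**
     * Number of slices of at most chunkSize bytes needed to cover a
     * stream buffer of streamBufferSize bytes (ceiling division).
     */
    static int chunkCount(int streamBufferSize, int chunkSize) {
        return (streamBufferSize + chunkSize - 1) / chunkSize;
    }
}
```

Under that reading, raising io.file.buffer.size to 1 MB gives chunkCount(1024 * 1024, DEFAULT_DIRECT_BUFFER_SIZE) == 16 slices per stream buffer, while the size of each slice handed to libz.so stays at 64 KB.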

       

      I believe the correct final line should be conf.getInt("io.file.buffer.size", DEFAULT_DIRECT_BUFFER_SIZE));
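A minimal sketch of the lookup this change would perform is shown here. To keep it runnable without Hadoop on the classpath, a plain Map stands in for Configuration, and the getInt helper only approximates Configuration.getInt(name, defaultValue); both are assumptions for illustration, not the Hadoop implementation.

```java
import java.util.Map;

class BufferSizeLookupSketch {
    // Mirrors the 64 * 1024 constant quoted from ZlibCompressor.java in this report.
    static final int DEFAULT_DIRECT_BUFFER_SIZE = 64 * 1024;

    // Stand-in for Configuration.getInt(name, defaultValue): returns the
    // parsed value when the key is set, otherwise the supplied default.
    static int getInt(Map<String, String> conf, String name, int defaultValue) {
        String raw = conf.get(name);
        if (raw == null || raw.trim().isEmpty()) {
            return defaultValue;
        }
        return Integer.parseInt(raw.trim());
    }

    // The value the constructor would pass as its final argument
    // under the change proposed in this report.
    static int directBufferSize(Map<String, String> conf) {
        return getInt(conf, "io.file.buffer.size", DEFAULT_DIRECT_BUFFER_SIZE);
    }
}
```

One thing worth checking before adopting this: as far as I know, core-default.xml ships io.file.buffer.size as 4096, so with a stock Configuration the 64 KB fallback would rarely apply and the direct buffer would shrink unless the key is overridden.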

       

      Alternatively, could this use io.compression.codec.zstd.buffersize and IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE_DEFAULT, or do those control something else?

       

      It looks like snappy correctly uses a configuration parameter (SnappyCodec.java):

          int bufferSize = conf.getInt(
              CommonConfigurationKeys.IO_COMPRESSION_CODEC_SNAPPY_BUFFERSIZE_KEY,
              CommonConfigurationKeys.IO_COMPRESSION_CODEC_SNAPPY_BUFFERSIZE_DEFAULT);
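If a dedicated parameter is preferred over io.file.buffer.size, a zlib counterpart to the snappy pattern above might look roughly like the following. The key name io.compression.codec.zlib.buffersize and both constants are hypothetical (they are not claimed to exist in Hadoop), and a plain Map again stands in for Configuration so the sketch runs on its own.

```java
import java.util.Map;

class ZlibBufferSizeKeySketch {
    // Hypothetical key and default, named by analogy with the snappy constants.
    static final String IO_COMPRESSION_CODEC_ZLIB_BUFFERSIZE_KEY =
        "io.compression.codec.zlib.buffersize";
    static final int IO_COMPRESSION_CODEC_ZLIB_BUFFERSIZE_DEFAULT = 64 * 1024;

    // Reads the dedicated key, keeps today's 64 KB behavior when it is
    // unset, and rejects sizes that cannot back a buffer.
    static int zlibBufferSize(Map<String, String> conf) {
        String raw = conf.get(IO_COMPRESSION_CODEC_ZLIB_BUFFERSIZE_KEY);
        if (raw == null) {
            return IO_COMPRESSION_CODEC_ZLIB_BUFFERSIZE_DEFAULT;
        }
        int size = Integer.parseInt(raw.trim());
        if (size <= 0) {
            throw new IllegalArgumentException(
                IO_COMPRESSION_CODEC_ZLIB_BUFFERSIZE_KEY + " must be positive, got " + size);
        }
        return size;
    }
}
```

A separate key would leave the default untouched for existing deployments, whereas reusing io.file.buffer.size ties the zlib buffer to a setting that also affects other I/O paths.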

          People

            Assignee: Unassigned
            Reporter: Bill Strahm (bstrahm)
            Votes: 0
            Watchers: 3

              Time Tracking

                Original Estimate: 48h
                Remaining Estimate: 48h
                Time Spent: Not Specified