Hadoop Common / HADOOP-6662

hadoop zlib compression does not fully utilize the buffer

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.20.2
    • Fix Version/s: None
    • Component/s: io
    • Tags:
      hadoop io compress zlib

      Description

      org.apache.hadoop.io.compress.zlib.ZlibCompressor does not fully utilize its buffer.

      Its needsInput() returns false whenever there is any data in its buffer (64K by default). Performance degrades greatly, since a JNI call is invoked each time the write() method of CompressorStream is called.
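
      The cost described above can be illustrated with a small model. This is a sketch under stated assumptions, not Hadoop code: the class NeedsInputCost, the method countNativeCalls, and the fillBufferFirst switch are hypothetical names, a "native call" is just a counter, and each write is assumed to be no larger than the buffer.

```java
// Hypothetical model (not Hadoop code): counts how often a simulated
// JNI compress call would happen under two needsInput() policies.
class NeedsInputCost {
    static final int BUFFER_SIZE = 64 * 1024; // 64K default mentioned in the report

    /**
     * Simulates `writes` calls to write(), each carrying `writeSize` bytes
     * (assumed <= BUFFER_SIZE), and returns the number of simulated native calls.
     */
    static int countNativeCalls(int writes, int writeSize, boolean fillBufferFirst) {
        int buffered = 0;    // bytes waiting in the simulated direct buffer
        int nativeCalls = 0;
        for (int i = 0; i < writes; i++) {
            buffered += writeSize; // stand-in for setInput()
            boolean needsInput = fillBufferFirst
                    ? buffered + writeSize <= BUFFER_SIZE // room left for another write
                    : buffered == 0;                      // behaviour described in the report
            if (!needsInput) {
                nativeCalls++;     // stand-in for compress() crossing into JNI
                buffered = 0;
            }
        }
        if (buffered > 0) {
            nativeCalls++;         // final flush of whatever is left
        }
        return nativeCalls;
    }

    public static void main(String[] args) {
        // 10,000 writes of 100 bytes each
        System.out.println(countNativeCalls(10_000, 100, false)); // prints 10000
        System.out.println(countNativeCalls(10_000, 100, true));  // prints 16
    }
}
```

      In this model, the policy from the report makes one native call per write, while filling the buffer first makes one call per roughly 64K of input.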

        Activity

        Hong Tang added a comment -

        This is a duplicate of HADOOP-4196.

        Kang Xiao added a comment -

        Thank you for the notice! However, no patch is attached to https://issues.apache.org/jira/browse/HADOOP-4196. HADOOP-4196 discusses several issues, whereas this issue focuses on the zlib compression buffer problem.

        Kang Xiao added a comment -

        Patch attached.

        needsInput() checks uncompressedDirectBuf: if it is full, it returns false; otherwise it copies data from the saved userBuf and then rechecks.

        One special case must be respected: zlib may not consume all of the input in uncompressedDirectBuf because the output buffer is too small. This may be why the original code simply returns false if uncompressedBufLen > 0.

        After the JNI compress call, uncompressedBufLen is set to the length of the remaining input that zlib has not consumed. So if uncompressedBufLen > 0 after deflateBytesDirect() is invoked, a flag keepUncompressedBuf is set to true to indicate that no input is needed and that compress() should be invoked again to compress the remaining input data.
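
        The logic described in this comment can be modelled roughly as follows. This is a sketch based only on the description above, not the attached patch: buffers are represented by byte counts, compress() stands in for deflateBytesDirect(), and the class BufferFillSketch and the fields outputCapacity, userBufLen, nativeCalls, and consumedTotal are hypothetical.

```java
// Hypothetical model of the described needsInput()/compress() interplay.
// Buffers are represented only by their lengths; no real zlib is involved.
class BufferFillSketch {
    private final int directBufSize;   // capacity of the simulated uncompressedDirectBuf
    private final int outputCapacity;  // assumed max input consumed per simulated native call
    private int uncompressedBufLen = 0;          // bytes currently in the direct buffer
    private int userBufLen = 0;                  // saved user bytes not yet copied
    private boolean keepUncompressedBuf = false; // leftover input must be compressed first
    int nativeCalls = 0;
    long consumedTotal = 0;

    BufferFillSketch(int directBufSize, int outputCapacity) {
        this.directBufSize = directBufSize;
        this.outputCapacity = outputCapacity;
    }

    void setInput(int len) {
        userBufLen += len;
    }

    boolean needsInput() {
        if (keepUncompressedBuf) {
            return false; // the previous call left input behind; compress again first
        }
        if (uncompressedBufLen == directBufSize) {
            return false; // buffer is full
        }
        // Copy as much saved user data as fits, then recheck.
        int n = Math.min(userBufLen, directBufSize - uncompressedBufLen);
        uncompressedBufLen += n;
        userBufLen -= n;
        return uncompressedBufLen < directBufSize;
    }

    void compress() {
        nativeCalls++; // stand-in for deflateBytesDirect()
        int consumed = Math.min(uncompressedBufLen, outputCapacity);
        uncompressedBufLen -= consumed; // length of the unconsumed remainder
        consumedTotal += consumed;
        keepUncompressedBuf = uncompressedBufLen > 0;
    }

    // Mirrors the caller loop: feed input, compress while no input is needed.
    void write(int len) {
        setInput(len);
        while (!needsInput()) {
            compress();
        }
    }

    // After write() returns, all user data has been copied into the direct buffer.
    void finish() {
        while (uncompressedBufLen > 0) {
            compress();
        }
    }

    public static void main(String[] args) {
        BufferFillSketch s = new BufferFillSketch(64 * 1024, 64 * 1024);
        for (int i = 0; i < 10_000; i++) {
            s.write(100);
        }
        s.finish();
        System.out.println(s.nativeCalls + " calls for " + s.consumedTotal + " bytes");
    }
}
```

        When outputCapacity is smaller than the buffer, the flag keeps needsInput() returning false until the leftover input has been drained, which is the special case the comment describes.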

        Kang Xiao added a comment -

        Thanks to Hong Tang. No patch is attached to HADOOP-4196, and the issue is still unresolved in release 0.20.2.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12440187/ZlibCompressor.patch
        against trunk revision 927979.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/434/console

        This message is automatically generated.

        Kang Xiao added a comment -

        Duplicate of HADOOP-6683.


          People

          • Assignee: Unassigned
          • Reporter: Kang Xiao
          • Votes: 0
          • Watchers: 3
