HBase
  HBASE-5387

Reuse compression streams in HFileBlock.Writer

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.94.0
    • Fix Version/s: 0.94.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We need to reuse compression streams in HFileBlock.Writer instead of allocating them every time. The motivation is that when using Java's built-in implementation of Gzip, we allocate a new GZIPOutputStream object and an associated native data structure every time we create a compression stream. The native data structure is only deallocated in the finalizer. This is one suspected cause of recent TestHFileBlock failures on Hadoop QA: https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
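      The failure mode and the fix can be sketched as a small standalone example. This is illustrative only, not the patch itself (the actual fix lives in ReusableStreamGzipCodec); the class name GzipStreamReuse is made up for this sketch.

      ```java
      import java.io.ByteArrayOutputStream;
      import java.io.IOException;
      import java.util.zip.Deflater;
      import java.util.zip.GZIPOutputStream;

      public class GzipStreamReuse {
          public static void main(String[] args) throws IOException {
              // Leaky pattern: each GZIPOutputStream allocates a native zlib
              // Deflater whose off-heap state is only released in the finalizer,
              // so heavy block-writing churn piles up native allocations.
              for (int i = 0; i < 3; i++) {
                  ByteArrayOutputStream baos = new ByteArrayOutputStream();
                  GZIPOutputStream gz = new GZIPOutputStream(baos); // new native Deflater each time
                  gz.write(new byte[] {1, 2, 3});
                  gz.finish();
              }

              // Reuse pattern: keep one Deflater and reset() it between blocks,
              // so the native structure is allocated once per writer, not per block.
              Deflater deflater = new Deflater();
              byte[] out = new byte[64];
              for (int i = 0; i < 3; i++) {
                  deflater.reset();                 // rewind state, keep native allocation
                  deflater.setInput(new byte[] {1, 2, 3});
                  deflater.finish();
                  int n = deflater.deflate(out);
                  System.out.println("block " + i + ": " + n + " compressed bytes");
              }
              deflater.end();                       // explicitly free native state when done
          }
      }
      ```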

      1. ASF.LICENSE.NOT.GRANTED--D1725.2.patch
        15 kB
        Phabricator
      2. ASF.LICENSE.NOT.GRANTED--D1725.1.patch
        27 kB
        Phabricator
      3. Fix-deflater-leak-2012-02-12_00_37_27.patch
        13 kB
        Mikhail Bautin
      4. 5387.txt
        12 kB
        Ted Yu
      5. ASF.LICENSE.NOT.GRANTED--D1719.5.patch
        12 kB
        Phabricator
      6. ASF.LICENSE.NOT.GRANTED--D1719.4.patch
        12 kB
        Phabricator
      7. ASF.LICENSE.NOT.GRANTED--D1719.3.patch
        16 kB
        Phabricator
      8. Fix-deflater-leak-2012-02-11_17_13_10.patch
        13 kB
        Mikhail Bautin
      9. ASF.LICENSE.NOT.GRANTED--D1719.2.patch
        12 kB
        Phabricator
      10. ASF.LICENSE.NOT.GRANTED--D1719.1.patch
        10 kB
        Phabricator
      11. Fix-deflater-leak-2012-02-10_18_48_45.patch
        12 kB
        Mikhail Bautin

        Activity

        Mikhail Bautin created issue -
        Mikhail Bautin made changes -
        Field Original Value New Value
        Attachment Fix-deflater-leak-2012-02-10_18_48_45.patch [ 12514195 ]
        Ted Yu made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Ted Yu made changes -
        Assignee Mikhail Bautin [ mikhail ]
        Mikhail Bautin made changes -
        Description We need to reuse compression streams in HFileBlock.Writer instead of allocating them every time. The motivation is that when using Java's built-in implementation of Gzip, we allocate a new GZIPOutputStream object and an associated native data structure any time. This is one suspected cause of recent TestHFileBlock failures on Hadoop QA: https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
        We need to reuse compression streams in HFileBlock.Writer instead of allocating them every time. The motivation is that when using Java's built-in implementation of Gzip, we allocate a new GZIPOutputStream object and an associated native data structure every time we create a compression stream. The native data structure is only deallocated in the finalizer. This is one suspected cause of recent TestHFileBlock failures on Hadoop QA: https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12514195/Fix-deflater-leak-2012-02-10_18_48_45.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 157 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
        org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.mapreduce.TestImportTsv

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/943//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/943//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/943//console

        This message is automatically generated.

        Ted Yu added a comment -

        Looks like TestHFileBlock passes this time.

        Mikhail Bautin added a comment -

        @Ted: I think this addresses the root cause of TestHFileBlock and TestForceCacheImportantBlocks failures. As you suspected, Hadoop QA was pointing to a real bug in HBase. However, I think we have had this issue for a while (even in HFile v1), and it just got exposed as I increased the volume of IO happening within a single unit test. I will add a ulimit setting to our internal test runs so that we catch memory leaks like this in the future.

        Lars Hofhansl added a comment -

        Patch looks good to me.
        Can you mark the overridden methods in ResetableGZIPOutputStream, ReusableGzipOutputStream, and ReusableStreamGzipCodec with @Override?

        Also, I would probably move the static code to get the GZIP_HEADER into ResetableGZIPOutputStream, and put a few more comments around it (it took me a bit to figure out what this was doing).

        Ted Yu added a comment -

        I ran TestSplitTransactionOnCluster twice on a MacBook and it passed.
        I used a script to check for hung tests and found none.

        I think the patch, after slight revision, should be good to go.

        Thanks for the quick turnaround, Mikhail.

        I think we should try to push ReusableStreamGzipCodec upstream to Hadoop.

        stack added a comment -

        Any reason for hardcoding of 32K for buffer size:

        + ((Configurable)codec).getConf().setInt("io.file.buffer.size", 32 * 1024);

        Give this an initial reasonable size?

        + compressedByteStream = new ByteArrayOutputStream();

        So, we'll keep around the largest thing we ever wrote into this ByteArrayOutputStream? Should we resize it or something from time to time? Or I suppose we can just wait till it's a prob?

        Is the gzip stuff brittle? The header can be bigger than 10 bytes I suppose (the spec allows extensions IIRC), but I suppose it's safe because we presume Java or the underlying native compression.

        Good stuff Mikhail. +1 on patch.
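        The header-size question is easy to check empirically: Java's GZIPOutputStream writes its header in the constructor, and because it sets FLG=0 (no optional extra/name/comment fields) the header is exactly the 10-byte minimal form. A small probe (class name GzipHeaderProbe invented for this sketch):

        ```java
        import java.io.ByteArrayOutputStream;
        import java.io.IOException;
        import java.util.zip.GZIPOutputStream;

        public class GzipHeaderProbe {
            public static void main(String[] args) throws IOException {
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                new GZIPOutputStream(baos); // the constructor writes the fixed header
                byte[] header = baos.toByteArray();
                System.out.println("header length: " + header.length); // 10
                // Magic bytes 0x1f 0x8b, then CM=8 (deflate); FLG=0 means no
                // optional fields, hence the minimal 10-byte header.
                System.out.printf("magic: %02x %02x, method: %d%n",
                    header[0] & 0xff, header[1] & 0xff, header[2]);
            }
        }
        ```

        A gzip stream produced with extensions (e.g. an embedded filename) would have a longer header, but since HBase only ever parses headers it generated itself, assuming the 10-byte form is safe here.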

        Ted Yu added a comment -

        w.r.t. the second comment above, I see the following in doCompression():

        +        compressedByteStream.reset();
        

        According to http://docs.oracle.com/javase/6/docs/api/java/io/ByteArrayOutputStream.html#reset%28%29, after the above call, 'all currently accumulated output in the output stream is discarded.'
        The output stream can be used again, reusing the already allocated buffer space.
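        This behavior is easy to confirm: reset() clears the logical size but leaves the grown backing array in place. A small subclass (ExposedBaos, a hypothetical helper for this experiment) can surface the protected buf field:

        ```java
        import java.io.ByteArrayOutputStream;

        // reset() zeroes count but keeps the backing array, so capacity
        // is retained across reuse; ExposedBaos exposes buf for inspection.
        class ExposedBaos extends ByteArrayOutputStream {
            int capacity() { return buf.length; }
        }

        public class BaosResetDemo {
            public static void main(String[] args) {
                ExposedBaos baos = new ExposedBaos();
                baos.write(new byte[64 * 1024], 0, 64 * 1024); // grow the buffer
                int grown = baos.capacity();
                baos.reset();                                  // discards content only
                System.out.println("size after reset: " + baos.size());                  // 0
                System.out.println("capacity retained: " + (baos.capacity() == grown));  // true
            }
        }
        ```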

        Ted Yu added a comment - edited

        See http://crawler.archive.org/apidocs/org/archive/io/GzipHeader.html#MINIMAL_GZIP_HEADER_LENGTH for where the header length came from. http://kickjava.com/src/java/util/zip/GZIPOutputStream.java.htm, line 109, gives a better idea about the header length.
        Ted Yu made changes -
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s 0.94.0 [ 12316419 ]
        Affects Version/s 0.94.0 [ 12316419 ]
        Priority Major [ 3 ] Critical [ 2 ]
        Phabricator added a comment -

        mbautin requested code review of "[jira] HBASE-5387 Reuse compression streams in HFileBlock.Writer".
        Reviewers: tedyu, Liyin, dhruba, JIRA

        We need to reuse compression streams in HFileBlock.Writer instead of allocating them every time. The motivation is that when using Java's built-in implementation of Gzip, we allocate a new GZIPOutputStream object and an associated native data structure each time. This is one suspected cause of recent TestHFileBlock failures on Hadoop QA: https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.

        TEST PLAN

        • Run unit tests
        • Create a GZIP-compressed CF with new code, load some data, shut down HBase, deploy old code, restart HBase, and scan the table

        REVISION DETAIL
        https://reviews.facebook.net/D1719

        AFFECTED FILES
        src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java
        src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java

        Phabricator made changes -
        Attachment D1719.1.patch [ 12514205 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12514205/D1719.1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 157 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/944//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/944//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/944//console

        This message is automatically generated.

        Ted Yu added a comment -

        I looked at https://reviews.facebook.net/D1719 and only found one version.
        What changes were made after the first round of reviews?

        stack added a comment -

        IIRC, the BAOS will keep the footprint of the largest allocation that went through it – reset doesn't put the BAOS backing buffer back to its original size... I haven't looked at the src... maybe it's better now (I wrote that crawler gzipping thing you cite above).

        Ted Yu added a comment -

        I went through http://www.docjar.com/html/api/java/io/ByteArrayOutputStream.java.html and didn't find a way to get buf.length; the size() method just returns count.

        One approach is to query BAOS.size() toward the end of doCompression(). If a certain threshold is exceeded, we recreate the BAOS.

        Phabricator added a comment -

        lhofhansl has commented on the revision "[jira] HBASE-5387 Reuse compression streams in HFileBlock.Writer".

        Nice patch... Only minor comments.

        INLINE COMMENTS
        src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java:40 Move this to ResetableGZIPOutputStream?
        src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java:75 Would be nice to mark all these with @Override

        REVISION DETAIL
        https://reviews.facebook.net/D1719

        stack added a comment -

        @Ted Yeah, I took a look at BAOS. Reset just sets count to zero, as you see. If a BAOS goes big and stays big, maybe it's not so bad. If there are a bunch of these, though, it could become a prob. We could do as you suggest as long as we do it before we call resize? So we'd have a conditional where if the size > N times the buffer, then instead of reusing, we go get a new BAOS?

        Ted Yu added a comment -

        Something like that.

        Lars Hofhansl added a comment - edited

        We could also do something heuristic. Get a random float and if it is < 0.001 (for example, every 1000th time on average) throw away the old BAOS and get a new one.
        Can the BAOS really get > the block size?

        Also, what happens when somebody (foolishly, but it does happen) sets the block size to 512 MB? In that case we might want to free this BAOS every single time.
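        The heuristic could look something like the sketch below. Class and constant names (BaosRecycler, RECREATE_PROBABILITY) are invented for illustration; the thread ultimately concludes this isn't needed because the stream only lives as long as a writer.

        ```java
        import java.io.ByteArrayOutputStream;
        import java.util.Random;

        public class BaosRecycler {
            private static final double RECREATE_PROBABILITY = 0.001; // ~every 1000th use on average
            private final Random random = new Random();
            private ByteArrayOutputStream baos = new ByteArrayOutputStream();

            /** Returns a cleared stream, occasionally replacing it to shed a grown buffer. */
            ByteArrayOutputStream acquire() {
                if (random.nextFloat() < RECREATE_PROBABILITY) {
                    baos = new ByteArrayOutputStream(); // drop the oversized backing array
                } else {
                    baos.reset();                       // keep capacity, discard content
                }
                return baos;
            }
        }
        ```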

        Mikhail Bautin added a comment -

        The BAOS in question only exists during the lifetime of the writer, and will be deallocated when flush or compaction is complete.

        Lars Hofhansl added a comment -

        I see, so it seems that there is nothing to worry about.

        Phabricator added a comment -

        mbautin has commented on the revision "[jira] HBASE-5387 Reuse compression streams in HFileBlock.Writer".

        INLINE COMMENTS
        src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java:213 Done.
        src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java:45 Done.
        src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java:40 Done.
        src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java:48 Done.
        src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java:92 Done.
        src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java:75 Done.

        REVISION DETAIL
        https://reviews.facebook.net/D1719

        Phabricator added a comment -

        mbautin updated the revision "[jira] HBASE-5387 Reuse compression streams in HFileBlock.Writer".
        Reviewers: tedyu, Liyin, dhruba, JIRA

        Addressing review comments.

        REVISION DETAIL
        https://reviews.facebook.net/D1719

        AFFECTED FILES
        src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java
        src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java

        Phabricator made changes -
        Attachment D1719.2.patch [ 12514243 ]
        Lars Hofhansl added a comment -

        @Mikhail: So we're still creating a new GzipCodec frequently (with every new HFileBlock object) albeit less frequently than before?

        Mikhail Bautin made changes -
        Mikhail Bautin added a comment -

        @Lars: where are we creating a new GzipCodec frequently? We only instantiate ReusableStreamGzipCodec once in the following block:

        Compression.java
            GZ("gz") {
              private transient GzipCodec codec;
        
              @Override
              DefaultCodec getCodec(Configuration conf) {
                if (codec == null) {
                  codec = new ReusableStreamGzipCodec();
                  codec.setConf(new Configuration(conf));
                }
        
                return codec;
              }
            },
        

        What we are creating less frequently than before is the compressing output stream (a subclass of GZIPOutputStream with the associated native data structure): once per HFile writer with the patch vs. for every HFile block previously.

        Phabricator added a comment -

        mbautin updated the revision "[jira] HBASE-5387 Reuse compression streams in HFileBlock.Writer".
        Reviewers: tedyu, Liyin, dhruba, JIRA

        Breaking compression stream creation methods into multiple parts. Avoiding using "finish-on-flush" stream in a couple of places, and deprecating the currently used method that is always called with downstream buffer size of 0, at least within HBase code. The method is still there for backwards compatibility. Also, getting rid of hard-coded buffer size by delegating the native compressor case to superclass in ReusableStreamGzipCodec.

        REVISION DETAIL
        https://reviews.facebook.net/D1719

        AFFECTED FILES
        src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java
        src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
        src/test/java/org/apache/hadoop/hbase/regionserver/DataBlockEncodingTool.java

        Phabricator made changes -
        Attachment D1719.3.patch [ 12514256 ]
        Phabricator added a comment -

        lhofhansl has accepted the revision "[jira] HBASE-5387 Reuse compression streams in HFileBlock.Writer".

        Nice patch Mikhail.
        Two more nits inline, can be done on commit.

        INLINE COMMENTS
        src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java:216 Make an HConstant for "io.file.buffer.size" and/or add it to hbase-defaults.xml?
        src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:613 Since it confused many of us, maybe add a brief comment that is OK to keep the BAOS per Writer, because of the specific, limited lifetime of a writer object.

        REVISION DETAIL
        https://reviews.facebook.net/D1719

        Phabricator added a comment -

        tedyu has commented on the revision "[jira] HBASE-5387 Reuse compression streams in HFileBlock.Writer".

        Looks good.

        INLINE COMMENTS
        src/test/java/org/apache/hadoop/hbase/regionserver/DataBlockEncodingTool.java:328 Should we document this additional call ?

        REVISION DETAIL
        https://reviews.facebook.net/D1719

        Phabricator added a comment -

        mbautin has commented on the revision "[jira] HBASE-5387 Reuse compression streams in HFileBlock.Writer".

        Looking into a bug in the most recent version of the patch.

        REVISION DETAIL
        https://reviews.facebook.net/D1719

        Phabricator added a comment -

        mbautin updated the revision "[jira] HBASE-5387 Reuse compression streams in HFileBlock.Writer".
        Reviewers: tedyu, Liyin, dhruba, JIRA, lhofhansl

        Fixing the bug in the previous version of the patch (broke some tests in case of no compression). createCompressionStream(OutputStream, Compressor, int) remains the primary API, and the new package-private method only used from HFileBlock is createPlainCompressionStream.

        REVISION DETAIL
        https://reviews.facebook.net/D1719

        AFFECTED FILES
        src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java
        src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
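The idea behind the ReusableStreamGzipCodec referenced above can be sketched as follows. This is an illustrative Java sketch, not the actual HBase source; the class and method names here are hypothetical. java.util.zip.GZIPOutputStream writes its gzip header in the constructor and releases its native Deflater only in a finalizer, so a resettable subclass rewinds the deflater and CRC and rewrites the header instead of allocating a new stream per block:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of a reusable gzip stream: instead of allocating a new
// GZIPOutputStream (and a new native Deflater) per block, reset the existing
// deflater and CRC and rewrite the 10-byte gzip header.
class ResettableGzipStream extends GZIPOutputStream {
  // Standard gzip header: magic bytes, deflate method, no flags/mtime/xfl/os.
  private static final byte[] GZIP_HEADER = {
      (byte) 0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, 0};

  ResettableGzipStream(OutputStream out) throws IOException {
    super(out);  // the superclass constructor writes the first header
  }

  // Rewind internal state so the same stream can compress another block.
  void resetState() throws IOException {
    def.reset();             // reuse the native zlib state
    crc.reset();             // restart the trailer checksum
    out.write(GZIP_HEADER);  // start a fresh gzip member on the wrapped stream
  }

  // Helper for round-trip checks.
  static byte[] gunzip(byte[] data) throws IOException {
    try (GZIPInputStream in =
        new GZIPInputStream(new ByteArrayInputStream(data))) {
      return in.readAllBytes();
    }
  }
}
```

One stream can then serve many blocks: write a block, call finish(), hand off the compressed bytes, then reset the wrapped buffer and call resetState() before the next block.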

        Phabricator made changes -
        Attachment D1719.4.patch [ 12514257 ]
        Phabricator added a comment -

        mbautin updated the revision "[jira] HBASE-5387 Reuse compression streams in HFileBlock.Writer".
        Reviewers: tedyu, Liyin, dhruba, JIRA, lhofhansl

        Forgot to actually use ReusableStreamGzipCodec.

        REVISION DETAIL
        https://reviews.facebook.net/D1719

        AFFECTED FILES
        src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java
        src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java

        Phabricator made changes -
        Attachment D1719.5.patch [ 12514258 ]
        Ted Yu added a comment -

        Latest patch from Mikhail.

        Ted Yu made changes -
        Attachment 5387.txt [ 12514260 ]
        Mikhail Bautin added a comment -

        @Ted: thanks for attaching the patch. I will look into how to fix Phabricator to attach Jenkins-compatible patches.

        Mikhail Bautin made changes -
        Mikhail Bautin made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Mikhail Bautin made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Ted Yu added a comment -

        The failure of Phabricator attaching patches might have something to do with the fact that Apache Jenkins has been down for roughly a whole day.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12514262/Fix-deflater-leak-2012-02-12_00_37_27.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 157 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.master.TestSplitLogManager
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.coprocessor.TestClassLoading
        org.apache.hadoop.hbase.mapreduce.TestImportTsv

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/946//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/946//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/946//console

        This message is automatically generated.

        Mikhail Bautin added a comment -

        Those reported test failures seem unrelated. Is this OK to commit?

        Ted Yu added a comment -

        I ran the following tests with latest patch and they passed:

          578  mt -Dtest=TestClassLoading
          579  mt -Dtest=TestSplitLogManager
          581  mt -Dtest=TestHFileOutputFormat
        

        The latest patch should be good to go.

        Lars Hofhansl added a comment -

        +1

        ramkrishna.s.vasudevan added a comment -

        +1

        Ted Yu added a comment -

        @Mikhail:
        Can you integrate the patch so that we have a green TRUNK build ?

        Thanks

        Mikhail Bautin added a comment -

        @Ted: sure. I was rebasing and running a final local test run.

        Ted Yu added a comment -

        Thanks for being prudent.

        Ted Yu added a comment -

        Looks like patch has been integrated.

        Please announce integration on the JIRA in the future.

        Hudson added a comment -

        Integrated in HBase-TRUNK #2661 (See https://builds.apache.org/job/HBase-TRUNK/2661/)
        [jira] HBASE-5387 Reuse compression streams in HFileBlock.Writer

        Summary: We need to reuse compression streams in HFileBlock.Writer instead of
        allocating them every time. The motivation is that when using Java's built-in
        implementation of Gzip, we allocate a new GZIPOutputStream object and an
        associated native data structure every time. This is one suspected cause of recent
        TestHFileBlock failures on Hadoop QA:
        https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.

        Test Plan:

        • Run unit tests
        • Create a GZIP-compressed CF with new code, load some data, shut down HBase,
          deploy old code, restart HBase, and scan the table

        Reviewers: tedyu, Liyin, dhruba, JIRA, lhofhansl

        Reviewed By: lhofhansl

        CC: tedyu, lhofhansl, mbautin

        Differential Revision: https://reviews.facebook.net/D1719 (Revision 1243667)

        Result = SUCCESS
        mbautin :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
        Ted Yu made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in HBase-TRUNK-security #110 (See https://builds.apache.org/job/HBase-TRUNK-security/110/)
        [jira] HBASE-5387 Reuse compression streams in HFileBlock.Writer

        Summary: We need to reuse compression streams in HFileBlock.Writer instead of
        allocating them every time. The motivation is that when using Java's built-in
        implementation of Gzip, we allocate a new GZIPOutputStream object and an
        associated native data structure every time. This is one suspected cause of recent
        TestHFileBlock failures on Hadoop QA:
        https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.

        Test Plan:

        • Run unit tests
        • Create a GZIP-compressed CF with new code, load some data, shut down HBase,
          deploy old code, restart HBase, and scan the table

        Reviewers: tedyu, Liyin, dhruba, JIRA, lhofhansl

        Reviewed By: lhofhansl

        CC: tedyu, lhofhansl, mbautin

        Differential Revision: https://reviews.facebook.net/D1719 (Revision 1243667)

        Result = SUCCESS
        mbautin :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
        Phabricator added a comment -

        mbautin requested code review of "[jira] HBASE-5387 [89-fb] Reuse compression streams in HFileBlock.Writer".
        Reviewers: khemani, Liyin, Kannan, lhofhansl, tedyu, JIRA

        Porting the trunk patch https://reviews.facebook.net/D1719 to 89-fb to make Gzip compression streams reusable when no Hadoop native libraries are loaded. Also, making the Compression class and HFileBlock in 89-fb more similar to their trunk versions. This touches the same code in HFileBlock as the data block encoding patch (https://reviews.facebook.net/D1659), which will have to be rebased on this.

        TEST PLAN
        Run unit tests

        REVISION DETAIL
        https://reviews.facebook.net/D1725

        AFFECTED FILES
        src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java
        src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
        src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java

        MANAGE HERALD DIFFERENTIAL RULES
        https://reviews.facebook.net/herald/view/differential/

        WHY DID I GET THIS EMAIL?
        https://reviews.facebook.net/herald/transcript/3681/

        Tip: use the X-Herald-Rules header to filter Herald messages in your client.

        Phabricator made changes -
        Attachment D1725.1.patch [ 12514422 ]
        Phabricator added a comment -

        mbautin has committed the revision "[jira] HBASE-5387 Reuse compression streams in HFileBlock.Writer".

        REVISION DETAIL
        https://reviews.facebook.net/D1719

        COMMIT
        https://reviews.facebook.net/rHBASE1243667

        ramkrishna.s.vasudevan added a comment -

        We are facing the same problem in the 0.90 also when using GZIP. Below is a snapshot of the 'top' command

        13236 root      20   0 21.9g  21g  12m S    1 68.4 450:56.37 /opt/nn/jdk1.6.0_22
        13236 root      20   0 21.9g  21g  12m S    1 68.4 450:56.71 /opt/nn/jdk1.6.0_22
        13236 root      20   0 21.9g  21g  12m S    1 68.4 450:57.06 /opt/nn/jdk1.6.0_22
        

        Configured heap is 4G for RS.
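This symptom is consistent with the Deflater leak described in this issue: every new GZIPOutputStream pins native zlib buffers outside the Java heap, and they are released only when the finalizer eventually runs, so resident memory can grow far beyond the configured heap. A minimal illustration of the bounded-footprint alternative (illustrative names, not HBase code): reuse one Deflater via reset() for each block, and release it deterministically with end() when done.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

// Illustrative sketch: one Deflater (one native zlib allocation) serves
// many blocks instead of one allocation per block freed only by a finalizer.
class SharedDeflater {
  private final Deflater deflater = new Deflater();

  byte[] compress(byte[] data) {
    deflater.reset();  // rewind native state; no new native allocation
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (DeflaterOutputStream dos = new DeflaterOutputStream(baos, deflater)) {
      dos.write(data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return baos.toByteArray();
  }

  void release() {
    deflater.end();  // free native memory deterministically, not in a finalizer
  }

  static byte[] decompress(byte[] data) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (InflaterInputStream iis =
        new InflaterInputStream(new ByteArrayInputStream(data), new Inflater())) {
      byte[] buf = new byte[4096];
      for (int n; (n = iis.read(buf)) > 0; ) baos.write(buf, 0, n);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return baos.toByteArray();
  }
}
```

Note that DeflaterOutputStream.close() does not call end() on a Deflater passed in by the caller, which is what makes this reuse pattern safe.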

        Phabricator added a comment -

        mbautin updated the revision "[jira] HBASE-5387 [89-fb] Reuse compression streams in HFileBlock.Writer".
        Reviewers: khemani, Liyin, Kannan, lhofhansl, tedyu, JIRA

        Rebasing on recent changes in 89-fb (data block encoding).

        REVISION DETAIL
        https://reviews.facebook.net/D1725

        AFFECTED FILES
        src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
        src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java
        src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java

        Phabricator made changes -
        Attachment D1725.2.patch [ 12515506 ]
        Phabricator added a comment -

        Karthik has commented on the revision "[jira] HBASE-5387 [89-fb] Reuse compression streams in HFileBlock.Writer".

        Looks good! I think I might end up using ReusableStreamGzipCodec also in the compressed RPC change.

        REVISION DETAIL
        https://reviews.facebook.net/D1725

        Phabricator added a comment -

        Karthik has accepted the revision "[jira] HBASE-5387 [89-fb] Reuse compression streams in HFileBlock.Writer".

        REVISION DETAIL
        https://reviews.facebook.net/D1725

        BRANCH
        fix_gzip5

        Phabricator added a comment -

        mbautin has committed the revision "[jira] HBASE-5387 [89-fb] Reuse compression streams in HFileBlock.Writer".

        REVISION DETAIL
        https://reviews.facebook.net/D1725

        COMMIT
        https://reviews.facebook.net/rHBASEEIGHTNINEFBBRANCH1292434

        Lars Hofhansl made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition                    Time In Source Status  Execution Times  Last Executer    Last Execution Date
        Patch Available → Open        1d 5h 34m              1                Mikhail Bautin   12/Feb/12 08:37
        Open → Patch Available        18m                    2                Mikhail Bautin   12/Feb/12 08:38
        Patch Available → Resolved    1d 13h 46m             1                Ted Yu           13/Feb/12 22:24
        Resolved → Closed             241d 7h 10m            1                Lars Hofhansl    12/Oct/12 06:35

          People

          • Assignee: Mikhail Bautin
          • Reporter: Mikhail Bautin
          • Votes: 0
          • Watchers: 2
