HBASE-5458: Thread safety issues with Compression.Algorithm.GZ and CompressionTest

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.90.5, 0.92.2, 0.94.4, 0.95.2
    • Fix Version/s: 0.94.5, 0.95.0
    • Component/s: io
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      I've seen occasional NullPointerExceptions in ZlibFactory.isNativeZlibLoaded(conf) during region server startups and the completebulkload process. These are caused by a null configuration being passed to the isNativeZlibLoaded method. I think this happens when two or more threads call the CompressionTest.testCompression method at once. If the GZ algorithm has not been tested yet, both threads could continue and attempt to load the compressor. For GZ, the getCodec method is not thread safe, which could lead to one thread getting a reference to a GzipCodec that has a null configuration.

      current (not thread safe):

            DefaultCodec getCodec(Configuration conf) {
              if (codec == null) {
                // Race: as soon as this assignment runs, other threads can see a
                // non-null codec whose Configuration has not been set yet.
                codec = new GzipCodec();
                codec.setConf(new Configuration(conf));
              }

              return codec;
            }

      one possible fix would be something like this:

            DefaultCodec getCodec(Configuration conf) {
              if (codec == null) {
                // Configure a local reference first, so the shared field is only
                // ever assigned a fully initialized codec.
                GzipCodec gzip = new GzipCodec();
                gzip.setConf(new Configuration(conf));
                codec = gzip;
              }

              return codec;
            }

      But that may not be totally safe without some synchronization. An upstream fix in CompressionTest could also prevent multi-threaded access to GZ.getCodec(conf).
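
      For illustration, a minimal sketch of the simpler synchronized variant (a hedged example, not the committed patch; the codec field and method shape are assumed from the snippets above):

            // Sketch: plain synchronization. Slower than double-checked locking,
            // but no thread can ever observe a half-configured GzipCodec.
            synchronized DefaultCodec getCodec(Configuration conf) {
              if (codec == null) {
                GzipCodec gzip = new GzipCodec();
                gzip.setConf(new Configuration(conf));
                codec = gzip; // published only after setConf has run
              }
              return codec;
            }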

      exceptions:
      12/02/21 16:11:56 ERROR handler.OpenRegionHandler: Failed open of region=all-monthly,,1326263896983.bf574519a95263ec23a2bad9f5b8cbf4.
      java.io.IOException: java.lang.NullPointerException
      at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:89)
      at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:2670)
      at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2659)
      at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2647)
      at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:312)
      at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:99)
      at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
      Caused by: java.lang.NullPointerException
      at org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:63)
      at org.apache.hadoop.io.compress.GzipCodec.getCompressorType(GzipCodec.java:166)
      at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:100)
      at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
      at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
      at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:84)
      ... 9 more

      Caused by: java.io.IOException: java.lang.NullPointerException
      at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:89)
      at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:890)
      at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:819)
      at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:405)
      at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:323)
      at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:321)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
      Caused by: java.lang.NullPointerException
      at org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:63)
      at org.apache.hadoop.io.compress.GzipCodec.getCompressorType(GzipCodec.java:166)
      at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:100)
      at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
      at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
      at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:84)
      ... 10 more

      1. HBASE-5458-090-0.patch
        4 kB
        Elliott Clark
      2. HBASE-5458-090-1.patch
        4 kB
        Elliott Clark
      3. HBASE-5458-090-2.patch
        4 kB
        Elliott Clark
      4. HBASE-5458-092-2.patch
        5 kB
        Elliott Clark
      5. HBASE-5458-094-2.patch
        6 kB
        Elliott Clark
      6. HBASE-5458-trunk-2.patch
        6 kB
        Elliott Clark

        Activity

        Lars Hofhansl made changes -
        Fix Version/s 0.94.5 [ 12323874 ]
        stack made changes -
        Fix Version/s 0.95.0 [ 12324094 ]
        Fix Version/s 0.90.7 [ 12319481 ]
        Fix Version/s 0.96.0 [ 12320040 ]
        Fix Version/s 0.92.3 [ 12321692 ]
        Fix Version/s 0.94.5 [ 12323874 ]
        Lars Hofhansl made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hudson added a comment -

        Integrated in HBase-0.92-security #148 (See https://builds.apache.org/job/HBase-0.92-security/148/)
        HBASE-5458 Thread safety issues with Compression.Algorithm.GZ and CompressionTest (Revision 1435318)

        Result = ABORTED
        eclark :
        Files :

        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
        Hudson added a comment -

        Integrated in HBase-0.94-security-on-Hadoop-23 #11 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/11/)
        HBASE-5458 Thread safety issues with Compression.Algorithm.GZ and CompressionTest (Revision 1435317)

        Result = FAILURE
        eclark :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
        Hudson added a comment -

        Integrated in HBase-0.94-security #96 (See https://builds.apache.org/job/HBase-0.94-security/96/)
        HBASE-5458 Thread safety issues with Compression.Algorithm.GZ and CompressionTest (Revision 1435317)

        Result = FAILURE
        eclark :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
        Hudson added a comment -

        Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #360 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/360/)
        HBASE-5458 Thread safety issues with Compression.Algorithm.GZ and CompressionTest (Revision 1435316)

        Result = FAILURE
        eclark :
        Files :

        • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/compress/Compression.java
        Hudson added a comment -

        Integrated in HBase-0.92 #611 (See https://builds.apache.org/job/HBase-0.92/611/)
        HBASE-5458 Thread safety issues with Compression.Algorithm.GZ and CompressionTest (Revision 1435318)

        Result = FAILURE
        eclark :
        Files :

        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
        Hudson added a comment -

        Integrated in HBase-0.94 #746 (See https://builds.apache.org/job/HBase-0.94/746/)
        HBASE-5458 Thread safety issues with Compression.Algorithm.GZ and CompressionTest (Revision 1435317)

        Result = SUCCESS
        eclark :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java
        Hudson added a comment -

        Integrated in HBase-TRUNK #3766 (See https://builds.apache.org/job/HBase-TRUNK/3766/)
        HBASE-5458 Thread safety issues with Compression.Algorithm.GZ and CompressionTest (Revision 1435316)

        Result = FAILURE
        eclark :
        Files :

        • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/compress/Compression.java
        Elliott Clark made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s 0.90.7 [ 12319481 ]
        Fix Version/s 0.92.3 [ 12321692 ]
        Fix Version/s 0.96.0 [ 12320040 ]
        Fix Version/s 0.94.5 [ 12323874 ]
        Resolution Fixed [ 1 ]
        Elliott Clark added a comment -

        Trunk: Committed revision 1435316.
        94: Committed revision 1435317.
        92: Committed revision 1435318.
        90: Committed revision 1435319.

        Lars Hofhansl added a comment -

        +1

        Elliott Clark added a comment -

        Going ahead to commit, then.

        Elliott Clark added a comment -

        Yeah, you're correct: every block, not every file.

        Lars Hofhansl added a comment -

        AbstractFSReader.decompress is called for each block read, no? Anyway, still not super high volume.

        Elliott Clark added a comment (edited) -

        This is called on every hfile reader open (assuming compressed hfiles), so pretty low volume, but not never.

        Edit: if compressed.

        Lars Hofhansl added a comment (edited) -

        Looks good to me.
        This is not on a hot code path, right? The hottest I found was AbstractFSReader.decompress.

        Even volatiles create memory barriers in many cases, so we have to be careful.

        Elliott Clark added a comment -

        Lars Hofhansl: Thoughts on this in 0.94?

        Ted Yu added a comment -

        +1 on latest patch.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12565398/HBASE-5458-trunk-2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.TestLocalHBaseCluster

        -1 core zombie tests. There are 1 zombie test(s):

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4079//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4079//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4079//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4079//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4079//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4079//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4079//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4079//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4079//console

        This message is automatically generated.

        Elliott Clark made changes -
        Attachment HBASE-5458-trunk-2.patch [ 12565398 ]
        Elliott Clark made changes -
        Attachment HBASE-5458-094-2.patch [ 12565396 ]
        Elliott Clark added a comment -

        0.94 and trunk use a new ReusableStreamGzipCodec that I missed when doing the port. Posting a new patch.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12565390/HBASE-5458-092-2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4078//console

        This message is automatically generated.

        Elliott Clark made changes -
        Attachment HBASE-5458-094-2.patch [ 12565389 ]
        Elliott Clark made changes -
        Attachment HBASE-5458-trunk-2.patch [ 12565392 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12565392/HBASE-5458-trunk-2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.regionserver.TestScanWithBloomError
        org.apache.hadoop.hbase.io.hfile.TestHFile
        org.apache.hadoop.hbase.io.hfile.TestHFileBlockCompatibility
        org.apache.hadoop.hbase.io.hfile.TestChecksum
        org.apache.hadoop.hbase.io.hfile.TestHFileWriterV2

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4077//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4077//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4077//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4077//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4077//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4077//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4077//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4077//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4077//console

        This message is automatically generated.

        Elliott Clark made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Affects Version/s 0.94.4 [ 12323367 ]
        Affects Version/s 0.92.2 [ 12319888 ]
        Affects Version/s 0.96.0 [ 12320040 ]
        Elliott Clark made changes -
        Attachment HBASE-5458-trunk-2.patch [ 12565392 ]
        Elliott Clark added a comment -

        Trunk patch

        Elliott Clark made changes -
        Attachment HBASE-5458-094-2.patch [ 12565389 ]
        Attachment HBASE-5458-092-2.patch [ 12565390 ]
        Elliott Clark added a comment -

        The non-trunk patches. Trunk patch coming in a few.

        Jonathan Hsieh added a comment -

        Got it, thanks for the clarification (I was looking at the later versions).

        Elliott Clark added a comment -

        Ted Yu: Sure, I'll post a patch for all the versions.

        Jonathan Hsieh: The order was wrong in version 0, and it would have been possible (not easy, but still possible) for a thread to get a partially initialized Gzip algorithm. Fixed since then.

        Jonathan Hsieh added a comment -

        Elliott Clark: lgtm, +1.

        David McIntosh: It looks good to me – this idiom is called double-checked locking, and it is an optimization for lazy initializers. In your example, let's say thread 1 is already in buildCodec. This means it has the lock. Thread 2 could see null and would get blocked by the lock. Thread 1 would finish initialization, set the variable, and then release the lock. Thread 2 enters, sees that the variable is not null, and falls through. I don't see where the conf object gets passed elsewhere. Am I missing something?

        Ted Yu added a comment -

        Latest patch looks good.

        Can you prepare a patch for trunk?

        Thanks

        Elliott Clark made changes -
        Attachment HBASE-5458-090-2.patch [ 12565353 ]
        Elliott Clark added a comment -

        Wrap the long line that Ted saw.

        Ted Yu added a comment -
        +              ClassLoader.getSystemClassLoader().loadClass("org.apache.hadoop.io.compress.SnappyCodec");
        

        Mind wrapping this long line?

        Elliott Clark made changes -
        Attachment HBASE-5458-090-1.patch [ 12565345 ]
        Elliott Clark added a comment -

        Adds volatile.
        Makes the builder function return a codec rather than setting it.

        David McIntosh added a comment -

        This patch prevents instantiating the same codec multiple times, but it doesn't look like it addresses the race condition I originally ran into. The problematic sequence is like this:

        Thread 1 calls GZ.getCodec for the first time. It sees that codec is null and makes it into the buildCodec method. It executes codec = new GzipCodec(), but before it can set the configuration, thread 2 also calls GZ.getCodec. Thread 2 sees that codec is not null and returns it, resulting in an exception if it is used before thread 1 has set the configuration on it.

        The GzipCodec is the only one affected, since the other codecs are able to set the configuration in the constructor.
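
        To make the interleaving concrete, a hypothetical timeline against the getCodec snippet from the description (illustrative only):

              // T1: getCodec(conf)           codec == null, enters the if block
              // T1: codec = new GzipCodec()  codec is now non-null, conf still unset
              // T2: getCodec(conf)           sees codec != null, returns it at once
              // T2: CodecPool.getCompressor(codec)  conf is null -> NullPointerException
              // T1: codec.setConf(new Configuration(conf))  too late for thread 2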

        Jonathan Hsieh added a comment -

        I'm fine with the double-checked locking.

        Hmm... I thought transient was used only for Java serialization – are we actually using that for the Compression class? That seems funny, since I thought we always did writables/pbufs. (If we aren't using Java serialization, file a follow-on to remove the transients?)

        Elliott Clark added a comment -

        The lock is transient so that if someone tries to serialize an HFile that's been compressed, there's no lock object to serialize. It's only there as a helper and has no real state to serialize.

        This method is called on every open of an HFile, so it's on a pretty critical path, which is why I went with double-checked locking. Though there aren't a lot of these calls made, so it could be OK.

        I'll make the codecs volatile as well as transient.

        Jonathan Hsieh added a comment -

        I didn't look at all the context, but why is lock transient?

        Do we really need double-checked locking (how often is this called, and is it in the critical path)? Simpler to just do the normal locking pattern (I'd prefer this)?

        If we do double-checked locking, let's do it correctly – I believe the lzoCodec and snappyCodec need to be volatile instead of transient. See http://en.wikipedia.org/wiki/Double-checked_locking
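
        For reference, a minimal double-checked locking sketch along the lines described above (a hedged example; the field names are illustrative, not taken from the patch):

              // codec must be volatile so a fully constructed instance is safely
              // published; transient alone gives no visibility guarantee.
              private transient volatile DefaultCodec codec;
              private final transient Object lock = new Object();

              DefaultCodec getCodec(Configuration conf) {
                DefaultCodec result = codec;  // first check, no lock taken
                if (result == null) {
                  synchronized (lock) {
                    result = codec;           // second check, under the lock
                    if (result == null) {
                      GzipCodec gzip = new GzipCodec();
                      gzip.setConf(new Configuration(conf));
                      codec = result = gzip;  // publish only when fully configured
                    }
                  }
                }
                return result;
              }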

        Elliott Clark made changes -
        Attachment HBASE-5458-090-0.patch [ 12565227 ]
        Elliott Clark made changes -
        Assignee Elliott Clark [ eclark ]
        Ted Yu added a comment -
            try {
              Compressor c = algo.getCompressor();
              algo.returnCompressor(c);
              compressionTestResults[algo.ordinal()] = true; // passes
            } catch (Throwable t) {
              compressionTestResults[algo.ordinal()] = false; // failure
              throw new IOException(t);
            }
        

        How about catching the NPE in the above code and calling algo.getCompressor() again?
        We set compressionTestResults to false after the retry fails.
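
        A sketch of that retry (hedged; this is not the approach that was ultimately committed):

              try {
                Compressor c;
                try {
                  c = algo.getCompressor();
                } catch (NullPointerException npe) {
                  c = algo.getCompressor(); // retry once if the race was hit
                }
                algo.returnCompressor(c);
                compressionTestResults[algo.ordinal()] = true; // passes
              } catch (Throwable t) {
                compressionTestResults[algo.ordinal()] = false; // failure after retry
                throw new IOException(t);
              }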

        Ted Yu made changes -
        Description edited (the current/proposed code samples were wrapped in {code} blocks; the text is otherwise unchanged)
        David McIntosh created issue -

          People

          • Assignee: Elliott Clark
          • Reporter: David McIntosh
          • Votes: 0
          • Watchers: 8

            Dates

            • Created:
              Updated:
              Resolved:
