HBase / HBASE-8615

HLog Compression may fail due to Hadoop fs input stream returning partial bytes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.98.0, 0.95.2
    • Component/s: Replication
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      In a recent test run, I noticed the following in the test output:

      2013-05-24 22:01:02,424 DEBUG [RegionServer:0;kiyo.gq1.ygridcore.net,42690,1369432806911.replicationSource,2] fs.HFileSystem$ReorderWALBlocks(327): /user/hortonzy/hbase/.logs/kiyo.gq1.ygridcore.net,42690,1369432806911/kiyo.gq1.ygridcore.net%2C42690%2C1369432806911.1369432840428 is an HLog file, so reordering blocks, last hostname will be:kiyo.gq1.ygridcore.net
      2013-05-24 22:01:02,429 DEBUG [RegionServer:0;kiyo.gq1.ygridcore.net,42690,1369432806911.replicationSource,2] wal.ProtobufLogReader(118): After reading the trailer: walEditsStopOffset: 132235, fileLength: 132243, trailerPresent: true
      2013-05-24 22:01:02,438 ERROR [RegionServer:0;kiyo.gq1.ygridcore.net,42690,1369432806911.replicationSource,2] wal.ProtobufLogReader(236): Error  while reading 691 WAL KVs; started reading at 53272 and read up to 65538
      2013-05-24 22:01:02,438 WARN  [RegionServer:0;kiyo.gq1.ygridcore.net,42690,1369432806911.replicationSource,2] regionserver.ReplicationSource(324): 2 Got:
      java.io.IOException: Error  while reading 691 WAL KVs; started reading at 53272 and read up to 65538
              at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:237)
              at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:96)
              at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:89)
              at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:404)
              at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:320)
      Caused by: java.lang.IndexOutOfBoundsException: index (30062) must be less than size (1)
              at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305)
              at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284)
              at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:124)
              at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:71)
              at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary.getEntry(LRUDictionary.java:42)
              at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$CompressedKvDecoder.readIntoArray(WALCellCodec.java:210)
              at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$CompressedKvDecoder.parseCell(WALCellCodec.java:184)
              at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:46)
              at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:213)
              at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:217)
              ... 4 more
      2013-05-24 22:01:02,439 DEBUG [RegionServer:0;kiyo.gq1.ygridcore.net,42690,1369432806911.replicationSource,2] regionserver.ReplicationSource(583): Nothing to replicate, sleeping 100 times 10
      

      Will attach test output.

      1. org.apache.hadoop.hbase.replication.TestReplicationQueueFailoverCompressed-output.txt
        744 kB
        Ted Yu
      2. HBASE-8615-test.patch
        3 kB
        Jean-Daniel Cryans
      3. 8615-v5.txt
        6 kB
        Ted Yu
      4. 8615-v4.txt
        6 kB
        Ted Yu
      5. 8615-v3.txt
        4 kB
        Ted Yu
      6. 8615-v2.txt
        5 kB
        Ted Yu
      7. 172.21.3.117%2C60020%2C1375222888304.1375222894855.zip
        8.20 MB
        Jean-Daniel Cryans

          Activity

          Jean-Daniel Cryans added a comment -

          Assigning to me; it failed again in this build:

          http://54.241.6.143/job/HBase-TRUNK-Hadoop-2/org.apache.hbase$hbase-server/421/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationKillMasterRSCompressed/killOneMasterRS/

          I tried to reproduce it on Hadoop 1 without success. Even though it shouldn't matter, I'll give it a shot on Hadoop 2.

          The cause of this issue is that there seems to be one case where we clean the compression context in the middle of reading a file.
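JD's hypothesis here is that the compression context gets cleared partway through a file. The toy model below illustrates why that kind of reader/writer desync produces an error shaped like the IndexOutOfBoundsException in the description. It is a sketch under stated assumptions: ToyWalCodec and its methods are hypothetical, use a plain list rather than an LRU, and are not HBase's LRUDictionary or WALCellCodec. Any desync between the reader's and the writer's dictionaries, whatever its cause, fails the same way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of dictionary-based WAL compression (hypothetical; NOT HBase's
// LRUDictionary or WALCellCodec). The first occurrence of a value is written as
// a literal and added to the dictionary; repeats are written as an index into it.
// Reader and writer must grow their dictionaries in lockstep.
class ToyWalCodec {

    static final class Dictionary {
        private final List<String> entries = new ArrayList<>();

        int indexOf(String value) { return entries.indexOf(value); }

        void add(String value) { entries.add(value); }

        void clear() { entries.clear(); }

        String get(int index) {
            // Same kind of bounds check that fails in the log above.
            if (index < 0 || index >= entries.size()) {
                throw new IndexOutOfBoundsException("index (" + index
                    + ") must be less than size (" + entries.size() + ")");
            }
            return entries.get(index);
        }
    }

    /** Encodes values as String literals or Integer dictionary indices. */
    static List<Object> encode(List<String> values) {
        Dictionary dict = new Dictionary();
        List<Object> tokens = new ArrayList<>();
        for (String v : values) {
            int idx = dict.indexOf(v);
            if (idx >= 0) {
                tokens.add(idx);   // repeat: emit the index
            } else {
                dict.add(v);       // first occurrence: emit the literal
                tokens.add(v);
            }
        }
        return tokens;
    }

    /**
     * Decodes tokens. If resetAt >= 0, the reader's dictionary is cleared just
     * before that token, simulating a context reset in the middle of a file.
     */
    static List<String> decode(List<Object> tokens, int resetAt) {
        Dictionary dict = new Dictionary();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (i == resetAt) {
                dict.clear();
            }
            Object t = tokens.get(i);
            if (t instanceof Integer) {
                out.add(dict.get((Integer) t));
            } else {
                String s = (String) t;
                dict.add(s);
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> tokens = encode(Arrays.asList("a", "b", "c", "b"));
        System.out.println(tokens + " -> " + decode(tokens, -1));
        try {
            decode(tokens, 2);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("reset mid-stream: " + e.getMessage());
        }
    }
}
```

Encoding a, b, c, b yields [a, b, c, 1]. Clearing the reader's dictionary before the third token leaves it with one entry, so the final lookup fails with "index (1) must be less than size (1)".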

          stack added a comment -

          In case this is of help, Jean-Daniel Cryans, it fails on a different Jenkins as well: https://builds.apache.org/job/hbase-0.95-on-hadoop2/195/testReport/org.apache.hadoop.hbase.replication/TestReplicationKillMasterRSCompressed/killOneMasterRS/

          stack added a comment -

          More: https://builds.apache.org/job/hbase-0.95-on-hadoop2/196/testReport/org.apache.hadoop.hbase.replication/TestReplicationKillMasterRSCompressed/killOneMasterRS/

          I'm removing this test for now. JD is gone for the weekend and I want the tests to pass in the meantime.

          stack added a comment -

          Ugh. Let me see if it fails more. Will remove it then.

          stack added a comment -

          Failed again here: https://builds.apache.org/view/H-L/view/HBase/job/hbase-0.95-on-hadoop2/199/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationKillMasterRSCompressed/killOneMasterRS/

          I removed this suite of tests in HBASE-9062 until Jean-Daniel Cryans says he wants them in the mix again (HBASE-9061).

          Jean-Daniel Cryans added a comment -

          It's just the compression test that fails FWIW, so it seems that something is broken in HLog compression or the way we read them.

          Jean-Daniel Cryans added a comment -

          Interestingly, this doesn't happen only on Hadoop 2.0; it's just much less likely to happen on Hadoop 1. I have a small unit test that can recreate the problem, and it requires inserting 3x more data on Hadoop 1 to see it fail consistently.

          Now I gotta dig deeper...

          Jean-Daniel Cryans added a comment -

          Here's what I know about the different problems.

          The first one is that we find unexpected data in the compressed HLog. It happens easily on Hadoop 2 and takes more data to hit on Hadoop 1. It manifests itself as shown in the JIRA's description, or like this:

          2013-07-27 15:17:54,789 ERROR [RS:1;vesta:34230.replicationSource,2] wal.ProtobufLogReader(236): Error  while reading 4 WAL KVs; started reading at 65475 and read up to 65541
          2013-07-27 15:17:54,790 WARN  [RS:1;vesta:34230.replicationSource,2] regionserver.ReplicationSource(323): 2 Got: 
          java.io.IOException: Error  while reading 4 WAL KVs; started reading at 65475 and read up to 65541
          	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:237)
          	at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:96)
          	at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:89)
          	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:407)
          	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:319)
          Caused by: java.lang.IllegalArgumentException
          	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
          	at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$StreamUtils.toShort(WALCellCodec.java:353)
          	at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$CompressedKvDecoder.readIntoArray(WALCellCodec.java:237)
          	at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$CompressedKvDecoder.parseCell(WALCellCodec.java:206)
          	at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:46)
          	at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:213)
          	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:217)
          	... 4 more
          

          One thing I saw is that it always happens when we're close to a multiple of Short.MAX_VALUE. In the stack trace I just pasted, you can see it started reading at 65475, and in the JIRA's description it ended at 65538.
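A quick check of the offsets quoted above, assuming both error messages report inclusive byte ranges. Both ranges contain 65,534 (2 * Short.MAX_VALUE) as well as 65,536 (64 KiB), so these two traces alone cannot tell the two candidate boundaries apart. BoundaryCheck is a hypothetical helper that only does the arithmetic.

```java
// Which candidate boundaries fall inside the byte ranges quoted in the two
// "started reading at X and read up to Y" error messages?
class BoundaryCheck {
    static final long TWICE_SHORT_MAX = 2L * Short.MAX_VALUE; // 65534
    static final long SIXTY_FOUR_KIB = 64L * 1024;            // 65536

    /** True if boundary lies within the inclusive range [start, end]. */
    static boolean spans(long start, long end, long boundary) {
        return start <= boundary && boundary <= end;
    }

    public static void main(String[] args) {
        long[][] ranges = { {53272, 65538}, {65475, 65541} };
        for (long[] r : ranges) {
            System.out.printf("[%d, %d] spans 2*Short.MAX_VALUE: %b, 64 KiB: %b%n",
                r[0], r[1],
                spans(r[0], r[1], TWICE_SHORT_MAX),
                spans(r[0], r[1], SIXTY_FOUR_KIB));
        }
    }
}
```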

          I'm able to recreate the problem with a patch to TestReplicationHLogReaderManager that I'm going to attach later. I was also able to recreate the problem on a single-node cluster and grab a "corrupted" HLog that will also be attached.

          The other problem I found is that when appending WALEdits with only 1 KV to a compressed HLog, it hits an invalid PB:

          2013-07-31 11:38:52,156 ERROR [main] wal.ProtobufLogReader(199): Invalid PB while reading WAL, probably an unexpected EOF, ignoring
          com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
                  at com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:68)
                  at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
                  at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1120)
                  at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:885)
                  at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:212)
                  at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:746)
                  at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:238)
                  at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:282)
                  at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:760)
                  at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:288)
                  at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:752)
                  at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:197)
                  at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:96)
          

          When I print the position at the failure, it's still around a multiple of Short.MAX_VALUE, and with the unit test I attached you can reliably reproduce the issue after reading the same number of edits. Unfortunately I wasn't able to trigger it on Hadoop 1, but it seems related.
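Why a zero byte is fatal to the protobuf reader: in the protobuf wire format each field starts with a varint tag equal to (fieldNumber << 3) | wireType, and field number 0 is not allowed, so protobuf-java rejects a 0x00 tag with the "invalid tag (zero)" message seen above. A plausible reading, not established by the comment, is that the reader is positioned at a byte that was never the start of a WALKey. TagCheck below is a hypothetical helper, not protobuf-java's CodedInputStream.

```java
// Hypothetical helper showing why a 0x00 byte cannot start a protobuf field:
// each field begins with a varint tag equal to (fieldNumber << 3) | wireType,
// and field number 0 is invalid.
class TagCheck {

    /** Decodes a base-128 varint at offset; returns {value, bytesConsumed}. */
    static long[] readVarint(byte[] buf, int offset) {
        long value = 0;
        int shift = 0;
        int i = offset;
        while (true) {
            if (i >= buf.length) {
                throw new IllegalArgumentException("truncated varint");
            }
            int b = buf[i++] & 0xFF;
            value |= (long) (b & 0x7F) << shift; // low 7 bits carry data
            if ((b & 0x80) == 0) {               // high bit clear: last byte
                break;
            }
            shift += 7;
            if (shift >= 64) {
                throw new IllegalArgumentException("varint too long");
            }
        }
        return new long[] {value, i - offset};
    }

    static int fieldNumber(long tag) { return (int) (tag >>> 3); }

    static int wireType(long tag) { return (int) (tag & 0x7); }

    static boolean isValidTag(long tag) { return fieldNumber(tag) != 0; }

    public static void main(String[] args) {
        byte[][] samples = { {0x0A}, {0x00}, {(byte) 0x80, 0x01} };
        for (byte[] s : samples) {
            long tag = readVarint(s, 0)[0];
            System.out.printf("tag=%d field=%d wireType=%d valid=%b%n",
                tag, fieldNumber(tag), wireType(tag), isValidTag(tag));
        }
    }
}
```

0x0A decodes to field 1 with wire type 2 (length-delimited), while 0x00 decodes to field 0 and is rejected.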

          Jean-Daniel Cryans added a comment -

          Attaching two files.

          The first one is a patch for TestReplicationHLogReaderManager that shows how to trigger the two issues I was talking about in the previous comment. Change a few numbers according to the code comments I left and you're good to go.

          The second file is a "corrupted" HLog (I'm not sure if it's really a bad log or it just triggers something bad on the read path). Use the "bin/hbase hlog" tool to read it and don't forget to set hbase.regionserver.wal.enablecompression to true since it's a compressed HLog.

          Jean-Daniel Cryans added a comment -

          Pushing to 0.96.0. It won't be fixed in time for 0.95.2, which means HLog compression is broken and cannot be used in that release.

          Sergey Shelukhin added a comment -

          Could it be an issue similar to HBASE-8498?

          The cause for this issue is that there's seems to be one case where we clean the compression context in the middle of reading a file

          Is this statement based on observation? (If so, it would invalidate my comment.)

          Jean-Daniel Cryans added a comment -

          Sergey, see my later comments.

          Sergey Shelukhin added a comment -

          Which one?
          Both cases seem to be reading /over/ the 64k boundary, not just close to it. The Hadoop FS input stream class reserves the right (and, as the other JIRA shows, exercises it) not to give you all the bytes you ask for at block boundaries. I wonder if 65535 could be such a boundary; as far as I can see, the compression code calls read() here and there. See the fix to KeyValue::iscreate: it couldn't read a pitiful int properly because the read got cut off by some boundary.
          Are you implying I should supply a patch? I can do that but probably not this week unfortunately.
          Or do you mean my hunch is invalid. Just checking
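Sergey's point rests on the general java.io.InputStream contract: read(byte[], int, int) may return fewer bytes than requested even when more data follows. The sketch below contrasts a single read() call with a read-fully loop. BoundaryLimitedInputStream, PartialReads and the 65,536 boundary are hypothetical, chosen only for illustration; this is not how HDFS's input stream is implemented, and real boundaries depend on the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stream that never returns bytes from both sides of `boundary` in a
// single read(byte[], int, int) call, loosely mimicking a stream that stops at a
// block boundary. InputStream's contract permits this: read may return fewer
// bytes than requested. (skip() is not tracked; this is only a sketch.)
class BoundaryLimitedInputStream extends FilterInputStream {
    private final long boundary;
    private long pos = 0;

    BoundaryLimitedInputStream(InputStream in, long boundary) {
        super(in);
        this.boundary = boundary;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            pos++;
        }
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (pos < boundary && pos + len > boundary) {
            len = (int) (boundary - pos); // stop short at the boundary
        }
        int n = super.read(b, off, len);
        if (n > 0) {
            pos += n;
        }
        return n;
    }
}

class PartialReads {
    /** Buggy pattern: assumes a single read() fills the whole buffer. */
    static int naiveRead(InputStream in, byte[] buf) throws IOException {
        return in.read(buf, 0, buf.length);
    }

    /** Correct pattern: keep reading until len bytes arrive or the stream ends. */
    static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        while (len > 0) {
            int n = in.read(buf, off, len);
            if (n < 0) {
                throw new EOFException("stream ended with " + len + " bytes still expected");
            }
            off += n;
            len -= n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[70_000];
        InputStream in = new BoundaryLimitedInputStream(new ByteArrayInputStream(data), 65_536);
        readFully(in, new byte[65_500], 0, 65_500); // advance to offset 65500
        int n = naiveRead(in, new byte[100]);        // asks for 100 bytes
        System.out.println("naive read returned " + n + " of 100 bytes");
    }
}
```

Running main reports that the naive read returned 36 of 100 bytes, because the request is cut off at offset 65,536; readFully instead keeps calling read() until the buffer is full.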

          Jean-Daniel Cryans added a comment -

          You quoted an earlier comment, from before I had done as much investigation as in the one that starts with "Here's what I know about the different problems", where I aimed to dump my whole understanding of the problem.

          Are you implying I should supply a patch? I can do that but probably not this week unfortunately.

          Nope.

          Or do you mean my hunch is invalid. Just checking

          It could very well be related and it could explain why we're not seeing this problem with an uncompressed log.

          Ted Yu added a comment -

          With help from J-D and Sergey, here is a patch that fixes the problem.

          I looped TestReplicationHLogReaderManager several times against Hadoop 2.1 and the test passed.

          Please comment.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12596744/8615-v2.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6643//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6643//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6643//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6643//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6643//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6643//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6643//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6643//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6643//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6643//console

          This message is automatically generated.

          Sergey Shelukhin added a comment -

          1) A method like that already exists (IOUtils.readFully). The only reason KV has custom code in one place is that, in that case, reading 0 bytes is OK but reading some other insufficient number of bytes is not.
          2) Does this cover all cases where compressed input might call read (incl. transitively thru some other call)?
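The KV special case in point 1 can be sketched as follows: running out of input before the first byte of a record is a clean end of stream, while running out partway through a record is an error. This is a minimal sketch assuming fixed-size records; RecordReads and readFullyOrCleanEof are hypothetical names, not KeyValue's actual code or the code in the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: EOF before the first byte of a record is a clean end of
// stream, but EOF partway through a record is treated as corruption.
class RecordReads {

    /**
     * Fills buf completely and returns true; returns false if the stream was
     * already at EOF; throws EOFException if it ended after a partial read.
     */
    static boolean readFullyOrCleanEof(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                if (off == 0) {
                    return false; // nothing read: no more records
                }
                throw new EOFException("partial record: got " + off + " of " + buf.length + " bytes");
            }
            off += n; // short reads are fine; just keep going
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[4];
        System.out.println(readFullyOrCleanEof(new ByteArrayInputStream(new byte[] {1, 2, 3, 4}), buf));
        System.out.println(readFullyOrCleanEof(new ByteArrayInputStream(new byte[0]), buf));
        try {
            readFullyOrCleanEof(new ByteArrayInputStream(new byte[] {1, 2}), buf);
        } catch (EOFException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

A plain read-fully helper is enough wherever a record is always expected; the variant above only matters at positions where reaching the end of the stream is legitimate.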

          Ted Yu added a comment -

          Patch v3 addresses Sergey's comments.

          I searched the related classes but didn't find other calls to in.read() that write into a buffer.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12596761/8615-v3.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6649//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6649//console

          This message is automatically generated.

          ramkrishna.s.vasudevan added a comment -

          Similar issue as in HBASE-8498. Patch looks good to me.

          Anoop Sam John added a comment -

          Fix looks reasonable to me.

          Jean-Daniel Cryans added a comment -

          Thanks for the patch, Ted. Before committing, we should at least fix the unit test, because what I did was kind of a hack: TestReplicationHLogReaderManager isn't supposed to run on compressed data. Maybe add a TestReplicationHLogReaderManagerCompressed that just enables compression?

          Then we could also test more than one failure mode in there. It's an easy refactor: pass the two ints to a method, then get rid of most of the comments. Right now it's just dirty.

          Finally, if we are fixing HLog compression for real, we also need to put TestReplicationKillMasterRSCompressed back, AKA HBASE-9061.

          Ted Yu added a comment -

          I included the changes to TestReplicationHLogReaderManager to show that the problem has been solved.

          Since restoring TestReplicationKillMasterRSCompressed would be done in HBASE-9061, how about dropping the changes to TestReplicationHLogReaderManager in patch v4?

          Please comment.

          Jean-Daniel Cryans added a comment -

          I understand why you put it there and I think it's a valuable test because it's a micro-er (???) test than TestReplicationKillMasterRSCompressed. It just needs to be cleaned up.

          Ted Yu added a comment -

          See if patch v4 makes the test better.

          Jean-Daniel Cryans added a comment -

          +1 on v4, but on commit make it a Large test instead since now it takes 2 minutes to run.

          Ted Yu added a comment -

          Patch v5 changes the test to a large test.

          Ted Yu added a comment -

          From https://builds.apache.org/job/PreCommit-HBASE-Build/6669/console:

          Running org.apache.hadoop.hbase.replication.regionserver.TestReplicationHLogReaderManager
          Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 230.327 sec

          Ted Yu added a comment -

          Integrated to trunk.

          If TestReplicationHLogReaderManager passes on Hadoop 2.0, I will integrate it to the 0.95 branch.

          Thanks for the reviews.

          Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #659 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/659/)
          HBASE-8615 HLog Compression may fail due to Hadoop fs input stream returning partial bytes (tedyu: rev 1512133)

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationHLogReaderManager.java
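          The commit title above names the root cause: a Hadoop fs input stream may return fewer bytes than requested from a single read call, and code that assumes one read fills its buffer then misparses the compressed WAL. The actual change to WALCellCodec.java is not reproduced here. What follows is only a minimal, self-contained sketch of the general technique, assuming the fix amounts to looping until the requested number of bytes has arrived. ChunkedInputStream, naiveRead and readFully are hypothetical names chosen for illustration, and the 16-byte chunk size merely simulates a stream that hands back partial reads (for example at a block boundary).

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class PartialReadDemo {

  /** Hypothetical stand-in for an fs stream: returns at most maxChunk bytes per read call. */
  static class ChunkedInputStream extends FilterInputStream {
    private final int maxChunk;

    ChunkedInputStream(InputStream in, int maxChunk) {
      super(in);
      this.maxChunk = maxChunk;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      // Legal under the InputStream contract: any count from 1 to len may be returned.
      return super.read(b, off, Math.min(len, maxChunk));
    }
  }

  /** Buggy pattern: assumes a single read() fills the whole buffer. Returns the bytes actually read. */
  static int naiveRead(InputStream in, byte[] buf) throws IOException {
    return in.read(buf, 0, buf.length);
  }

  /** Keeps reading until len bytes are in buf; fails loudly on premature EOF. */
  static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
    while (len > 0) {
      int n = in.read(buf, off, len);
      if (n < 0) {
        throw new EOFException("Premature EOF: " + len + " bytes still expected");
      }
      off += n;
      len -= n;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[100];
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) i;
    }

    // A single read only gets the first chunk.
    byte[] buf = new byte[100];
    int got = naiveRead(new ChunkedInputStream(new ByteArrayInputStream(data), 16), buf);
    System.out.println("naive read returned " + got + " of " + buf.length + " bytes");

    // The loop collects every chunk.
    byte[] full = new byte[100];
    readFully(new ChunkedInputStream(new ByteArrayInputStream(data), 16), full, 0, full.length);
    System.out.println("readFully matches: " + Arrays.equals(data, full));
  }
}
```

          Running main prints "naive read returned 16 of 100 bytes" followed by "readFully matches: true". In real code there is no need to hand-roll the loop: the JDK's DataInputStream.readFully and Hadoop's org.apache.hadoop.io.IOUtils.readFully(InputStream, byte[], int, int) provide the same guarantee.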
          Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK #4360 (See https://builds.apache.org/job/HBase-TRUNK/4360/)
          HBASE-8615 HLog Compression may fail due to Hadoop fs input stream returning partial bytes (tedyu: rev 1512133)

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationHLogReaderManager.java
          Ted Yu added a comment -

          Integrated to 0.95 as well.

          Hudson added a comment -

          FAILURE: Integrated in hbase-0.95 #422 (See https://builds.apache.org/job/hbase-0.95/422/)
          HBASE-8615 HLog Compression may fail due to Hadoop fs input stream returning partial bytes (tedyu: rev 1512305)

          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationHLogReaderManager.java
          Hudson added a comment -

          FAILURE: Integrated in hbase-0.95-on-hadoop2 #227 (See https://builds.apache.org/job/hbase-0.95-on-hadoop2/227/)
          HBASE-8615 HLog Compression may fail due to Hadoop fs input stream returning partial bytes (tedyu: rev 1512305)

          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationHLogReaderManager.java
          Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #660 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/660/)
          HBASE-9061 Put back TestReplicationKillMasterRSCompressed when fixed over in HBASE-8615 (Ted Yu) (tedyu: rev 1512345)

          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationKillMasterRSCompressed.java
          Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK #4361 (See https://builds.apache.org/job/HBase-TRUNK/4361/)
          HBASE-9061 Put back TestReplicationKillMasterRSCompressed when fixed over in HBASE-8615 (Ted Yu) (tedyu: rev 1512345)

          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationKillMasterRSCompressed.java
          Hudson added a comment -

          FAILURE: Integrated in hbase-0.95 #423 (See https://builds.apache.org/job/hbase-0.95/423/)
          HBASE-9061 Put back TestReplicationKillMasterRSCompressed when fixed over in HBASE-8615 (Ted Yu) (tedyu: rev 1512431)

          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationKillMasterRSCompressed.java
          Hudson added a comment -

          FAILURE: Integrated in hbase-0.95-on-hadoop2 #228 (See https://builds.apache.org/job/hbase-0.95-on-hadoop2/228/)
          HBASE-9061 Put back TestReplicationKillMasterRSCompressed when fixed over in HBASE-8615 (Ted Yu) (tedyu: rev 1512431)

          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationKillMasterRSCompressed.java
          stack added a comment -

          Ted Yu committed this. Resolving.


            People

            • Assignee:
              Ted Yu
              Reporter:
              Ted Yu
            • Votes:
              0
              Watchers:
              9