HBASE-5864: Error while reading from hfile in 0.94

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.94.0
    • Fix Version/s: 0.94.0, 0.95.0
    • Component/s: regionserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Got the following stacktrace during region split.

      2012-04-24 16:05:42,168 WARN org.apache.hadoop.hbase.regionserver.Store: Failed getting store size for value
      java.io.IOException: Requested block is out of range: 2906737606134037404, lastDataBlockOffset: 84764558
      	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:278)
      	at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:285)
      	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:402)
      	at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1638)
      	at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1943)
      	at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:77)
      	at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:4921)
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.splitRegion(HRegionServer.java:2901)
      
      Attachments

      1. HBASE-5864_3.patch
        11 kB
        ramkrishna.s.vasudevan
      2. HBASE-5864_2.patch
        8 kB
        ramkrishna.s.vasudevan
      3. HBASE-5864_1.patch
        6 kB
        ramkrishna.s.vasudevan
      4. HBASE-5864_test.patch
        2 kB
        ramkrishna.s.vasudevan

          Activity

          ramkrishna.s.vasudevan added a comment -

          The problem is caused by the use of checksums; whether it shows up depends on the block size and the KV sizes written to the HFile.
          When we try to read a block, consider the following code in BlockIterator.blockRange:

          @Override
          public HFileBlock nextBlock() throws IOException {
            if (offset >= endOffset)
              return null;
            HFileBlock b = readBlockData(offset, -1, -1, false);
            offset += b.getOnDiskSizeWithHeader();
            return b;
          }
          

          In the case of a non-compressed algorithm, in

            private HFileBlock readBlockDataInternal(FSDataInputStream is, long offset, 
                  long onDiskSizeWithHeaderL,
                  int uncompressedSize, boolean pread, boolean verifyChecksum) 
          
           b = new HFileBlock(headerBuf, getMinorVersion());
          
                  // This will also allocate enough room for the next block's header.
                  b.allocateBuffer(true);
          
          

          Inside allocateBuffer

          int cksumBytes = totalChecksumBytes();
              int capacityNeeded = headerSize() + uncompressedSizeWithoutHeader +
                  cksumBytes +
                  (extraBytes ? headerSize() : 0);
          
              ByteBuffer newBuf = ByteBuffer.allocate(capacityNeeded);
          

          The buffer allocated for this block therefore includes the checksum bytes as well.
          After fetching the block and its corresponding stream, we try to read the root index block.

          // Data index. We also read statistics about the block index written after
              // the root level.
              dataBlockIndexReader.readMultiLevelIndexRoot(
                  blockIter.nextBlockAsStream(BlockType.ROOT_INDEX),
                  trailer.getDataIndexCount());
          
          

          Inside readMultiLevelIndexRoot

          readRootIndex(in, numEntries);
                if (in.available() < MID_KEY_METADATA_SIZE) {
                  // No mid-key metadata available.
                  return;
                }
          

          Here the InputStream 'in' is formed from blk.getByteStream(), so its available length includes the checksum bytes.
          While doing readRootIndex

          public void readRootIndex(DataInput in, final int numEntries)
                  throws IOException {
          

          We read only the index data, so the remaining checksum bytes are still available in the input stream.
          We know that for every 16k of data there is a 4-byte checksum.
          So if the data size is ~66k there will be 20 bytes of checksums, and the check

             if (in.available() < MID_KEY_METADATA_SIZE) {
                  // No mid-key metadata available.
                  return;
                }
          

          will now find in.available() greater than MID_KEY_METADATA_SIZE (16 bytes), giving us an invalid 'midLeafBlockOffset'.
          So when we try to read the file we get abnormal values. The same can be reproduced using the attached testcase.
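          To make the arithmetic concrete, here is a minimal standalone sketch of why the leftover checksum bytes trip the mid-key check. It uses illustrative names only (it is not HBase code) and assumes the defaults mentioned above: a 4-byte checksum per 16k chunk and the 16-byte MID_KEY_METADATA_SIZE.

          // Illustrative only -- not the HBase API. Assumes the defaults described
          // above: one 4-byte checksum per 16 KB chunk, and the 16-byte
          // MID_KEY_METADATA_SIZE quoted in the analysis.
          public class LeftoverChecksumSketch {
            static final int BYTES_PER_CHECKSUM = 16 * 1024; // 16k of data per checksum chunk
            static final int CHECKSUM_SIZE = 4;              // 4-byte checksum per chunk
            static final int MID_KEY_METADATA_SIZE = 16;     // value quoted in the comment above

            /** Trailing checksum bytes for a block holding dataSize bytes of index data. */
            static int checksumBytes(int dataSize) {
              int chunks = (dataSize + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM; // ceil
              return chunks * CHECKSUM_SIZE;
            }

            public static void main(String[] args) {
              int rootIndexSize = 66 * 1024;               // the ~66k example from the analysis
              int leftover = checksumBytes(rootIndexSize); // 5 chunks -> 20 bytes
              // readRootIndex() consumes only the index entries, so these bytes remain
              // "available" on the stream; 20 >= 16 makes the mid-key check wrongly
              // conclude that mid-key metadata follows.
              System.out.println(leftover + " leftover checksum bytes vs "
                  + MID_KEY_METADATA_SIZE + " required -> bogus mid-key read: "
                  + (leftover >= MID_KEY_METADATA_SIZE));
            }
          }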

          ramkrishna.s.vasudevan added a comment -

          Please correct me if I am wrong.
          I am working on a patch, which I can upload for verification. It may not be the final one.

          ramkrishna.s.vasudevan added a comment -

          I have attached a sample testcase (not a regular test case). If you apply it and search the logs for the string
          "The midLeafBlockOffset is ", you should see something like

          2012-04-24 20:26:21,140 DEBUG [main] hfile.HFileBlockIndex$BlockIndexReader(553): The midLeafBlockOffset is  7767124539785491489
          
          stack added a comment -

          Good find lads. I'm not sure I follow. Is it fixable?

          ramkrishna.s.vasudevan added a comment -

          The attached patch seems to fix the problem we encountered. It needs more testing, including with files in the old format.
          I will see whether there is a better approach.

          stack added a comment -

          I'm not sure I follow what your patch is doing, Ram. And maybe we need a test around splitting an hfile?

          What is this doing:

          -    final int ENTRY_COUNT = 10000;
          +    final int ENTRY_COUNT = 50000;
          

          This is asking for too many entries?

          Good stuff.

          Ted Yu added a comment -
          -    DataInputStream nextBlockAsStream(BlockType blockType) throws IOException;
          +    HFileBlock nextBlockAsStream(BlockType blockType) throws IOException;
          

          The method should be named nextBlock() because stream isn't returned.

          +     * Read in the root-level index from the given input stream. Must match
          

          'input stream' is no longer the input. HFileBlock is.
          Please add @return to the javadoc.
          For TestHFileWriterV2.java:

          -    final Compression.Algorithm COMPRESS_ALGO = Compression.Algorithm.GZ;
          +    final Compression.Algorithm COMPRESS_ALGO = Compression.Algorithm.NONE;
          

          We should exercise both compression algorithms. Refactoring is needed.
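          A hypothetical shape for that refactor is a parameterized test that runs the same write/read scenario once per algorithm; the class and method names below are illustrative, not the actual test code.

          import java.util.Arrays;
          import java.util.Collection;

          import org.apache.hadoop.hbase.io.hfile.Compression;
          import org.junit.Test;
          import org.junit.runner.RunWith;
          import org.junit.runners.Parameterized;
          import org.junit.runners.Parameterized.Parameters;

          // Hypothetical sketch, not the actual TestHFileWriterV2: the same
          // write/read scenario is executed once per compression algorithm.
          @RunWith(Parameterized.class)
          public class TestHFileWriterV2Compressions {
            @Parameters
            public static Collection<Object[]> algorithms() {
              return Arrays.asList(new Object[][] {
                  { Compression.Algorithm.NONE },
                  { Compression.Algorithm.GZ } });
            }

            private final Compression.Algorithm compressAlgo;

            public TestHFileWriterV2Compressions(Compression.Algorithm algo) {
              this.compressAlgo = algo;
            }

            @Test
            public void testWriteAndReadHFile() throws Exception {
              // The existing write/read scenario would be invoked here with
              // compressAlgo instead of a hard-coded constant (hypothetical call):
              // writeDataAndReadFromHFile(path, compressAlgo, entryCount, findMidKey);
            }
          }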

          ramkrishna.s.vasudevan added a comment -

          Just added a test case to show how midKey() will throw an IOException.
          Stack, what the patch does is this:
          For the checksum we add some additional bytes when forming the input stream to read the block.
          After reading the block we just check whether the bytes still available in the stream are less than MID_KEY_METADATA_SIZE.
          But in this case (is 50000 too many? I still get only one root-level index) there are always some bytes remaining from the checksums, and that is always more than MID_KEY_METADATA_SIZE.

          The test scenario in which we hit this problem was:
          -> Create a table with no split keys.
          -> Start pumping data into this region using parallel threads.
          -> Allow a couple of flushes/compactions.
          -> Then try to split the region. That is when we got this problem.
          I may be wrong; please do correct me, and feel free to update the patch as well.

          ramkrishna.s.vasudevan added a comment -

          @Ted
          Sorry, I only just saw your comments. I will update based on further reviews. Thanks.

          Ted Yu added a comment -

          Patch v2 passes the new test.

          +  private void writeDataAndReadFromHFile(Path hfilePath,
          +      Algorithm COMPRESS_ALGO, int ENTRY_COUNT, boolean findMidKey) throws IOException {
          

          Please don't use all upper case parameter names.

          Please refactor the new readRootIndex() to re-use the existing method.

          Ted Yu added a comment -

          The following computation assumes checksum is on:

          +      int numBytes = (int) ChecksumUtil.numBytes(blk
          +          .getOnDiskDataSizeWithHeader(), blk.getBytesPerChecksum());
          

          If checksums are off, we would get a 'divide by 0' exception.

          I suggest using HFileBlock.totalChecksumBytes() in place of the above.
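          A minimal sketch of that concern, with hypothetical names rather than the HBase API: the numBytes-style computation divides by bytesPerChecksum, so the zero case (checksums disabled) has to be handled before the division, which is what a block-level helper such as totalChecksumBytes() can do internally.

          // Hypothetical helper, not HBase code: shows why the checksum-byte math
          // needs a guard when checksums are disabled (bytesPerChecksum == 0).
          final class ChecksumByteMath {
            static int safeChecksumBytes(int onDiskDataSizeWithHeader,
                int bytesPerChecksum, int checksumSize) {
              if (bytesPerChecksum == 0) {
                return 0; // checksums off: no trailing checksum bytes, no division by zero
              }
              int chunks = (onDiskDataSizeWithHeader + bytesPerChecksum - 1) / bytesPerChecksum;
              return chunks * checksumSize;
            }
          }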

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12524007/HBASE-5864_2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.TestRegionRebalancing
          org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1627//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1627//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1627//console

          This message is automatically generated.

          Lars Hofhansl added a comment -

          Dhruba should have a look at this too.

          ramkrishna.s.vasudevan added a comment -

          Dhruba should have a look at this too.

          +1 on Dhruba seeing this.

          ramkrishna.s.vasudevan added a comment -

          I suggest using HFileBlock.totalChecksumBytes() in place of the above.

          Yes Ted, this should be used. Its scope is default (package-private), so we can use it.

          Lars Hofhansl added a comment -

          I also don't quite follow the patch or problem.
          How is it that HBase during normal operation (scanning, etc) can read HFiles correctly?

          ramkrishna.s.vasudevan added a comment -

          @Lars
          In a file that has only one level of index, if we need to find the midkey (during an external split operation) we should ideally get it from

          else {
                  // The middle of the root-level index.
                  midKey = blockKeys[(rootCount - 1) / 2];
                }
          

          This is because the midLeafBlockOffset should be -1.
          But in the problem we faced, midLeafBlockOffset is wrongly set to some value, and hence midkey() goes into

          if (midLeafBlockOffset >= 0) {
                  if (cachingBlockReader == null) {
                    throw new IOException("Have to read the middle leaf block but " +
                        "no block reader available");
                  }
          
                  // Caching, using pread, assuming this is not a compaction.
                  HFileBlock midLeafBlock = cachingBlockReader.readBlock(
                      midLeafBlockOffset, midLeafBlockOnDiskSize, true, true, false,
                      BlockType.LEAF_INDEX);
          

          That is where the problem happens in this bug. For a single-level index file there should not be any midLeafBlockOffset. Correct me if I am wrong.

          Lars Hofhansl added a comment -

          I see. This is a blocker then.

          ramkrishna.s.vasudevan added a comment -

          Updated the patch. All test cases passed. I verified a scenario with multi-level indexes and was able to get the correct midkey.
          Please review.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12524272/HBASE-5864_3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.TestRegionRebalancing

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1644//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1644//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1644//console

          This message is automatically generated.

          dhruba borthakur added a comment -

          Thanks for finding this, ramkrishna; very good analysis. I can see how the bug is occurring. I'm trying to digest the fix you are providing.

          Lars Hofhansl added a comment -

          TestRegionRebalancing is unrelated (HBASE-5848)

          Ted Yu added a comment -

          Latest patch looks good. Minor comments:

          +      // after reading the root index the check sum bytes has to
          

          'check sum bytes has to' -> 'checksum bytes have to'

          +      // be subracted to know if the mid key exists.
          

          'subracted' -> 'subtracted'

          Lars Hofhansl added a comment -

          This is the only issue in the way of the next RC attempt for 0.94.0.
          I don't feel I can +1 this without studying the implications a lot more.

          Was hoping Dhruba would be able to grok it.

          dhruba borthakur added a comment -

          The meat of the change is in readMultiLevelIndexRoot(): instead of using in.available(), it uses in.available() - sizeofchecksum to determine whether mid-key metadata is available. Now I understand it.
          +1 on the fix.
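          A rough standalone sketch of that corrected check (not the committed diff; the names and the 16-byte size come from the discussion above):

          import java.io.DataInputStream;
          import java.io.IOException;

          // Sketch only: the caller passes the block's byte stream plus the number of
          // trailing checksum bytes the block reports (e.g. via totalChecksumBytes()).
          final class MidKeyCheckSketch {
            static final int MID_KEY_METADATA_SIZE = 16; // as discussed in this issue

            /** True only if mid-key metadata genuinely follows the root-level index. */
            static boolean hasMidKeyMetadata(DataInputStream in, int trailingChecksumBytes)
                throws IOException {
              // in.available() still counts the block's checksum trailer, so subtract it
              // before comparing -- the in.available() - sizeofchecksum change described above.
              return in.available() - trailingChecksumBytes >= MID_KEY_METADATA_SIZE;
            }
          }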

          dhruba borthakur added a comment -

          I also grepped through the code to see if there is any other place where readRootIndex().available() is used, but could not find any.

          This is a good bug to catch. Ramkrishna: what was the symptom that triggered you to look for this bug?

          ramkrishna.s.vasudevan added a comment -

          @Dhruba
          Thanks for looking into the patch.

          The test scenario in which we hit this problem was:

          -> Create a table with no split keys.
          -> Start pumping data into this region using parallel threads.
          -> Allow a couple of flushes/compactions.
          -> Then try to split the region. It did not split, saying the midkey offset is not in range.

          Then we tried to reproduce this with different log messages added and found that the problem occurs while reading the root-level index.
          Gopi (who saw this bug) and I spent two full days getting to the bottom of this.

          Lars Hofhansl added a comment -

          The meat of the change is in readMultiLevelIndexRoot() in which, instead of using in.available() it uses the in.available() - sizeofchecksum

          So it seems we could have a smaller change that just does that (plus the tests)?
          I agree that this is a great catch!

          ramkrishna.s.vasudevan added a comment -

          @Lars
          I still feel the change is very small. Just to get the checksum bytes, instead of getting the stream I now get the block, from which I can get the stream again.

          Lars Hofhansl added a comment -

          @Ram: You are right, it is a small change.
          Just wondering whether we actually need the part that changes "public DataInputStream nextBlockAsStream(BlockType blockType)" to "public HFileBlock nextBlockWithBlockType(BlockType blockType)".

          Lars Hofhansl added a comment - edited

          Ah OK. Never mind, you need the Block to get the checksumBytes.

          Lars Hofhansl added a comment -

          I think I get the change now. +1

          Lars Hofhansl added a comment -

          Going to commit in a few unless there're objections.

          Lars Hofhansl added a comment -

          Committed to 0.94 and 0.96.
          Thanks, Ram.
          Thanks for the reviews, Dhruba and Ted.

          Hudson added a comment -

          Integrated in HBase-0.94 #151 (See https://builds.apache.org/job/HBase-0.94/151/)
          HBASE-5864 Error while reading from hfile in 0.94 (Ram) (Revision 1331057)

          Result = ABORTED
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          Hudson added a comment -

          Integrated in HBase-0.94-security #22 (See https://builds.apache.org/job/HBase-0.94-security/22/)
          HBASE-5864 Error while reading from hfile in 0.94 (Ram) (Revision 1331057)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          Hudson added a comment -

          Integrated in HBase-TRUNK #2819 (See https://builds.apache.org/job/HBase-TRUNK/2819/)
          HBASE-5864 Error while reading from hfile in 0.94 (Ram) (Revision 1331058)

          Result = SUCCESS
          larsh :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          Hudson added a comment -

          Integrated in HBase-TRUNK-security #186 (See https://builds.apache.org/job/HBase-TRUNK-security/186/)
          HBASE-5864 Error while reading from hfile in 0.94 (Ram) (Revision 1331058)

          Result = SUCCESS
          larsh :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          Jonathan Hsieh added a comment -

          I haven't looked carefully but do you all think this would affect the 0.92 line as well?

          Lars Hofhansl added a comment -

          @Jon: This is caused by HBASE-5074, which is in 0.94+ only.

          ramkrishna.s.vasudevan added a comment -

          Let me update the fix versions to include 0.96 as well. I was just about to prepare a patch for trunk. Thanks, Lars, for taking care of it.


            People

            • Assignee:
              ramkrishna.s.vasudevan
            • Reporter:
              Gopinathan A
            • Votes:
              0
            • Watchers:
              6
