HBase
  1. HBase
  2. HBASE-5074

support checksums in HBase block cache

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.0, 0.95.0
    • Component/s: regionserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Hide
      Adds hbase.regionserver.checksum.verify. If hbase.regionserver.checksum.verify is set to true, then hbase will read data and then verify checksums. Checksum verification inside hdfs will be switched off. If the hbase-checksum verification fails, then it will switch back to using hdfs checksums for verifiying data that is being read from storage. Also adds hbase.hstore.bytes.per.checksum -- number of bytes in a newly created checksum chunk -- and hbase.hstore.checksum.algorithm, name of an algorithm that is used to compute checksums.

      You will currently only see benefit if you have the local read short-circuit enabled -- see http://hbase.apache.org/book.html#perf.hdfs.configs -- while HDFS-3429 goes unfixed.
      Show
      Adds hbase.regionserver.checksum.verify. If hbase.regionserver.checksum.verify is set to true, then hbase will read data and then verify checksums. Checksum verification inside hdfs will be switched off. If the hbase-checksum verification fails, then it will switch back to using hdfs checksums for verifiying data that is being read from storage. Also adds hbase.hstore.bytes.per.checksum -- number of bytes in a newly created checksum chunk -- and hbase.hstore.checksum.algorithm, name of an algorithm that is used to compute checksums. You will currently only see benefit if you have the local read short-circuit enabled -- see http://hbase.apache.org/book.html#perf.hdfs.configs -- while HDFS-3429 goes unfixed.
    • Tags:
      0.96notable

      Description

      The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers.

      1. 5074-0.94.txt
        214 kB
        Lars Hofhansl
      2. ASF.LICENSE.NOT.GRANTED--D1521.14.patch
        213 kB
        Phabricator
      3. ASF.LICENSE.NOT.GRANTED--D1521.14.patch
        213 kB
        Phabricator
      4. ASF.LICENSE.NOT.GRANTED--D1521.13.patch
        213 kB
        Phabricator
      5. ASF.LICENSE.NOT.GRANTED--D1521.13.patch
        213 kB
        Phabricator
      6. ASF.LICENSE.NOT.GRANTED--D1521.12.patch
        213 kB
        Phabricator
      7. ASF.LICENSE.NOT.GRANTED--D1521.12.patch
        213 kB
        Phabricator
      8. ASF.LICENSE.NOT.GRANTED--D1521.11.patch
        213 kB
        Phabricator
      9. ASF.LICENSE.NOT.GRANTED--D1521.11.patch
        213 kB
        Phabricator
      10. D1521.10.patch
        210 kB
        stack
      11. D1521.10.patch
        210 kB
        stack
      12. D1521.10.patch
        210 kB
        stack
      13. ASF.LICENSE.NOT.GRANTED--D1521.10.patch
        210 kB
        Phabricator
      14. ASF.LICENSE.NOT.GRANTED--D1521.10.patch
        210 kB
        Phabricator
      15. ASF.LICENSE.NOT.GRANTED--D1521.9.patch
        210 kB
        Phabricator
      16. ASF.LICENSE.NOT.GRANTED--D1521.9.patch
        210 kB
        Phabricator
      17. ASF.LICENSE.NOT.GRANTED--D1521.8.patch
        209 kB
        Phabricator
      18. ASF.LICENSE.NOT.GRANTED--D1521.8.patch
        209 kB
        Phabricator
      19. ASF.LICENSE.NOT.GRANTED--D1521.7.patch
        209 kB
        Phabricator
      20. ASF.LICENSE.NOT.GRANTED--D1521.7.patch
        209 kB
        Phabricator
      21. ASF.LICENSE.NOT.GRANTED--D1521.6.patch
        209 kB
        Phabricator
      22. ASF.LICENSE.NOT.GRANTED--D1521.6.patch
        209 kB
        Phabricator
      23. ASF.LICENSE.NOT.GRANTED--D1521.5.patch
        205 kB
        Phabricator
      24. ASF.LICENSE.NOT.GRANTED--D1521.5.patch
        205 kB
        Phabricator
      25. ASF.LICENSE.NOT.GRANTED--D1521.4.patch
        204 kB
        Phabricator
      26. ASF.LICENSE.NOT.GRANTED--D1521.4.patch
        204 kB
        Phabricator
      27. ASF.LICENSE.NOT.GRANTED--D1521.3.patch
        218 kB
        Phabricator
      28. ASF.LICENSE.NOT.GRANTED--D1521.3.patch
        218 kB
        Phabricator
      29. ASF.LICENSE.NOT.GRANTED--D1521.2.patch
        188 kB
        Phabricator
      30. ASF.LICENSE.NOT.GRANTED--D1521.2.patch
        188 kB
        Phabricator
      31. ASF.LICENSE.NOT.GRANTED--D1521.1.patch
        155 kB
        Phabricator
      32. ASF.LICENSE.NOT.GRANTED--D1521.1.patch
        155 kB
        Phabricator

        Issue Links

          Activity

          Hide
          dhruba borthakur added a comment -

          The corresponding HDFS jira is HDFS-2699.

          Another alternative proposal is to store store a checksum in the block header of every hbase block. HBase will make a pread(noChecksumVerify) call to hdfs for random reads. Once the block is read into the hbase cache, it will verify the checksum and if not valid, have to use a new HDFS api to read in contents from another hdfs replica.

          Show
          dhruba borthakur added a comment - The corresponding HDFS jira is HDFS-2699 . Another alternative proposal is to store store a checksum in the block header of every hbase block. HBase will make a pread(noChecksumVerify) call to hdfs for random reads. Once the block is read into the hbase cache, it will verify the checksum and if not valid, have to use a new HDFS api to read in contents from another hdfs replica.
          Hide
          Todd Lipcon added a comment -

          Once the block is read into the hbase cache, it will verify the checksum and if not valid, have to use a new HDFS api to read in contents from another hdfs replica

          Rather than adding a new API to read from another replica, HBase could instead just trigger a second pread from HDFS with the verifyChecksum flag set. This would cause HDFS to notice the checksum error based on its own checksums, and do the "right thing" (ie report the bad replica, fix it up, etc).

          Show
          Todd Lipcon added a comment - Once the block is read into the hbase cache, it will verify the checksum and if not valid, have to use a new HDFS api to read in contents from another hdfs replica Rather than adding a new API to read from another replica, HBase could instead just trigger a second pread from HDFS with the verifyChecksum flag set. This would cause HDFS to notice the checksum error based on its own checksums, and do the "right thing" (ie report the bad replica, fix it up, etc).
          Hide
          dhruba borthakur added a comment -

          Todd: you are right. that would make life easy.

          I am proposing that HBase disk format V3 have a 4 byte checksum for every hbase block. This will not require checksums and data to be stored inline in HDFS while at the same-time allow hbase to do additional iops. One minor disadvantage of this approach is that checksums would be computed twice, once by the hbase regionserver and once by the hdfs client. How bad is this cpu overhead?

          BTW, I got this idea while chatting with Nicolas Spiegelberg. Credits to him for this elegant idea.

          Show
          dhruba borthakur added a comment - Todd: you are right. that would make life easy. I am proposing that HBase disk format V3 have a 4 byte checksum for every hbase block. This will not require checksums and data to be stored inline in HDFS while at the same-time allow hbase to do additional iops. One minor disadvantage of this approach is that checksums would be computed twice, once by the hbase regionserver and once by the hdfs client. How bad is this cpu overhead? BTW, I got this idea while chatting with Nicolas Spiegelberg. Credits to him for this elegant idea.
          Hide
          dhruba borthakur added a comment -

          s/allow hbase to do additional iops/allow hbase to avoid additional iops/g

          Show
          dhruba borthakur added a comment - s/allow hbase to do additional iops/allow hbase to avoid additional iops/g
          Hide
          Todd Lipcon added a comment -

          One minor disadvantage of this approach is that checksums would be computed twice, once by the hbase regionserver and once by the hdfs client. How bad is this cpu overhead?

          You mean on write? The native CRC32C implementation in HDFS trunk right now can do somewhere around 6GB/sec - I clocked it at about 16% overhead compared to the non-checksummed path a while ago. So I think overhead is fairly minimal.

          I am proposing that HBase disk format V3 have a 4 byte checksum for every hbase block

          4 byte checksum for 64KB+ of data seems pretty low. IMO we should continue to do "chunked checksums" - maybe a CRC32 for every 1KB in the block. This allows people to use larger block sizes without compromising checksum effectiveness. The reason to choose chunked CRC32 over a wider hash is that CRC32 has a very efficient hardware implementation in SSE4.2. Plus, we can share all the JNI code already developed for Hadoop to calculate and verify these style of checksums

          Show
          Todd Lipcon added a comment - One minor disadvantage of this approach is that checksums would be computed twice, once by the hbase regionserver and once by the hdfs client. How bad is this cpu overhead? You mean on write? The native CRC32C implementation in HDFS trunk right now can do somewhere around 6GB/sec - I clocked it at about 16% overhead compared to the non-checksummed path a while ago. So I think overhead is fairly minimal. I am proposing that HBase disk format V3 have a 4 byte checksum for every hbase block 4 byte checksum for 64KB+ of data seems pretty low. IMO we should continue to do "chunked checksums" - maybe a CRC32 for every 1KB in the block. This allows people to use larger block sizes without compromising checksum effectiveness. The reason to choose chunked CRC32 over a wider hash is that CRC32 has a very efficient hardware implementation in SSE4.2. Plus, we can share all the JNI code already developed for Hadoop to calculate and verify these style of checksums
          Hide
          Andrew Purtell added a comment -

          +1

          Show
          Andrew Purtell added a comment - +1
          Hide
          Jean-Daniel Cryans added a comment -

          This jira's title make it sound like you want to checksum when reading from the block cache.

          Show
          Jean-Daniel Cryans added a comment - This jira's title make it sound like you want to checksum when reading from the block cache.
          Hide
          stack added a comment -

          Where in the read pipeline would we verify the checksum? Down in hfile? Where would we do the exception processing forcing reread with checksum=on? Also down in hfile?

          (Nice idea BTW)

          Show
          stack added a comment - Where in the read pipeline would we verify the checksum? Down in hfile? Where would we do the exception processing forcing reread with checksum=on? Also down in hfile? (Nice idea BTW)
          Hide
          dhruba borthakur added a comment -

          Yes, the verification of the checksums would happen when the hfile block is loaded into the block cache. it will be entirely in hfile code. also, the exception processing would happen in hfile too.

          Show
          dhruba borthakur added a comment - Yes, the verification of the checksums would happen when the hfile block is loaded into the block cache. it will be entirely in hfile code. also, the exception processing would happen in hfile too.
          Hide
          Phabricator added a comment -

          dhruba requested code review of "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          HFile is enhanced to store a checksum for each block. HDFS checksum verification is avoided while reading data into the block cache. On a checksum verification failure, we retry the file system read request with hdfs checksums switched on (thanks Todd).

          I have a benchmark that shows that it reduces iops on the disk by about 40%. In this experiment, the entire memory on the regionserver is allocated to the regionserver's jvm and the OS buffer cache size is negligible. I also measured negligible (<5%) additional cpu usage while using hbase-level checksums.

          The salient points of this patch:

          1. Each hfile's trailer used to have a 4 byte version number. I enhanced this so that these 4 bytes can be interpreted as a (major version number, minor version). Pre-existing hfiles have a minor version of 0. The new hfile format has a minor version of 1 (thanks Mikhail). The hfile major version remains unchanged at 2. The reason I did not introduce a new major version number is because the code changes needed to store/read checksums do not differ much from existing V2 writers/readers.

          2. Introduced a HFileSystem object which is a encapsulates the FileSystem objects needed to access data from hfiles and hlogs. HDFS FileSystem objects already had the ability to switch off checksum verifications for reads.

          3. The majority of the code changes are located in hbase.io.hfie package. The retry of a read on an initial checksum failure occurs inside the hbase.io.hfile package itself. The code changes to hbase.regionserver package are minor.

          4. The format of a hfileblock is the header followed by the data followed by the checksum(s). Each 16 K (configurable) size of data has a 4 byte checksum. The hfileblock header has two additional fields: a 4 byte value to store the bytesPerChecksum and a 4 byte value to store the size of the user data (excluding the checksum data). This is well explained in the associated javadocs.

          5. I added a test to test backward compatibility. I will be writing more unit tests that triggers checksum verification failures aggressively. I have left a few redundant log messages in the code (just for easier debugging) and will remove them in later stage of this patch. I will also be adding metrics on number of checksum verification failures/success in a later version of this diff.

          6. By default, hbase-level checksums are switched on and hdfs level checksums are switched off for hfile-reads. No changes to Hlog code path here.

          TEST PLAN
          The default setting is to switch on hbase checksums for hfile-reads, thus all existing tests actually validate the new code pieces. I will be writing more unit tests for triggering checksum verification failures.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java
          src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          MANAGE HERALD DIFFERENTIAL RULES
          https://reviews.facebook.net/herald/view/differential/

          WHY DID I GET THIS EMAIL?
          https://reviews.facebook.net/herald/transcript/3171/

          Tip: use the X-Herald-Rules header to filter Herald messages in your client.

          Show
          Phabricator added a comment - dhruba requested code review of " [jira] HBASE-5074 Support checksums in HBase block cache". Reviewers: mbautin HFile is enhanced to store a checksum for each block. HDFS checksum verification is avoided while reading data into the block cache. On a checksum verification failure, we retry the file system read request with hdfs checksums switched on (thanks Todd). I have a benchmark that shows that it reduces iops on the disk by about 40%. In this experiment, the entire memory on the regionserver is allocated to the regionserver's jvm and the OS buffer cache size is negligible. I also measured negligible (<5%) additional cpu usage while using hbase-level checksums. The salient points of this patch: 1. Each hfile's trailer used to have a 4 byte version number. I enhanced this so that these 4 bytes can be interpreted as a (major version number, minor version). Pre-existing hfiles have a minor version of 0. The new hfile format has a minor version of 1 (thanks Mikhail). The hfile major version remains unchanged at 2. The reason I did not introduce a new major version number is because the code changes needed to store/read checksums do not differ much from existing V2 writers/readers. 2. Introduced a HFileSystem object which is a encapsulates the FileSystem objects needed to access data from hfiles and hlogs. HDFS FileSystem objects already had the ability to switch off checksum verifications for reads. 3. The majority of the code changes are located in hbase.io.hfie package. The retry of a read on an initial checksum failure occurs inside the hbase.io.hfile package itself. The code changes to hbase.regionserver package are minor. 4. The format of a hfileblock is the header followed by the data followed by the checksum(s). Each 16 K (configurable) size of data has a 4 byte checksum. 
The hfileblock header has two additional fields: a 4 byte value to store the bytesPerChecksum and a 4 byte value to store the size of the user data (excluding the checksum data). This is well explained in the associated javadocs. 5. I added a test to test backward compatibility. I will be writing more unit tests that triggers checksum verification failures aggressively. I have left a few redundant log messages in the code (just for easier debugging) and will remove them in later stage of this patch. I will also be adding metrics on number of checksum verification failures/success in a later version of this diff. 6. By default, hbase-level checksums are switched on and hdfs level checksums are switched off for hfile-reads. No changes to Hlog code path here. TEST PLAN The default setting is to switch on hbase checksums for hfile-reads, thus all existing tests actually validate the new code pieces. I will be writing more unit tests for triggering checksum verification failures. REVISION DETAIL https://reviews.facebook.net/D1521 AFFECTED FILES src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java 
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java src/main/java/org/apache/hadoop/hbase/HConstants.java src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/3171/ Tip: use the X-Herald-Rules header to filter Herald messages in your client.
          Hide
          Phabricator added a comment -

          dhruba requested code review of "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          HFile is enhanced to store a checksum for each block. HDFS checksum verification is avoided while reading data into the block cache. On a checksum verification failure, we retry the file system read request with hdfs checksums switched on (thanks Todd).

          I have a benchmark that shows that it reduces iops on the disk by about 40%. In this experiment, the entire memory on the regionserver is allocated to the regionserver's jvm and the OS buffer cache size is negligible. I also measured negligible (<5%) additional cpu usage while using hbase-level checksums.

          The salient points of this patch:

          1. Each hfile's trailer used to have a 4 byte version number. I enhanced this so that these 4 bytes can be interpreted as a (major version number, minor version). Pre-existing hfiles have a minor version of 0. The new hfile format has a minor version of 1 (thanks Mikhail). The hfile major version remains unchanged at 2. The reason I did not introduce a new major version number is because the code changes needed to store/read checksums do not differ much from existing V2 writers/readers.

          2. Introduced a HFileSystem object which is a encapsulates the FileSystem objects needed to access data from hfiles and hlogs. HDFS FileSystem objects already had the ability to switch off checksum verifications for reads.

          3. The majority of the code changes are located in hbase.io.hfie package. The retry of a read on an initial checksum failure occurs inside the hbase.io.hfile package itself. The code changes to hbase.regionserver package are minor.

          4. The format of a hfileblock is the header followed by the data followed by the checksum(s). Each 16 K (configurable) size of data has a 4 byte checksum. The hfileblock header has two additional fields: a 4 byte value to store the bytesPerChecksum and a 4 byte value to store the size of the user data (excluding the checksum data). This is well explained in the associated javadocs.

          5. I added a test to test backward compatibility. I will be writing more unit tests that triggers checksum verification failures aggressively. I have left a few redundant log messages in the code (just for easier debugging) and will remove them in later stage of this patch. I will also be adding metrics on number of checksum verification failures/success in a later version of this diff.

          6. By default, hbase-level checksums are switched on and hdfs level checksums are switched off for hfile-reads. No changes to Hlog code path here.

          TEST PLAN
          The default setting is to switch on hbase checksums for hfile-reads, thus all existing tests actually validate the new code pieces. I will be writing more unit tests for triggering checksum verification failures.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java
          src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          MANAGE HERALD DIFFERENTIAL RULES
          https://reviews.facebook.net/herald/view/differential/

          WHY DID I GET THIS EMAIL?
          https://reviews.facebook.net/herald/transcript/3171/

          Tip: use the X-Herald-Rules header to filter Herald messages in your client.

          Show
          Phabricator added a comment -

          dhruba requested code review of "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          HFile is enhanced to store a checksum for each block. HDFS checksum verification is avoided while reading data into the block cache. On a checksum verification failure, we retry the file system read request with hdfs checksums switched on (thanks Todd).

          I have a benchmark that shows that it reduces iops on the disk by about 40%. In this experiment, the entire memory on the regionserver is allocated to the regionserver's jvm and the OS buffer cache size is negligible. I also measured negligible (<5%) additional cpu usage while using hbase-level checksums.

          The salient points of this patch:

          1. Each hfile's trailer used to have a 4 byte version number. I enhanced this so that these 4 bytes can be interpreted as a (major version number, minor version number) pair. Pre-existing hfiles have a minor version of 0. The new hfile format has a minor version of 1 (thanks Mikhail). The hfile major version remains unchanged at 2. The reason I did not introduce a new major version number is that the code changes needed to store/read checksums do not differ much from the existing V2 writers/readers.

          2. Introduced an HFileSystem object which encapsulates the FileSystem objects needed to access data from hfiles and hlogs. HDFS FileSystem objects already had the ability to switch off checksum verification for reads.

          3. The majority of the code changes are located in the hbase.io.hfile package. The retry of a read on an initial checksum failure occurs inside the hbase.io.hfile package itself. The code changes to the hbase.regionserver package are minor.

          4. The format of an hfileblock is the header, followed by the data, followed by the checksum(s). Each 16K (configurable) chunk of data has a 4 byte checksum. The hfileblock header has two additional fields: a 4 byte value to store the bytesPerChecksum and a 4 byte value to store the size of the user data (excluding the checksum data). This is well explained in the associated javadocs.

          5. I added a test for backward compatibility. I will be writing more unit tests that trigger checksum verification failures aggressively. I have left a few redundant log messages in the code (just for easier debugging) and will remove them in a later stage of this patch. I will also be adding metrics on the number of checksum verification failures/successes in a later version of this diff.

          6. By default, hbase-level checksums are switched on and hdfs-level checksums are switched off for hfile-reads. No changes to the Hlog code path here.

          TEST PLAN
          The default setting is to switch on hbase checksums for hfile-reads, thus all existing tests actually validate the new code pieces. I will be writing more unit tests for triggering checksum verification failures.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java
          src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
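The per-chunk checksum layout described in point 4 — one 4-byte checksum per 16 KB (bytesPerChecksum) of block data, appended after the data — can be sketched as follows. This is an illustrative stand-alone computation using java.util.zip.CRC32, not the actual HFileBlock code; the class and method names are made up.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Illustrative sketch of the HFile v2 minor-version-1 block idea: the block
// data is divided into bytesPerChecksum-sized chunks, and a 4 byte checksum
// is stored for each chunk (a partial final chunk is checksummed as-is).
public class ChunkChecksums {
    static byte[] checksumChunks(byte[] data, int bytesPerChecksum) {
        int numChunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        ByteBuffer sums = ByteBuffer.allocate(4 * numChunks);
        CRC32 crc = new CRC32();
        for (int off = 0; off < data.length; off += bytesPerChecksum) {
            int len = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums.putInt((int) crc.getValue()); // low 32 bits of the CRC
        }
        return sums.array();
    }

    public static void main(String[] args) {
        byte[] data = new byte[40 * 1024]; // 40 KB of block data
        byte[] sums = checksumChunks(data, 16 * 1024);
        // 40 KB at 16 KB per checksum => 3 chunks => 12 bytes of checksums
        System.out.println(sums.length); // prints 12
    }
}
```

With this layout, verifying a block on read is a matter of recomputing each chunk's checksum over the data region and comparing it against the stored 4-byte values.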
          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          Good job, Dhruba.

          I like this comment from HFileSystem:

          + * In future, if we want to make hlogs be in a different filesystem,
          + * this is the place to make it happen.

          I only see one setVerifyChecksum() call in the HFileSystem ctor.
          The readfs is used by createReaderWithEncoding().

          Shall we give readfs more flexibility, so that checksum verification can be configured dynamically?

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Ted Yu added a comment -

          In HFileBlock.readBlockData(), once useHBaseChecksum is set to false, I don't see where it would be set to true again.
          At line 1613, if HDFS checksum verification was able to correct the problem,
          1. should we re-enable useHBaseChecksum ?
          2. I think we should log the fact that HDFS checksum verification worked.

               * @param pread whether to use HBase checksums. If HBase checksum is
               *          switched off, then use HDFS checksum.
          

          Please correct the parameter name above.

          dhruba borthakur added a comment -

          Thanks Ted/Zhihong for the review comments.

          ted: The thinking is that HFileSystem.readfs should be used only by StoreFiles for reading hfiles. That is the reason that this code path uses readfs. This is the only place where we want to avoid using hdfs checksums. All other code paths are unchanged.

          Zhihong: If hbase checksum validation fails once, I switch back to using hdfs-level checksums for that instance of the Reader. For each block that has an hbase-checksum mismatch we retry the operation, so it actually doubles the iops for that block. I was trying to avoid the scenario where most hbase-level checksums fail and each io is retried twice. But if people feel otherwise, I can re-enable useHBaseChecksum after a few successful ios.

          I will log the fact that the hdfs checksum verification worked and will also add metrics counters to record these events (next version of the patch).

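The fallback policy described above can be sketched in miniature. This is a hypothetical illustration, not the real HFileBlock/Reader API: the method names and the Supplier-based "streams" are stand-ins, and the checksum check is a fake.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the fallback policy: read via the fast path (HDFS
// checksum verification off) and verify the hbase-level checksum ourselves;
// on a mismatch, switch this Reader instance back to the HDFS-checksummed
// stream for all subsequent reads, and retry the failing read once.
public class ChecksumFallback {
    boolean useHBaseChecksum = true; // per-Reader flag; never re-enabled

    byte[] readBlock(Supplier<byte[]> fastPath,   // HDFS checksums off
                     Supplier<byte[]> safePath) { // HDFS checksums on
        if (useHBaseChecksum) {
            byte[] block = fastPath.get();
            if (verifyHBaseChecksum(block)) {
                return block;
            }
            // hbase-level checksum mismatch: fall back to hdfs-level
            // checksums, doubling the iops only for this failing block.
            useHBaseChecksum = false;
        }
        return safePath.get();
    }

    // Stand-in for real per-chunk verification; treats a leading 0 as valid.
    boolean verifyHBaseChecksum(byte[] block) {
        return block.length > 0 && block[0] == 0;
    }

    public static void main(String[] args) {
        ChecksumFallback reader = new ChecksumFallback();
        byte[] corrupt = {1}; // fails the stand-in verification
        byte[] good = {0};
        byte[] result = reader.readBlock(() -> corrupt, () -> good);
        System.out.println(result[0] + " " + reader.useHBaseChecksum);
    }
}
```

The one-way flag captures the trade-off discussed in the thread: a persistently failing file costs at most one extra read per block before the Reader settles on the hdfs-checksummed path.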
          dhruba borthakur added a comment -

          This patch is not yet ready for submission. It needs enhancement with a unit test and metrics collection.

          Phabricator added a comment -

          mbautin has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".
          Added CCs: dhruba

          @Dhruba: The "checksum at the end of block" approach seems reasonable and the implementation looks good! Specific comments inline.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:357 What is the purpose of the hfs parameter here?
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:49 s/preceeding/preceding/
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:50 s/deermines/determines/
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:51 s/does not need/do not need/
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:119 s/major/minor/
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:260 Rename the existing expectVersion to expectMajorVersion for clarity.
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:343 Rename to expectMajorVersion for clarity.

          Also, does the version field of this class now only contain the major version? If so, rename it to majorVersion.
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:345 Add the word "major" to the error message.
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:462 Rename to getMajorVersion
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:402 Can we modify the parameter type and get rid of this cast?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:415 This is not a constructor, but a factory method.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:417 Add "ForTest" to method name for clarity.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:91 s/has/have/
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:95 Consider replacing the "_V0" suffix with something more meaningful like "_NO_CHECKSUM".
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:102 Consider using a suffix "_WITH_CHECKSUM".
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:123 When the number of bytes per checksum becomes configurable, will that require a persistent data format change? What will the upgrade procedure be in that case?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:138 It is not clear from this call that 0 is minor version. Create a constant with a meaningful name (e.g. MINOR_VERSION_NO_CHECKSUM).
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:149 Consider adding "WithChecksum" to variable name.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:395-398 This is becoming bulky. Factor out the common term (uncompressedSizeWithoutHeader + headerSize() + cksumBytes) into a local variable. Also avoid evaluating headerSize() twice.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:400-402 Reuse the new local variables from the above comments here.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:441-442 Update this comment, since the meaning of "extraBytes" has changed from just being the room for the next block's header to a much more complex role.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:757-758 Should we throw an IOException instead since this method already throws it?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:771-772 tmp is a particularly bad variable name. Combine these two lines and get rid of tmp.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:808-809 Get rid of tmp and combine these two lines.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:788 This method and the above one seem to share a lot of code. Is it possible to get rid of code duplication?

          Also, these two methods seem isolated enough to be moved to another class, maybe even as static methods.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:866-870 Do we need this in case of minorVersion = 0? Or do we always write new files with checksums?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:973-975 Somehow the fact that checksum format is different for compressed and uncompressed blocks has escaped me halfway through the review. Maybe it is worth explicitly mentioning that in javadoc.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:999-1011 Use System.arraycopy instead of loops.

          Add "ForTest" to method name to discourage its use in production.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1202 Nice! Thanks for locking down these internal base classes and methods.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1389-1390 Delete one of these lines.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1520-1527 Does it make sense to move this checksum instantiation code to a function and reuse it everywhere we call ChecksumFactory.newInstance()?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1786 Remove this and other debug output statements.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1871 As I mentioned, it is probably better to move checksum computation and validation code to a separate utility class.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java:220 Use a constant to indicate that this is a minor version without checksum support instead of just 0.
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:56 Is this necessary? Doesn't Java generate the default constructor automatically?
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:70 This is for testing only, right?
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:225 Long line.
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:1 A lot of this file appears to be copy-pasted from TestHFileBlock, so it is very difficult to see the real changes. Please reuse the appropriate code instead.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Ted Yu added a comment -

          When FS_CHECKSUM_VERIFICATION carries a value of false, would it still make sense to retry the operation if the hbase-checksums mismatch?
          Meaning, would getStreamWithChecksum() return a stream which does checksum validation inside the FileSystem ?

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          Thanks for the excellent and detailed review Mikhail. I am making most of the changes you proposed and will post a new patch very soon. Really appreciate your time with the huge review.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:357 The Reader would need to reopen a file with checksums switched on/off if needed (on checksum failure). Hence the filesystem object is needed here.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:402 This is messy, because there are 100001 places where the FileSystem type is used in HBase. Changing the parameter type would make this patch immensely large and difficult to merge with every new change in trunk. Does this sound reasonable? If not, I can change all mentions of FileSystem to HFileSystem in a succeeding patch perhaps?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:123 It does not need a disk format change if we decide to make it configurable. Each disk block has a 4 byte field to store the bytes-per-checksum. In the current code, the value that is stored in this field is 16K.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:417 This is a code cleanup but not related to this patch. I would like to defer this for later.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:149 I did not do this because it makes the variable names very very long-winded. Instead, I wrote more comments to describe each variable. Let me know if you think that this is not enough for documentation.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:441-442 The meaning of extrabytes has not changed. It still means that we need to allocate space for the next header.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:788 I put the creation of the checksum object in a common method. The remainder of the two methods is quite similar, but unfortunately one operates on a byte array while the other operates directly on the ByteBuffer. One way to merge these two methods is to incur a buffer copy, which I am trying to avoid. Also, these two methods are very specific to how the header in the HBlockFile is laid out, so I kept them as instance methods rather than static methods. If you feel strongly about this, then I will be happy to move them to a different file.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:866-870 All new files always have checksums. But the log line was for debugging, so I will get rid of it.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:973-975 Good point. I enhanced the javadocs where the variable onDiskChecksum is declared.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1520-1527 This code piece may be run by a different helper thread, so I am throwing a RuntimeException here so that the RegionServer shuts down if it is unable to instantiate a Checksum class. Is there something better I can do here?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1871 I actually made this a protected method so that I can override it in the unit test to simulate checksum failure.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Show
          Phabricator added a comment - dhruba has commented on the revision " [jira] HBASE-5074 Support checksums in HBase block cache". Thanks for the excellent and detailed review Mikhail. I am making most of the changes you proposed and will post a new patch very soon. Really appreciate your time with the huge review. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:357 The Reader would need to reopen a file with chesksums switched on/off if needed (on checksum failure). Hence the filessytem object is needed here. src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:402 This is messy, because there are 100001 places where FileSystem type is being used in HBase. This will make this patch immensely large and difficult to merge with every new change in trunk. does this sound reasonable? If not, I can change all mention of FileSystem to HFileSystem in a succeeding patch perhaps? src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:123 It does not need a disk format change if we decide to make it configurable. Each disk block has a 4 byte field to store the bytes-per-checksum. In the current code, the value that is stored in this field is 16K. src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:417 This is a code cleanup but not related to this patch. I would like to defer this for later. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:149 I did not do this because it makes the variable names very very long-winded. Instead, I wrote more comments to describe each variable. Let me know if you think that this is not enough for documentation. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:441-442 The meaning of extrabytes has not changed. It still means that we need to allocate space for the next header. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:788 I put the creation of the checksum object in a common method. 
The remainder of the two methods are quite similar but unfortunately one operates on a byte buffer while the other operates directly on the ByteBuffer. One way to merge these two methods is to incur a buffer copy which I am trying to avoid. Also, these two methods are very specific to how the header in the HBlockFile is laid out, so I kept them as instance methods rather than static methods. If you feel strongly about this, then I will be happy yo move them to a different file. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:866-870 All new files always have checksums. But the log line was for debugging, so I will get rid of it. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:973-975 Good point. I enhanced the javadocs where the variable onDiskChecksum is declared. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1520-1527 This code piece maybe done be a different helper thread. So I am throwing RunTime exception here so that the RegionServer shuts down if it is unable to instantiate a Checksum class. Is there something better I can do here? src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1871 I actually made this a protected method so that I can override it in the unit test to simulate checksum failure. REVISION DETAIL https://reviews.facebook.net/D1521
          Hide
          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          Thanks for the excellent and detailed review Mikhail. I am making most of the changes you proposed and will post a new patch very soon. Really appreciate your time with the huge review.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:357 The Reader would need to reopen a file with chesksums switched on/off if needed (on checksum failure). Hence the filessytem object is needed here.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:402 This is messy, because there are 100001 places where FileSystem type is being used in HBase. This will make this patch immensely large and difficult to merge with every new change in trunk. does this sound reasonable? If not, I can change all mention of FileSystem to HFileSystem in a succeeding patch perhaps?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:123 It does not need a disk format change if we decide to make it configurable. Each disk block has a 4 byte field to store the bytes-per-checksum. In the current code, the value that is stored in this field is 16K.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:417 This is a code cleanup but not related to this patch. I would like to defer this for later.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:149 I did not do this because it makes the variable names very very long-winded. Instead, I wrote more comments to describe each variable. Let me know if you think that this is not enough for documentation.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:441-442 The meaning of extrabytes has not changed. It still means that we need to allocate space for the next header.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:788 I put the creation of the checksum object in a common method. The remainder of the two methods are quite similar but unfortunately one operates on a byte buffer while the other operates directly on the ByteBuffer. One way to merge these two methods is to incur a buffer copy which I am trying to avoid. Also, these two methods are very specific to how the header in the HBlockFile is laid out, so I kept them as instance methods rather than static methods. If you feel strongly about this, then I will be happy yo move them to a different file.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:866-870 All new files always have checksums. But the log line was for debugging, so I will get rid of it.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:973-975 Good point. I enhanced the javadocs where the variable onDiskChecksum is declared.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1520-1527 This code piece maybe done be a different helper thread. So I am throwing RunTime exception here so that the RegionServer shuts down if it is unable to instantiate a Checksum class. Is there something better I can do here?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1871 I actually made this a protected method so that I can override it in the unit test to simulate checksum failure.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Show
          Phabricator added a comment - dhruba has commented on the revision " [jira] HBASE-5074 Support checksums in HBase block cache". Thanks for the excellent and detailed review Mikhail. I am making most of the changes you proposed and will post a new patch very soon. Really appreciate your time with the huge review. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:357 The Reader would need to reopen a file with chesksums switched on/off if needed (on checksum failure). Hence the filessytem object is needed here. src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:402 This is messy, because there are 100001 places where FileSystem type is being used in HBase. This will make this patch immensely large and difficult to merge with every new change in trunk. does this sound reasonable? If not, I can change all mention of FileSystem to HFileSystem in a succeeding patch perhaps? src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:123 It does not need a disk format change if we decide to make it configurable. Each disk block has a 4 byte field to store the bytes-per-checksum. In the current code, the value that is stored in this field is 16K. src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:417 This is a code cleanup but not related to this patch. I would like to defer this for later. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:149 I did not do this because it makes the variable names very very long-winded. Instead, I wrote more comments to describe each variable. Let me know if you think that this is not enough for documentation. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:441-442 The meaning of extrabytes has not changed. It still means that we need to allocate space for the next header. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:788 I put the creation of the checksum object in a common method. 
The remainder of the two methods are quite similar but unfortunately one operates on a byte buffer while the other operates directly on the ByteBuffer. One way to merge these two methods is to incur a buffer copy which I am trying to avoid. Also, these two methods are very specific to how the header in the HBlockFile is laid out, so I kept them as instance methods rather than static methods. If you feel strongly about this, then I will be happy yo move them to a different file. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:866-870 All new files always have checksums. But the log line was for debugging, so I will get rid of it. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:973-975 Good point. I enhanced the javadocs where the variable onDiskChecksum is declared. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1520-1527 This code piece maybe done be a different helper thread. So I am throwing RunTime exception here so that the RegionServer shuts down if it is unable to instantiate a Checksum class. Is there something better I can do here? src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1871 I actually made this a protected method so that I can override it in the unit test to simulate checksum failure. REVISION DETAIL https://reviews.facebook.net/D1521
          Phabricator added a comment -

          todd has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          I'm a little skeptical of pushing HFileSystem in at the createHRegion level - can't we construct it lower down and have fewer sweeping changes across the codebase?

          Otherwise I'm pretty psyched about this feature! Should be a great speed boost.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/HConstants.java:598 I think this could be clarified a bit... I am at the top of the diff so don't have context, so not sure whether it means that no checksums will be verified, or if checksums will be verified but only when the HFile checksum isn't present or can't be verified?

          I'd expect the config to have several different modes, rather than a boolean:

          FS_ONLY: always verify the FS checksum, ignore the HFile checksum
          BOTH: always verify the FS checksum and the HFile checksum (when available)
          OPTIMIZED: verify the HFile checksum. If it fails or not present, fall back to the FS checksum (this would be the default)
          NONE: don't verify any checksums (for those who like to live on the edge!)
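          The four modes proposed above could be captured as an enum; this is a hypothetical sketch of todd's suggestion only (the patch as committed uses the single boolean hbase.regionserver.checksum.verify instead):

```java
/** Hypothetical enum for the verification modes proposed above; the actual
 *  patch uses a single boolean (hbase.regionserver.checksum.verify). */
public enum ChecksumVerificationMode {
    FS_ONLY,    // always verify the FS (HDFS) checksum, ignore the HFile checksum
    BOTH,       // verify the FS checksum and the HFile checksum (when available)
    OPTIMIZED,  // verify the HFile checksum; fall back to the FS checksum on failure/absence
    NONE;       // verify nothing

    /** Whether HDFS-level checksum verification stays enabled up front. */
    public boolean verifyFsChecksum() {
        return this == FS_ONLY || this == BOTH;
    }

    /** Whether HBase verifies its own per-block checksums. */
    public boolean verifyHFileChecksum() {
        return this == BOTH || this == OPTIMIZED;
    }
}
```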
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:357 This is sort of working around a deficiency in the Hadoop input stream APIs, right? I think this is a decent workaround for now, but do you think it would be a good idea to add a new interface like "Checksummed" to Hadoop, which would add a setVerifyChecksum() call?
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:123-127 do we need this extra ctor? considering this is a private API seems like we could just update the call sites to add a ', 0'
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:93 may be worth adding a constant here like VERSION_CURRENT = VERSION_WITH_CHECKSUM.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:102 or HEADER_SIZE_V1 = ...
          static final int HEADER_SIZE = HEADER_SIZE_V1;
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:132 why is this a warning?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:175 this takes a parameter minorVersion - is it unused?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:190 typo: @param minorVersion
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:205-208 confused about this - see above... if this constructor is only meant for minor version 0 blocks, shouldn't we have Preconditions.checkArgument(minorVersion == 0) or something?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:229 Skeptical of this line – why isn't it onDiskDataSizeWithoutHeader + HEADER_SIZE_V0?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:252 a little confused why this doesn't use the onDiskDataSizeWithHeader member...
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:629 typo: incudes
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:751 It would be nice to reuse the code in Hadoop trunk here - see DataChecksum.verifyChunkedSums for example. The benefit is that we have JNI implementations using SSE code there. Only downside is that the JNI code requires direct byte buffers, which I guess we aren't using here... perhaps punt to a future improvement.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:774 I assume this will move to a trace level debug or something?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1634 I think rather than returning null it makes more sense to throw a ChecksumException here
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1401-1402 I don't know this area of the code well – is it supposed to be thread-safe? This lazy-initialization pattern is not.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1605 can we just recurse here?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1896 it's not possible to get at the file path from this context, is it?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1888 would be nice to reuse Hadoop code here if possible for performance
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java:206-207 this might result in "Wrong FS" if the default FS doesn't match the path provided
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:1072 can you include the path in the msg?
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3546-3547 this code should be passing the path as well to avoid "Wrong FS"
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3597 same - default FS may not match the hbase rootdir FS. Maybe RegionServerServices should expose a getHFilesystem() call?
          src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java:335 this downcast would be avoided by adding getFileSystem to RegionServerServices above
          src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java:31 could you reuse o.a.h.io.DataOutputBuffer instead of making a new class?
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:39 I'd strongly recommend starting with an implementation of CRC32C (Castagnoli polynomial) instead of the Zip polynomial - reason being that there is a hardware implementation in SSE4.2. You can rip the pure-java implementation from Hadoop trunk (PureJavaCrc32C) into HBase.

          Failing that, we need to add hfile metadata which specifies the checksum algorithm in addition to the checksum type.
          You could reuse the DataChecksum class from Hadoop there - it encapsulates (type, bytesPerChecksum, checksumSize)
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:65-67 this reflection based method is going to be horribly slow
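          The two polynomials todd contrasts are easy to compare. At the time of this review, HBase would have needed Hadoop's PureJavaCrc32C; on JDK 9 and later, java.util.zip.CRC32C provides the Castagnoli polynomial directly (with a hardware intrinsic on SSE4.2-capable CPUs). A small demonstration using the standard "123456789" check input, which has well-known check values for both polynomials:

```java
import java.util.zip.CRC32;
import java.util.zip.CRC32C;

/** Compares the "Zip" polynomial (CRC-32) against the Castagnoli
 *  polynomial (CRC-32C) discussed above. Requires JDK 9+ for CRC32C. */
public class CrcPolynomials {
    public static void main(String[] args) {
        byte[] check = "123456789".getBytes();  // standard CRC check input

        CRC32 zip = new CRC32();                // Zip polynomial
        zip.update(check);

        CRC32C castagnoli = new CRC32C();       // Castagnoli polynomial
        castagnoli.update(check);

        // Well-known check values: CRC-32 = CBF43926, CRC-32C = E3069283
        System.out.printf("CRC32  = %08X%n", zip.getValue());
        System.out.printf("CRC32C = %08X%n", castagnoli.getValue());
    }
}
```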
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:59 typo, operation.
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:54 the ctor should take a Path or URI indicating the filesystem, rather than always using the default - same "wrong fs" issue as above
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:67 typo: in->is
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:132-133 this is rarely the right call

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          mbautin has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          @dhruba: thanks for the update! See my replies inline.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:357 I don't see any overrides of this method in HFileReaderV{1,2} in the patch, and this particular method looks really confusing, since it takes a parameter, ignores it, and returns this.hfs instead. Did you mean to override it in a way that does use the parameter? In that case, could you please add a javadoc here explaining why the argument is being ignored?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:402 Agreed. Perhaps we should avoid replacing all occurrences of FileSystem with HFileSystem. One class cast is much simpler.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:417 OK.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:123 How much more work is it to make it configurable? Otherwise we would be storing the bytes-per-checksum field but not actually using it, which would be really confusing.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:149 Sounds good. Could you replace comments with javadocs? That seems to be the convention in HBase code even for private fields.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:441-442 OK, sounds good. I probably just misread the code.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:788 It is fine to leave duplicate code between DataInputStream and ByteBuffer implementations for performance reasons. However, I still think it is better to move these into a separate utility class, e.g. ByteBufferUtils.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:866-870 Sounds good.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1525 This is probably an error, not a warning, as we are about to shut down the regionserver.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1520-1527 This constructor will be called in the same thread that tries to read the block (see the ThreadLocal.get() implementation). I am not sure if throwing a RuntimeException will shut down the regionserver. But this type of error is definitely too serious to recover from gracefully, so this is probably fine.

          Just to make sure: are we planning to swap checksum implementations in production? In that case, most RPC threads will still keep their associated PrefetchedHeader instance with the wrong checksum class.
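          The per-thread construction mbautin describes can be made explicit with ThreadLocal. This is a hypothetical sketch of the pattern only; PrefetchedHeader here is a stand-in for the real class, and the sizes are illustrative. It also shows why swapping checksum implementations at runtime is awkward: each thread keeps its instance until its ThreadLocal value is replaced.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical sketch of the per-thread pattern discussed above: each RPC
 *  thread lazily constructs its own header buffer and checksum object on its
 *  first get(), so no locking is needed -- but the thread then keeps that
 *  instance, checksum class and all, until the ThreadLocal is reset. */
public class PrefetchedHeaderHolder {

    /** Stand-in for HFileBlock's PrefetchedHeader (not the actual class). */
    static class PrefetchedHeader {
        final byte[] header = new byte[33];      // header size is illustrative
        final Checksum checksum = new CRC32();   // constructed on this thread
    }

    // withInitial runs the supplier on the first get() in each thread,
    // i.e. in the same thread that reads the block, as noted above.
    private static final ThreadLocal<PrefetchedHeader> HEADER =
        ThreadLocal.withInitial(PrefetchedHeader::new);

    public static PrefetchedHeader get() {
        return HEADER.get();
    }
}
```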
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1871 Sounds good. In that case it is probably better to add a method call to an external utility method here, instead of putting checksum calculation inline.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Show
          Phabricator added a comment - mbautin has commented on the revision " [jira] HBASE-5074 Support checksums in HBase block cache". @dhruba: thanks for the update! See my replies inline. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:357 I don't see any overrides of this method in HFileReaderV {1,2} in the patch, and this particular method looks really confusing, since it takes a parameter, ignores it, and returns this.hfs instead. Did you mean to override it in a way that does use the parameter? In that case, could you please add a javadoc here explaining why the argument is being ignored? src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:402 Agreed. Perhaps we should avoid replacing all occurrences of FileSystem with HFileSystem. One class cast is much simpler. src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:417 OK. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:123 How much more work is it to make it configurable? Otherwise we would be storing the bytes-per-checksum field but not actually using it, which would be really confusing. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:149 Sounds good. Could you replace comments with javadocs? That seems to be the convention in HBase code even for private fields. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:441-442 OK, sounds good. I probably just misread the code. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:788 It is fine to leave duplicate code between DataInputStream and ByteBuffer implementations for performance reasons. However, I still think it is better to move these into a separate utility class, e.g. ByteBufferUtils. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:866-870 Sounds good. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1525 This is probably an error, not a warning, as we are about to shut down the regionserver. 
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1520-1527 This constructor will be called in the same thread that tries to read the block (see ThreadLocal.get() implementation). I am not sure if throwing a RuntimeException will shut down the regionserver. But this type of error definitely too serious to recover from gracefully, so this is probably fine. Just to make sure: are we planning to swap checksum implementations in production? In that case, most RPC threads will still keep their associated PrefetchedHeader instance with the wrong checksum class. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1871 Sounds good. In that case it is probably better to add a method call to an external utility method here, instead of putting checksum calculation inline. REVISION DETAIL https://reviews.facebook.net/D1521
          Hide
          Phabricator added a comment -

          mbautin has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          @dhruba: thanks for the update! See my replies inline.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:357 I don't see any overrides of this method in HFileReaderV{1,2} in the patch, and this particular method looks really confusing, since it takes a parameter, ignores it, and returns this.hfs instead. Did you mean to override it in a way that does use the parameter? In that case, could you please add a javadoc here explaining why the argument is being ignored?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:402 Agreed. Perhaps we should avoid replacing all occurrences of FileSystem with HFileSystem. One class cast is much simpler.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:417 OK.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:123 How much more work is it to make it configurable? Otherwise we would be storing the bytes-per-checksum field but not actually using it, which would be really confusing.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:149 Sounds good. Could you replace comments with javadocs? That seems to be the convention in HBase code even for private fields.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:441-442 OK, sounds good. I probably just misread the code.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:788 It is fine to leave duplicate code between DataInputStream and ByteBuffer implementations for performance reasons. However, I still think it is better to move these into a separate utility class, e.g. ByteBufferUtils.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:866-870 Sounds good.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1525 This is probably an error, not a warning, as we are about to shut down the regionserver.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1520-1527 This constructor will be called in the same thread that tries to read the block (see the ThreadLocal.get() implementation). I am not sure if throwing a RuntimeException will shut down the regionserver. But this type of error is definitely too serious to recover from gracefully, so this is probably fine.

          Just to make sure: are we planning to swap checksum implementations in production? In that case, most RPC threads will still keep their associated PrefetchedHeader instance with the wrong checksum class.
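          The stale-ThreadLocal concern above can be illustrated with a small sketch (hypothetical names, not HBase code): each thread caches an object derived from shared configuration at first access, so a later swap of the configured checksum algorithm is not seen by threads that already populated their ThreadLocal.

```java
// Hypothetical sketch of the per-thread caching pitfall described above.
// "checksumAlgorithm" and "cachedHeader" are illustrative names, not HBase fields.
public class ThreadLocalCacheSketch {
    static volatile String checksumAlgorithm = "CRC32";   // shared config

    // Each thread captures the algorithm it sees on its first get().
    static final ThreadLocal<String> cachedHeader =
        ThreadLocal.withInitial(() -> checksumAlgorithm);

    static String readCached() {
        return cachedHeader.get();
    }

    public static void main(String[] args) {
        System.out.println(readCached());   // prints CRC32
        checksumAlgorithm = "CRC32C";       // "swap" the configured algorithm
        System.out.println(readCached());   // still prints CRC32: stale per-thread cache
    }
}
```

As long as checksum implementations are never swapped at runtime, the cached value is always consistent with the configuration, which is presumably why this is acceptable here.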
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1871 Sounds good. In that case it is probably better to add a method call to an external utility method here, instead of putting checksum calculation inline.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java:206-207 will fix
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:1072 sure
          src/main/java/org/apache/hadoop/hbase/HConstants.java:598 I will make this part of the code cleaner. I am still hoping to keep only one knob: whether to verify hbase checksums or not. If hbase checksums are switched on, then hdfs checksums will automatically be switched off. If hbase checksums are configured 'off', then hdfs checksums will automatically be switched back on. I feel that the other knobs (e.g. no checksums at all, or use both checksums) are not very interesting in any production environment, and I would like to keep the code complexity a little lower by avoiding those two combinations. Hope that is ok with you.
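          The single-knob semantics described above can be sketched as a tiny decision helper (illustrative only; the method name is not from the patch): exactly one layer verifies checksums at a time, so the hdfs setting is simply the negation of the hbase setting.

```java
// Hedged sketch of the one-knob behavior: enabling HBase-level checksum
// verification implies disabling HDFS-level verification, and vice versa.
public class ChecksumKnobSketch {
    /** Returns true if HDFS checksum verification should remain active. */
    static boolean useHdfsChecksum(boolean hbaseChecksumEnabled) {
        // The two "uninteresting" combinations (neither layer, or both
        // layers, verifying) are deliberately unrepresentable.
        return !hbaseChecksumEnabled;
    }

    public static void main(String[] args) {
        System.out.println(useHdfsChecksum(true));   // false: hbase verifies
        System.out.println(useHdfsChecksum(false));  // true: hdfs verifies
    }
}
```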
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3597 Good idea, will do
          src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java:31 I tried this, but it needs a few changes, so I ended up needing my own object wrapper over DataOutputBuffer.
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:39 I too feel that we should add the checksum type to the hfileblock header. That will future-proof us to try new checksum algorithms later. Will make this change.
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:132-133 This is equivalent to the existing FileSystem.get(), and many places in hbase use this.
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:80 I will make this public so that users can create an HFileSystem object on a non-default path.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:102 I am making changes here based on mikhail's suggestion too.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:229 As you will see, the existing code path that creates an HFileBlock using this constructor uses it only for in-memory caching, so it never fills in or uses the onDiskDataSizeWithHeader field. But I will set it to what you propose.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:252 onDiskSizeWithHeader = onDiskDataSizeWithHeader + checksum bytes
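          As a hedged illustration of that size relation (assuming, for the sketch only, one 4-byte CRC per bytesPerChecksum chunk, rounded up; the helper name is made up):

```java
// Illustrative sketch: onDiskSizeWithHeader = onDiskDataSizeWithHeader
// plus the appended checksum bytes. Assumes a 4-byte checksum per chunk.
public class BlockSizeSketch {
    static int onDiskSizeWithHeader(int onDiskDataSizeWithHeader,
                                    int bytesPerChecksum) {
        // number of checksum chunks needed to cover the data+header bytes
        int chunks = (onDiskDataSizeWithHeader + bytesPerChecksum - 1)
                     / bytesPerChecksum;
        return onDiskDataSizeWithHeader + chunks * 4;
    }

    public static void main(String[] args) {
        // 65569 bytes of data+header with 16KB chunks -> 5 chunks, +20 bytes
        System.out.println(onDiskSizeWithHeader(65569, 16384));  // 65589
    }
}
```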
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:751 I am in complete agreement with you. I wish I could have used the hadoop trunk code here. Unfortunately, hbase pulls in hadoop 1.0, which does not have this implementation. Another option is to copy this code from hadoop into hbase, but that has its own set of maintainability problems. I am hoping that hbase will move to hadoop 2.0 very soon, and then we can adopt the more optimal checksum implementation. Hope that is ok with you.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1401-1402 This needs to be thread safe.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1634 This is an internal method, and this error is handled by upper layers (by switching off hbase checksums). So I am following the paradigm of using Exceptions only when true errors happen; I would like to avoid writing code that generates exceptions in one layer, catches them in another layer, and handles them there. The discussion with Doug Cutting on the hdfs-symlink patch is etched in my mind.
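          A minimal sketch of that error-handling style (illustrative names; this is not the patch's code): the internal verification method reports a mismatch through its return value, and the caller decides to fall back to hdfs checksums, so no exception crosses layers for an expected failure mode.

```java
import java.util.zip.CRC32;

// Hedged sketch: checksum mismatch is signaled by a boolean, not an
// exception, leaving the fallback decision to the calling layer.
public class ChecksumFallbackSketch {
    static boolean verifies(byte[] data, long expectedChecksum) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue() == expectedChecksum;  // false == corrupt block
    }

    public static void main(String[] args) {
        byte[] block = "hbase block".getBytes();
        CRC32 crc = new CRC32();
        crc.update(block, 0, block.length);
        long good = crc.getValue();
        System.out.println(verifies(block, good));      // true
        System.out.println(verifies(block, good ^ 1));  // false: caller falls back
    }
}
```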
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1888 I will work in a later patch to use bulk checksum verifications, native code, etc. (from hadoop). I would like to keep this patch smaller than it already is by focusing on the disk format change, compatibility with older versions, etc. The main reason is that most of the hadoop checksum optimizations are only in hadoop 2.0. I am hoping that is ok with you.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          dhruba updated the revision "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          Addressed first-level comments from Todd and Mikhail.
          All awesome feedback, thanks a lot folks!

          There are three main things that are not in this patch yet:
          make bytesPerChecksum configurable, add 'checksum type' to the header,
          and work on making AbstractFSReader.getStream()
          thread safe. I will post these three fixes in a day or so.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
          src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java
          src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java
          src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          Phabricator added a comment -

          dhruba updated the revision "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          Many new goodies, thanks to the feedback from Mikhail and Todd. This completes
          my addressing all the current set of review comments. If somebody can re-review it
          again, that will be great.

          1. The bytesPerChecksum is configurable: set hbase.hstore.bytes.per.checksum
          in the config. The default value is 16K. Similarly, one can set
          hbase.hstore.checksum.name to either CRC32 or CRC32C; the default is CRC32. If the
          PureJavaCRC32 algorithm is available in the classpath it is used; otherwise it
          falls back to java.util.zip.CRC32. Each checksum value is assumed to be 4 bytes;
          this is currently not configurable (any comments here?). The reflection-based
          creation of checksum objects is reworked to incur much lower overhead.
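          The classpath probe and constructor caching described above might be sketched
          roughly as follows. This is a hypothetical illustration, not the patch's actual
          ChecksumFactory API; only the PureJavaCrc32 class name and the fallback to
          java.util.zip.CRC32 come from the comment above.

```java
import java.lang.reflect.Constructor;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumFactorySketch {
    // Look up the constructor once and cache it, so per-object creation
    // avoids repeated reflection (the "lower overhead" rework above).
    private static final Constructor<?> PURE_JAVA_CRC32_CTOR = findCtor();

    private static Constructor<?> findCtor() {
        try {
            return Class.forName("org.apache.hadoop.util.PureJavaCrc32")
                        .getDeclaredConstructor();
        } catch (ReflectiveOperationException e) {
            return null; // Hadoop's implementation is not on the classpath
        }
    }

    static Checksum newCrc32() {
        if (PURE_JAVA_CRC32_CTOR != null) {
            try {
                return (Checksum) PURE_JAVA_CRC32_CTOR.newInstance();
            } catch (ReflectiveOperationException ignored) {
                // fall through to the JDK implementation
            }
        }
        return new CRC32(); // same CRC-32 polynomial as PureJavaCrc32
    }

    public static void main(String[] args) {
        Checksum c = newCrc32();
        byte[] data = "hello".getBytes();
        c.update(data, 0, data.length);
        // Both implementations compute the same standard CRC-32 value.
        System.out.println(Long.toHexString(c.getValue()));
    }
}
```

          Because both implementations compute the same polynomial, callers see identical
          checksum values regardless of which class was found.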

          2. If an hbase-level crc check fails, the reader falls back to using hdfs-level
          checksums for the next few reads (defaults to 100). After that, it retries
          using hbase-level checksums. I picked 100 as the default so that even in the case
          of continuous hbase-checksum failures, the overhead of the additional iops is
          limited to 1%. Enhanced a unit test to validate this behaviour.
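          A minimal sketch of that fallback counter, under the assumption that the class
          and method names here are illustrative only (the real patch wires this logic
          into the HFile block readers):

```java
public class ChecksumFailoverSketch {
    // Number of reads routed through hdfs-level checksums after an
    // hbase-level verification failure (the default of 100 discussed above).
    static final int FALLBACK_READS = 100;

    private int hdfsChecksumReadsLeft = 0;

    // Called when hbase-level checksum verification fails on a block read.
    void onHBaseChecksumFailure() {
        hdfsChecksumReadsLeft = FALLBACK_READS;
    }

    // Decides, per read, whether to verify with hbase-level checksums
    // (true) or to let hdfs verify via its separate checksum file (false).
    boolean useHBaseChecksum() {
        if (hdfsChecksumReadsLeft > 0) {
            hdfsChecksumReadsLeft--;
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        ChecksumFailoverSketch s = new ChecksumFailoverSketch();
        s.onHBaseChecksumFailure();
        int hdfsReads = 0;
        while (!s.useHBaseChecksum()) {
            hdfsReads++;
        }
        System.out.println(hdfsReads); // reads served by hdfs checksums
    }
}
```

          With a window of 100 reads, even a store file whose hbase checksums always fail
          pays the extra checksum-file iop on at most 1 in 100 reads plus the failed
          attempt, which is where the 1% bound above comes from.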

          3. Enhanced unit tests to test different sizes of bytesPerChecksum. Also, added
          JMX metrics to record the number of times hbase-checksum verification failures occur.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
          src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
          src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
          src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java
          src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java
          src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12513416/D1521.3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 76 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -133 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
          org.apache.hadoop.hbase.util.TestMergeTool
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles
          org.apache.hadoop.hbase.client.TestInstantSchemaChangeSplit
          org.apache.hadoop.hbase.io.hfile.TestHFileBlock
          org.apache.hadoop.hbase.mapreduce.TestImportTsv

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/907//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/907//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/907//console

          This message is automatically generated.

          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:425 This cast is not safe. See https://builds.apache.org/job/PreCommit-HBASE-Build/907//testReport/org.apache.hadoop.hbase.mapreduce/TestLoadIncrementalHFiles/testSimpleLoad/:

          Caused by: java.lang.ClassCastException: org.apache.hadoop.hdfs.DistributedFileSystem cannot be cast to org.apache.hadoop.hbase.util.HFileSystem
          at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:425)
          at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:433)
          at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:407)
          at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:328)
          at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:326)
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 Should we default to CRC32C?
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:2 No year is needed.
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:59 Shall we name this variable ctor ?

          Similar comment applies to other meth variables in this patch.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          todd has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          I haven't thought about it quite enough, but is there any way to do this without leaking the HFileSystem out to the rest of the code? As Ted pointed out, there are some somewhat public interfaces that will probably get touched by that, and the number of places it has required changes in unrelated test cases seems like a "code smell" to me.

          Maybe this could be a static cache somewhere that, given a FileSystem instance, maintains the un-checksummed equivalents thereof as weak references? Then the concept would be self-contained within the HFile code, which up till now has been a fairly standalone file format.
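          The weak-reference cache idea above could be sketched roughly as below. FileSystem and HFileSystem here are minimal stand-ins for the real Hadoop/HBase classes, and NoChecksumFsCache is a hypothetical name; this is an illustration of the concept, not the patch's implementation.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Minimal stand-ins for the real Hadoop/HBase classes.
class FileSystem {}

class HFileSystem extends FileSystem {
  final FileSystem backing;  // the checksum-verifying fs being wrapped
  HFileSystem(FileSystem backing) { this.backing = backing; }
}

final class NoChecksumFsCache {
  // WeakHashMap keys are weakly held, so a FileSystem no longer referenced
  // elsewhere can drop out of the cache. Caveat: because each cached twin
  // here strongly references its key via `backing`, a production version
  // would also hold the values through WeakReference so entries can
  // actually be collected.
  private static final Map<FileSystem, HFileSystem> CACHE = new WeakHashMap<>();

  static synchronized HFileSystem get(FileSystem fs) {
    if (fs instanceof HFileSystem) {
      return (HFileSystem) fs;  // already wrapped; nothing to cache
    }
    return CACHE.computeIfAbsent(fs, HFileSystem::new);
  }

  private NoChecksumFsCache() {}
}

class CacheDemo {
  public static void main(String[] args) {
    FileSystem fs = new FileSystem();
    HFileSystem a = NoChecksumFsCache.get(fs);
    HFileSystem b = NoChecksumFsCache.get(fs);
    System.out.println(a == b);           // one twin per FileSystem
    System.out.println(a.backing == fs);  // the twin wraps the original
  }
}
```

          The synchronized accessor keeps the map race-free; since readers are created rarely, the lock is not on a hot path.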

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          mbautin has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          @dhruba; thanks for the fixes! Here are some more comments (I still have to go through the last 25% of the new version of the patch).

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:119 Please address this comment. The javadoc says "major" and the variable name says "minor".
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:49 Please correct the misspelling.
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:352 I think this function needs to be renamed to expectAtLeastMajorVersion for clarity
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:287 I think we should either consistently use the onDiskSizeWithHeader field or get rid of it.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java:220 Please do use a constant instead of "0" here for the minor version.
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3551 Long line
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:60 This lazy initialization is not thread-safe. This also applies to other enum members below. Can the meth field be initialized on the enum constructor, or do we rely on some classes being loaded by the time this initialization is invoked?
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:63-67 Avoid repeating "org.apache.hadoop.util.PureJavaCrc32" three times in string form
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:74-75 Avoid repeating the "java.util.zip.CRC32" string
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:98-99 Avoid repeating the string
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java:132 Fix indentation
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java:174 Fix indentation
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java:71 Inconsistent formatting: "1024 +980".
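          The thread-safety point about lazy initialization in ChecksumType could be addressed as sketched below. This is a toy enum, not the one in the patch: it shows eager resolution in the enum constructor, which the JVM runs exactly once under class initialization, so there is no lazy-init race. Only java.util.zip.CRC32 is referenced to keep the snippet runnable.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Illustrative ChecksumType: resolve each member's Checksum class eagerly
// in the enum constructor instead of lazily on first use.
enum ChecksumType {
  // The class-name literal appears once, which also avoids the repeated
  // strings flagged in the review.
  CRC32_ZIP("java.util.zip.CRC32");

  private final Class<? extends Checksum> clazz;

  @SuppressWarnings("unchecked")
  ChecksumType(String className) {
    try {
      // Runs once, single-threaded, during class initialization.
      this.clazz = (Class<? extends Checksum>) Class.forName(className);
    } catch (ClassNotFoundException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  Checksum newChecksum() {
    try {
      return clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}

class ChecksumDemo {
  public static void main(String[] args) {
    Checksum c = ChecksumType.CRC32_ZIP.newChecksum();
    c.update("hbase".getBytes(), 0, 5);
    System.out.println(c instanceof CRC32);
  }
}
```

          Eager construction trades a tiny bit of startup work for a guarantee that no synchronization is needed on the read path.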

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          Todd: I agree with you. It is messy that the HFileSystem interface is leaking out to the unit tests. Instead, inside HFile, I can do something like this when a Reader is created:

          if (!(fs instanceof HFileSystem)) {
            fs = new HFileSystem(fs);
          }

          What this means is that users of HFile who already pass in an HFileSystem will get the new behaviour. HRegionServer in any case creates an HFileSystem before invoking HFile, so it will work.

          I did not do this earlier because I thought that 'using reflection' is costly, but on second thought the cost is small because it is incurred only once, when a new reader is created for the first time. What do you think?
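          The check proposed above can be made compilable as sketched below, with minimal stand-in classes (the real FileSystem/HFileSystem live in Hadoop/HBase, and the method name is hypothetical). Note the extra parentheses around the instanceof test: `!fs instanceof HFileSystem` would not compile, because `!` binds to `fs` before `instanceof` applies.

```java
// Minimal stand-ins for the real Hadoop/HBase classes.
class FileSystem {}

class HFileSystem extends FileSystem {
  final FileSystem backing;
  HFileSystem(FileSystem backing) { this.backing = backing; }
}

class HFileReaderFactory {
  // Hypothetical helper: wrap a plain FileSystem exactly once,
  // at reader-creation time.
  static HFileSystem ensureHFileSystem(FileSystem fs) {
    if (!(fs instanceof HFileSystem)) {
      fs = new HFileSystem(fs);
    }
    return (HFileSystem) fs;
  }
}

class WrapDemo {
  public static void main(String[] args) {
    FileSystem plain = new FileSystem();
    HFileSystem wrapped = HFileReaderFactory.ensureHFileSystem(plain);
    // Idempotent: a caller that already passes an HFileSystem
    // gets it back unchanged.
    System.out.println(HFileReaderFactory.ensureHFileSystem(wrapped) == wrapped);
  }
}
```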

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          todd has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          Yea, I think the instanceof check and confining HFileSystem to be only within the hfile package is much better.

          I don't think it should be costly – as you said, it's only when the reader is created, which isn't on the hot code path, and instanceof checks are actually quite fast. They turn into a simple compare of the instance's klassid header against a constant, if I remember correctly.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          mbautin has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          Some more comments. I am still concerned about the copy-paste stuff in backwards-compatibility checking. Is there a way to minimize that?

          I also mentioned this in the comments below, but it would probably make sense to add more "canned" files in the no-checksum format generated by the old writer and read them with the new reader, the same way HFile v1 compatibility is ensured. I don't mind keeping the old writer code around in the unit test, but I think it is best to remove as much code from that legacy writer as possible (e.g. versatile API, toString, etc.) and only leave the parts necessary to generate the file for testing.

          INLINE COMMENTS
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:164 Long line
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:83 Can this be made private if it is not accessed outside of this class?
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:78 Use ALL_CAPS for constants
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:76 There seems to be a lot of copy-and-paste from the old HFileBlock code here. Is there a way to reduce that?

          I think we also need to create some canned old-format HFiles (using the old code) and read them with the new reader code as part of the test.
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:365 Make this class final.

          Also, it would make sense to strip this class down as much as possible to maintain the bare minimum of code required to test compatibility (if you have not done that already).
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:800 Do we ever use this function?
          src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java:188 Is 0 the minor version with no checksums? If so, please replace it with a constant for readability.
          src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java:356 Is 0 the minor version with no checksums? If so, please replace it with a constant for readability.
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java:300 Is 0 the minor version with no checksums? If so, please replace it with a constant for readability.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:287 Can you please elaborate on this comment?
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:76 I think it is better to keep the compatibility code separate from existing live-test code. That way, it is guaranteed to never change.

          is there any other existing unit test that keeps a version1 file to run unit tests against?
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:365 I did not strip it down, just so that it remains as it was earlier. This is for backward-compatibility, so isn't it better to keep as it was?
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:800 Was useful while testing, but I will get rid of it.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 But CRC32C is not installed by default. You would need hadoop 2.0 (not yet released) to get that.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 I don't see PureJavaCrc32 in hadoop 1.0 either.
          I think it would be nice to default to the best available checksum class.
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:257 Would hbase.hstore.checksum.algo be a better name for this config parameter?

          REVISION DETAIL
          https://reviews.facebook.net/D1521
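The "default to the best available checksum class" idea can be sketched with plain reflection: try to load a preferred implementation by name and fall back to the JDK's CRC32. This is a minimal, hypothetical sketch, not HBase's actual ChecksumFactory; the Hadoop class name below is the one under discussion, and whether it is on the classpath depends on the deployment.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumFactorySketch {
    // Try a preferred Checksum implementation by class name; if it is not
    // available (e.g. hadoop 2.0 classes missing), fall back to the JDK CRC32.
    static Checksum newChecksum(String preferredClassName) {
        try {
            return (Checksum) Class.forName(preferredClassName)
                                   .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            return new CRC32(); // always available in the JDK
        }
    }

    public static void main(String[] args) {
        // On a deployment without hadoop 2.0 on the classpath, this falls back.
        Checksum c = newChecksum("org.apache.hadoop.util.PureJavaCrc32");
        System.out.println(c.getClass().getSimpleName());
    }
}
```

Because PureJavaCrc32 is wire-compatible with java.util.zip.CRC32, such a fallback would not change the checksums written to disk, only the speed of computing them.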

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 my choice would be to make java's CRC32 the default. PureJavaCrc32 is compatible with java's CRC32. However, PureJavaCrc32C is not compatible with either of these.

          Although PureJavaCrc32 is not part of hadoop 1.0, if and when you move to hadoop 2.0, you will automatically get the better-performing algorithm via PureJavaCrc32.

          For the adventurous, one can manually pull PureJavaCrc32C into one's own hbase deployment by explicitly setting hbase.hstore.checksum.algorithm to "CRC32C".

          Does that sound reasonable?
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:257 sounds good, will make this change.

          REVISION DETAIL
          https://reviews.facebook.net/D1521
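The write-then-verify round trip being discussed can be illustrated with the JDK's own CRC32 (the same value PureJavaCrc32 would produce, per the compatibility point above). The helper name and chunking here are illustrative, not HBase's actual code.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumDemo {
    // Compute a CRC32 over one checksum chunk of a block, the way a
    // bytesPerChecksum-sized chunk might be checksummed at write time
    // and re-checksummed at read time.
    static long checksumChunk(byte[] data, int off, int len) {
        Checksum crc = new CRC32(); // PureJavaCrc32 is wire-compatible with this
        crc.update(data, off, len);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] block = "hbase block payload".getBytes();
        long written = checksumChunk(block, 0, block.length); // at flush/compaction
        long read = checksumChunk(block, 0, block.length);    // at read time
        System.out.println(written == read);  // true for uncorrupted data
        block[0] ^= 0xFF;                     // simulate a corrupted byte
        System.out.println(checksumChunk(block, 0, block.length) == written); // false
    }
}
```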

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          Ted: I forgot to state that one can change the default checksum algorithm at any time. No disk format upgrade is necessary. Each hfile stores the checksum algorithm that was used to write its data. If today you use CRC32 and tomorrow you change the configuration setting to CRC32C, then new files that are generated (as part of memstore flushes and compactions) will start using CRC32C, while older files will continue to be verified via the CRC32 algorithm.

          REVISION DETAIL
          https://reviews.facebook.net/D1521
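Why a config change needs no format upgrade can be made concrete with a small sketch: each file records the code of the checksum type it was written with, and the reader dispatches on that recorded code rather than on the current configuration. The enum codes and names below are illustrative, not necessarily HBase's actual ChecksumType values.

```java
public class ChecksumTypeDemo {
    // Illustrative per-file checksum-type codes; real HBase codes may differ.
    enum ChecksumType {
        NULL((byte) 0), CRC32((byte) 1), CRC32C((byte) 2);
        final byte code;
        ChecksumType(byte code) { this.code = code; }
        static ChecksumType codeToType(byte b) {
            for (ChecksumType t : values()) if (t.code == b) return t;
            throw new IllegalArgumentException("unknown checksum code " + b);
        }
    }

    public static void main(String[] args) {
        // A file flushed yesterday under CRC32 carries that code in its blocks...
        byte storedCode = ChecksumType.CRC32.code;
        // ...so even after the cluster config switches to CRC32C, this old file
        // is verified with the algorithm it was actually written with.
        ChecksumType forThisFile = ChecksumType.codeToType(storedCode);
        System.out.println(forThisFile); // CRC32
    }
}
```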

          Ted Yu added a comment - edited

          @Dhruba:
          Your explanation of CRC algorithm selection makes sense.

          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          PureJavaCrc32C is marked with @InterfaceStability.Stable and it only depends on java.util.zip.Checksum.
          Does it make sense to port it from hadoop trunk?

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          mbautin has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:287 It looks like onDiskDataSizeWithHeader does not include checksum but what this function returns does. Could you please mention that this includes checksum in the javadoc, and preferably also add a comment clarifying how this is different from onDiskDataSizeWithHeader? Otherwise it would be confusing, since the method and the field have very similar names.
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java:763 Could you please use a constant instead of 0 for minor version?

          REVISION DETAIL
          https://reviews.facebook.net/D1521
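The size bookkeeping behind this naming confusion can be sketched as: the total on-disk footprint of a block is the header-plus-data portion (onDiskDataSizeWithHeader) plus the trailing checksum bytes, one checksum value per bytesPerChecksum-sized chunk. The helper and field names here are illustrative, not HFileBlock's actual fields.

```java
public class BlockSizeDemo {
    // Bytes of checksum data appended to a block: one checksum value per
    // bytesPerChecksum chunk of the header+data region, rounding up.
    static int numChecksumBytes(int onDiskDataSizeWithHeader, int bytesPerChecksum,
                                int bytesPerChecksumValue) {
        int chunks = (onDiskDataSizeWithHeader + bytesPerChecksum - 1) / bytesPerChecksum;
        return chunks * bytesPerChecksumValue;
    }

    public static void main(String[] args) {
        int onDiskDataSizeWithHeader = 65_000; // header + (possibly compressed) data
        // e.g. 16 KB chunks, 4-byte CRC32 values: ceil(65000/16384) = 4 chunks
        int totalWithChecksum = onDiskDataSizeWithHeader
            + numChecksumBytes(onDiskDataSizeWithHeader, 16 * 1024, 4);
        System.out.println(totalWithChecksum); // 65016 with these illustrative numbers
    }
}
```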

          Phabricator added a comment -

          dhruba updated the revision "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          Incorporated most of Mikhail's, Ted's and Todd's feedback.

          1. Removed leak of HFileObject from all places outside of hbase.io.hfile.
          Instead use instanceOf inside HFile.createReaderWithEncoding()
          to dynamically decide which filesystem to use.

          2. constructor for ChecksumType is thread-safe

          One unanswered question: I still kept the backward-compatibility test
          with the original HFileBlock.Writer. If anybody can point me to an
          existing unit test that tests reading older files, I can do that instead.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
          src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
          src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
          src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          Show
          Phabricator added a comment - dhruba updated the revision " [jira] HBASE-5074 Support checksums in HBase block cache". Reviewers: mbautin Incorporated most of Mikhail's, Ted's and Todd's feedback. 1. Removed leak of HFileObject from all places outside of hbase.io.hfile. Instead use instanceOf inside HFile.createReaderWithEncoding() to dynamically decide which filesystem to use. 2. constructor for ChecksumType is threadsafe One un-answered question: I still kept the backward compatibility test with the original HFileBlock.Writer. If anybody can point me to an existing unit test that tests reading older files, I can do that instead. REVISION DETAIL https://reviews.facebook.net/D1521 AFFECTED FILES src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java 
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java src/main/java/org/apache/hadoop/hbase/HConstants.java src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java src/main/java/org/apache/hadoop/hbase/regionserver/Store.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
          Hide
          Phabricator added a comment -

          dhruba updated the revision "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          Incorporated most of Mikhail's, Ted's and Todd's feedback.

          1. Removed leak of HFileObject from all places outside of hbase.io.hfile.
          Instead use instanceOf inside HFile.createReaderWithEncoding()
          to dynamically decide which filesystem to use.

          2. constructor for ChecksumType is threadsafe

          One un-answered question: I still kept the backward compatibility test
          with the original HFileBlock.Writer. If anybody can point me to an
          existing unit test that tests reading older files, I can do that instead.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
          src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
          src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
          src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          Phabricator added a comment -

          mbautin has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          @dhruba: keeping the compatibility test is fine with me. We can add a test that reads a "canned" HFile in the old format later.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:64 Constants are normally spelled in upper cases.
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:66 Should this be lifted to line 38 ?
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:73 e should be included in message.
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:111 We should share the LOG with CRC32.
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:2 Year is not needed.
          src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java:2 Year is not needed.
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java:2 Year is not needed.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12513715/D1521.4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 55 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -133 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.io.hfile.TestHFileBlock
          org.apache.hadoop.hbase.coprocessor.TestClassLoading

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/917//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/917//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/917//console

          This message is automatically generated.

          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:72 There is no such parameter now.
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:55 Would newConstructor be better name ?
          This method doesn't really create a new instance.
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:36 Should read 'An encapsulation'
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:61 Using this.fs would be cleaner.
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:122 Can we make the method name and field name consistent in terms of plurality ?

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          dhruba updated the revision "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          Incorporated review comments from Ted.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
          src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
          src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
          src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          Todd: can you please re-review this one more time (at least to ensure that your earlier concerns are addressed).

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12513780/D1521.5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 58 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -132 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.io.hfile.TestHFileBlock

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/923//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/923//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/923//console

          This message is automatically generated.

          Phabricator added a comment -

          todd has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/HConstants.java:598 typo: verification

          and still not sure what true/false means here... would be better to clarify either here or in src/main/resources/hbase-default.xml if you anticipate users ever changing this.

          If I set it to false does that mean I get no checksumming? or hdfs checksumming as before? please update the comment

          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:41-43 I think this API would be cleaner with the following changes:

          • rather than use the constant HFileBlock.HEADER_SIZE below, make the API:

          appendChecksums(ChecksumByteArrayOutputStream baos,
          int dataOffset, int dataLen,
          ChecksumType checksumType,
          int bytesPerChecksum) {
          }

          where it would checksum the data between dataOffset and dataOffset + dataLen, and append it to the baos
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:73 same here, I think it's better to take the offset as a parameter instead of assume HFileBlock.HEADER_SIZE
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:84 if this is performance critical, use DataOutputBuffer, presized to right size, and then return its underlying buffer directly to avoid a copy and realloc
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:123 seems strange that this is inconsistent with the above – if the block doesn't have a checksum, why is that differently handled than if the block is from a prior version which doesn't have a checksum?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:100 typo re-enable
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:79-80 should clarify which part of the data is checksummed.
          As I read the code, only the non-header data (ie the "user data") is checksummed. Is this correct?
          It seems to me like this is potentially dangerous – eg a flipped bit in an hfile block header might cause the "compressedDataSize" field to be read as 2GB or something, in which case the faulty allocation could cause the server to OOME. I think we need a checksum on the hfile block header as well.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:824 rename to doCompressionAndChecksumming, and update javadoc
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:815 I was a bit confused by this at first - I think it would be nice to add a comment here saying:
          // set the header for the uncompressed bytes (for cache-on-write)

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:852 this weird difference between compressed and uncompressed case could be improved, I think:
          Why not make the uncompressedBytesWithHeader leave free space for the checksums at the end of the array, and have it generate the checksums into that space?
          Or change generateChecksums to take another array as an argument, rather than having it append to the same 'baos'?

          It's currently quite confusing that "onDiskChecksum" ends up empty in the compressed case, even though we did write a checksum lumped in with the onDiskBytesWithHeader.

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1375-1379 Similar to above comment about the block headers, I think we need to do our own checksumming on the hfile metadata itself – what about a corruption in the file header? Alternatively we could always use the checksummed stream when loading the file-wide header which is probably much simpler
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1545 confused by this - if we don't have an HFileSystem, then wouldn't we assume that the checksumming is done by the underlying dfs, and not use hbase checksums?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1580 s/it never changes/because it is marked final/
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1588-1590 this isn't thread-safe: multiple threads might decrement and skip -1, causing it to never get re-enabled.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1599 add comment here // checksum verification failed
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1620-1623 msg should include file path
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:53 typo: delegate
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3620 Given we have rsServices.getFileSystem, why do we need to also pass this in?

          REVISION DETAIL
          https://reviews.facebook.net/D1521
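          The API change todd suggests for ChecksumUtil.java:41-43 can be sketched as below. This is an illustrative, hedged sketch, not the patch's actual code: it uses java.util.zip.CRC32 as a stand-in for HBase's ChecksumType, and the class name, header size, and chunk size here are invented for the example. The point is that the caller passes dataOffset/dataLen explicitly instead of the method assuming HFileBlock.HEADER_SIZE.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

public class AppendChecksumsSketch {
    // Checksums the bytes in [dataOffset, dataOffset + dataLen) of the
    // stream's current contents in bytesPerChecksum-sized chunks and
    // appends each 4-byte CRC32 value to the same stream.
    static void appendChecksums(ByteArrayOutputStream baos,
                                int dataOffset, int dataLen,
                                int bytesPerChecksum) throws IOException {
        byte[] data = baos.toByteArray();          // snapshot before appending
        DataOutputStream out = new DataOutputStream(baos);
        for (int off = dataOffset; off < dataOffset + dataLen; off += bytesPerChecksum) {
            int len = Math.min(bytesPerChecksum, dataOffset + dataLen - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            out.writeInt((int) crc.getValue());
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] header = new byte[24];              // stand-in for a block header
        byte[] payload = new byte[10000];          // stand-in for block data
        baos.write(header);
        baos.write(payload);
        appendChecksums(baos, header.length, payload.length, 4096);
        // 10000 bytes at 4096 bytes/checksum -> 3 chunks -> 12 checksum bytes
        System.out.println(baos.size() - header.length - payload.length);
    }
}
```

Because the offset and length are parameters, the same routine works whether the header is included in the checksummed range or not, which also bears on the header-corruption concern above.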

          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1588-1590 It would be nice to make this part of the logic (re-enabling HBase checksumming) pluggable.
          Can be done in a follow-on JIRA.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1600 Assertion may be disabled in production.

          REVISION DETAIL
          https://reviews.facebook.net/D1521
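          The race todd flags at HFileBlock.java:1588-1590 (multiple threads decrementing past zero so the counter never recovers) can be avoided with a compare-and-set loop. This is a hedged sketch of one possible fix, not the code that was ultimately committed; the class and method names are invented for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ChecksumFailureCountdown {
    // Counts down the reads that bypass hbase checksums after a failure;
    // once it reaches zero, hbase checksums are re-enabled.
    private final AtomicInteger remaining = new AtomicInteger(0);

    void onChecksumFailure(int bypassReads) {
        remaining.set(bypassReads);
    }

    // Returns true if this read should still bypass hbase checksums.
    // The CAS loop guarantees the counter never drops below zero even
    // when many reader threads race on the decrement.
    boolean shouldBypass() {
        while (true) {
            int cur = remaining.get();
            if (cur <= 0) {
                return false;                       // re-enabled
            }
            if (remaining.compareAndSet(cur, cur - 1)) {
                return true;
            }
        }
    }

    public static void main(String[] args) {
        ChecksumFailureCountdown c = new ChecksumFailureCountdown();
        c.onChecksumFailure(2);
        System.out.println(c.shouldBypass());       // true
        System.out.println(c.shouldBypass());       // true
        System.out.println(c.shouldBypass());       // false, and stays false
    }
}
```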

          Phabricator added a comment -

          stack has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          Took a look at a little piece of the patch. It looks great.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/HConstants.java:601 It looks like this feature will be on by default. Good.
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:1 Should this class be in an fs package rather than in util?

          Nit. HFileSystem seems overly generic. Should it be HBaseFileSystem?

          Out of interest, is there a performance penalty that you know of going via FilterFileSystem?
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:40 How would this happen? We'd look at the path for the object and do a different fs in here based off that?
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:49 Won't the master use this fs too?
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:50 configuration
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:74 cool
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:107 Who would want this? Can we shut it down?
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:112 It's not the 'default' fs, it IS the fs?
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:167 cool
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:172 So we'll have nonrecursive w/ this method? I'm not sure I follow. This method will go away when filterfilesystem supports nonrecursive create?

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1545 This is the initialization code in the constructor that assumes that we always verify hbase checksums. In the next line, it will be set to false if the minor version is an old one. Similarly, if there is an HFileSystem and the caller has voluntarily cleared hfs.useHBaseChecksum, then we respect the caller's wishes.
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:1 I do not know of any performance penalty. For hbase code, this wrapper is traversed only once when an HFile is opened or an HLog is created. Since the number of times we open/create a file is minuscule compared to the number of reads/writes to those files, the overhead (if any) should not show up in any benchmark. I will validate this on my cluster and report if I see any.
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:1 I do not yet see a package o.apache.hadoop.hbase.fs. Do you want me to create it? There is a pre-existing class o.a.h.h.utils.FSUtils; that's why I created HFileSystem inside that package.
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:40 We would create a method HFileSystem.getLogFs(). The implementation of this method can open a new filesystem object (for storing transaction logs). Then, HRegionServer will pass HFileSystem.getLogFs() into the constructor of HLog().
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:49 Currently, the only place HFileSystem is created is inside HRegionServer
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:107 You would see that readfs is the filesystem object that will be used to avoid checksum verification inside of hdfs.
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:172 The hadoop code base recently introduced the method FileSystem.createNonRecursive. But whoever added it to FileSystem forgot to add it to FilterFileSystem. Apache hadoop trunk should roll out a patch for this one soon.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          stack has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:1 I'd suggest yes creating an fs package. Maybe FSUtils would move over but an fs package would seem to be a better location for a new FileSystem implementation than util.
          src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java:49 Interesting. How does the master bootstrap the cluster then? It writes into the fs the root and meta regions?

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          dhruba updated the revision "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          Incorporated review feedback from Todd, Stack and TedYu.

          I made HFileBlock.readBlockData() thread-safe (still without using any
          locks, because it is just a heuristic). I made the checksum encompass
          the values in the block header. HFileSystem is now in its own fs package.
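          The lock-free approach mentioned here can be illustrated with a small stand-alone sketch (the class and method names are hypothetical, not the actual readBlockData() code): a volatile flag is flipped without synchronization, because the worst a race can cause is a few extra reads taking the slower hdfs-checksum path, never incorrect data.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ChecksumSwitch {
    // volatile is enough: readers may briefly see a stale value, which
    // only costs an extra verification attempt on a few reads.
    private volatile boolean useHBaseChecksum = true;
    private final AtomicLong failures = new AtomicLong();

    boolean shouldVerifyHBaseChecksum() {
        return useHBaseChecksum;
    }

    // Called when an hbase-level checksum mismatch is detected; switch
    // subsequent reads back to hdfs checksum verification.
    void onVerificationFailure() {
        failures.incrementAndGet();
        useHBaseChecksum = false;
    }

    // Called after enough clean reads to retry the fast path.
    void reenable() {
        useHBaseChecksum = true;
    }

    long failureCount() {
        return failures.get();
    }
}
```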

          If any of you can review it one more time, that will be much appreciated.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
          src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
          src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/fs
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          Todd Lipcon added a comment -

          Hey Dhruba,

          I didn't look at the new rev yet, but does it also do checksums on the
          HFile header itself? ie the parts of the HFile that don't fall inside any
          block? If not, we should continue to use the checksummed FS when we open
          the hfile.

          -Todd

          On Wed, Feb 15, 2012 at 9:55 PM, dhruba (Dhruba Borthakur) <

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12514759/D1521.6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 55 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -132 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/969//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/969//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/969//console

          This message is automatically generated.

          dhruba borthakur added a comment -

          Hi Todd, thanks for continuing to review this patch. Yes, the latest version that I uploaded uses hdfs checksum verification while reading the HFile trailer.

          Phabricator added a comment -

          stack has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          I got about 15% through. Will do rest later. This stuff is great.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/HConstants.java:605 Nice doc. Let's hoist it up into the reference manual on commit.
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:1 Good. I think it's better having it in here.
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:115 I see we use this writing the WAL. Reading we'll use whatever the readfs? Do we need to expose this? Or the getReadRS even?

          Or is it that you want different fs's for read and write? If so, should this method be called getWriteFS?
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:129 Post creation, invoking this method would have no effect? If so, remove, and make this data member final?
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:44 Why change this comment? Do we care how it does checksumming?
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:49 Yeah, I wonder if upper tiers need to worry about this stuff? Whether it's checksummed or not? Should they just be talking about readfs vs writefs? And then it's up to the configuration as to what the underlying fs does (in this case it's just turning off hdfs checksumming). Looks like actual checksumming is over in HFileBlock... maybe HFile itself doesn't need to be concerned with checksumming?

          No biggie. Just a comment.
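          The readfs/writefs split being discussed can be sketched with stand-in types (the real HFileSystem wraps org.apache.hadoop.fs.FileSystem objects; the names and methods here are illustrative): one handle keeps hdfs checksum verification on for writes and for paths like the WAL and the trailer, while a second handle turns it off for HFile data-block reads, where hbase verifies its own inline checksums.

```java
// Minimal stand-in for a filesystem handle; only the property relevant
// to this sketch is modeled.
class FsHandle {
    final boolean verifiesChecksums;
    FsHandle(boolean verifiesChecksums) {
        this.verifiesChecksums = verifiesChecksums;
    }
}

public class DualFsSketch {
    private final FsHandle fs;            // writes, WAL, trailer reads
    private final FsHandle noChecksumFs;  // HFile data-block reads

    DualFsSketch(boolean useHBaseChecksum) {
        this.fs = new FsHandle(true);
        // Only skip hdfs verification when hbase does its own.
        this.noChecksumFs = useHBaseChecksum ? new FsHandle(false) : this.fs;
    }

    FsHandle getFileSystem() { return fs; }
    FsHandle getNoChecksumFs() { return noChecksumFs; }
}
```

          Upper tiers then pick a handle rather than reasoning about checksums directly, which is the separation of concerns suggested in the comment above.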

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          stack has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          Got to the 20% stage.

          What's the status of this patch, Dhruba? Are you running it anywhere?

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:46 Great comments
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:84 The value returned is a long. Why convert to an int?
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:18 I think it's good that this utility is in this package since it seems particular to this package. At first I thought it a general utility... there is some of that, but mostly it's about this feature it seems.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:205 Do you want to doc that a get resets count to zero?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:462 Yeah, it's hard to contain the checksumming feature to just a few places; it leaks out all over io.hfile. That's fine.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:115 Ideally, we need two different fs. The first fs is for writing and for reading-with-hdfs-checksums. The other fs is for reading-without-hdfs-checksums.

          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:129 done
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:49 The HFile layer is the one that is responsible for opening a file for reading. Then the multi-threaded HFileBlock layer uses those FSDataInputStreams to pread data from HDFS. So, I need to make the HFile layer open two file descriptors for the same file, both for reading purposes... one with checksums and the other without checksums
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:44 This is a protected member, so users of this class are not concerned with what this is. If you have a better structure for how to organize this one, please do let me know
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:84 The Checksum API returns a long. But actual implementations like CRC32, CRC32C, etc. all return an int.

          Also, the Hadoop checksum implementation uses a 4-byte value. If you think that we should store 8-byte checksums, I can do that. But for the common case, we would be wasting 4 bytes in the header for every checksum chunk
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:205 done

          REVISION DETAIL
          https://reviews.facebook.net/D1521

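          The exchange above — one 4-byte checksum stored per chunk of block data, with `Checksum.getValue()` returning a `long` whose CRC32 value fits in 32 bits — can be sketched with the JDK's own `java.util.zip.CRC32`. This is a simplified illustration, not the patch's actual `ChecksumUtil` code; the method and class names here are invented for the example:

          ```java
          import java.util.zip.CRC32;

          // Sketch: compute one 4-byte CRC32 per fixed-size chunk of block data,
          // analogous to hbase.hstore.bytes.per.checksum. The Checksum API's
          // getValue() returns a long, but a CRC32 occupies only 32 bits, so each
          // value is narrowed to a 4-byte int -- dhruba's point in the review above.
          public class ChunkedChecksumSketch {
              public static int[] checksumChunks(byte[] data, int bytesPerChecksum) {
                  int nChunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
                  int[] sums = new int[nChunks];
                  CRC32 crc = new CRC32();
                  for (int i = 0; i < nChunks; i++) {
                      int off = i * bytesPerChecksum;
                      int len = Math.min(bytesPerChecksum, data.length - off);
                      crc.reset();
                      crc.update(data, off, len);
                      sums[i] = (int) crc.getValue(); // low 32 bits hold the full CRC32
                  }
                  return sums;
              }
          }
          ```

          For example, a 10000-byte block with a 4096-byte checksum chunk size yields three chunks (4096 + 4096 + 1808), and therefore twelve bytes of checksum data rather than twenty-four if 8-byte values were stored.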
          Phabricator added a comment -

          dhruba updated the revision "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          Incorporated Stack's review comments.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
          src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
          src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/fs
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12515551/D1521.7.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 55 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -132 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.util.TestFSUtils
          org.apache.hadoop.hbase.replication.TestReplication
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1006//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1006//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1006//console

          This message is automatically generated.

          Phabricator added a comment -

          stack has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          Answering Dhruba.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:115 Seems like we could have better names for these methods, ones that give more of a clue as to what they are about. getBackingFS, getNoChecksumFS?

          Maybe you are keeping them generic like this because you will be back in this area again soon doing another beautiful speedup on top of this checksumming fix (When are we going to do read-ahead? Would that speed scanning?)
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:44 ok. np.
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:49 Ok. So, two readers. Our file count is going to go up? We should release-note this as a side effect of enabling this feature (previously you may have been well below the xceivers limit, but now you could go over the top?). I didn't notice this was going on. Need to foreground it, I'd say.
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:84 I figured. Its fine as is.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

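          The two-filesystem split being discussed above can be pictured with a small, Hadoop-free sketch: one wrapper object holds two handles to the same underlying storage, one whose reads are checksum-verified by the filesystem and one that returns raw bytes so the caller can verify HBase-level checksums itself. All names here are illustrative, not the patch's actual API.

```java
// Minimal plain-Java model of the HFileSystem idea: two views of one store.
interface SimpleFs {
    byte[] read(String path);
}

class HFileSystemSketch {
    private final SimpleFs fs;           // reads verified by the filesystem layer
    private final SimpleFs noChecksumFs; // raw reads; caller verifies checksums

    HFileSystemSketch(SimpleFs fs, SimpleFs noChecksumFs) {
        this.fs = fs;
        this.noChecksumFs = noChecksumFs;
    }

    // Used when HBase-level checksum verification is enabled.
    SimpleFs getNoChecksumFs() { return noChecksumFs; }

    // Used for everything else, and as the fallback path.
    SimpleFs getFileSystem() { return fs; }
}
```

          The point of the wrapper is that callers pick a view per read, so the choice of who verifies checksums is made at the call site rather than baked into one global filesystem object.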
          Phabricator added a comment -

          dhruba updated the revision "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          Changed names of HFileSystem methods/variables to better reflect reality.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
          src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
          src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/fs
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          > Ok. So, two readers. Our file count is going to go up?

          The file count should not go up. We still do the same number of I/Os to HDFS, so the number of concurrent I/Os on a datanode should still be the same, so the number of xceivers on the datanode should not be adversely affected by this patch. Please let me know if I am missing something here.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

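          The reason the I/O count stays flat is that the feature stores checksums inline with the HFile block, so one read fetches both data and checksum instead of touching a separate checksum file. A hedged sketch of the per-chunk scheme implied by hbase.hstore.bytes.per.checksum, using CRC32 and an illustrative chunk size (the real patch's layout and algorithm choices may differ):

```java
import java.util.Arrays;
import java.util.zip.CRC32;

// A block's data is divided into fixed-size chunks and a CRC32 is computed
// per chunk; the checksums travel inline with the block, so verification
// needs no second disk seek.
class ChecksumChunks {
    static long[] checksums(byte[] data, int bytesPerChecksum) {
        int n = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[n];
        CRC32 crc = new CRC32();
        for (int i = 0; i < n; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();                 // reuse one CRC32 object per chunk
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Verification recomputes each chunk's CRC and compares with the stored values.
    static boolean verify(byte[] data, int bytesPerChecksum, long[] expected) {
        return Arrays.equals(checksums(data, bytesPerChecksum), expected);
    }
}
```

          Smaller chunks localize corruption detection at the cost of more stored checksums; the chunk size is exactly the knob hbase.hstore.bytes.per.checksum exposes.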
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12515642/D1521.8.patch
          against trunk revision.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 55 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1014//console

          This message is automatically generated.

          Phabricator added a comment -

          stack has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:115 Please ignore my previous comment on renaming these methods. On reread, I think they are plenty clear enough as they are.
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:120 Nit: Change this to be an @return javadoc so it's clear we are returning the current state of this flag?
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:164 Does this mean that this feature is on by default? Should we read the configuration to figure out whether it's on or not?
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:73 Is this threadsafe? This looks like a shared object?

          REVISION DETAIL
          https://reviews.facebook.net/D1521

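          The thread-safety question raised above is a real hazard: java.util.zip.Checksum implementations carry mutable state, so concurrent update() calls on one shared instance interleave and produce garbage. One common remedy, sketched here as an assumption about how such a utility could be fixed rather than what ChecksumUtil actually does, is thread confinement via ThreadLocal:

```java
import java.util.zip.CRC32;

// Each thread gets its own CRC32 instance, so concurrent checksum
// computations never share mutable state.
class ThreadSafeCrc {
    private static final ThreadLocal<CRC32> CRC = ThreadLocal.withInitial(CRC32::new);

    static long checksum(byte[] data) {
        CRC32 crc = CRC.get();
        crc.reset();                    // clear any state from a previous call
        crc.update(data, 0, data.length);
        return crc.getValue();
    }
}
```

          The alternative, allocating a fresh CRC32 per call, is also safe; the ThreadLocal variant just avoids allocation on hot read paths.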
          Phabricator added a comment -

          stack has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          Dhruba, have you been running this patch anywhere?

          I'm +1 on commit if tests pass. If it's not been run anywhere, I can test it locally before committing.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:235 Is it odd that we only take in the minor version here and not major too?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:861 Why WARN? This is a 'normal' operation?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1235 So, yeah, aren't we doubling the FDs when we do this? The iops may be the same but the threads floating in the datanode for reading will double?
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1244 I'm not getting why no major version in here.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1584 So, again we are defaulting true (though it seems that if no checksums in hfiles, we'll flip this flag to off pretty immediately)
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1589 Smile. Like now.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1630 Extreme nit: Should we close the nochecksumistream if its not going to be used?

          Hmm... now I see we can flip back to using them again later in the stream
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java:961 Now we have our own filesystem, we can dump a bunch of crud in there! We can add things like the hbase.version check, etc. (joke – sort of).
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java:86 I'm reluctant adding stuff to this Interface but I think this method qualifies as important enough to be allowed in.
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:70 Great

          REVISION DETAIL
          https://reviews.facebook.net/D1521

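          The "flip back later in the stream" behaviour noted in the HFileBlock.java:1630 comment amounts to a small state machine: start with HBase checksums on, fall back to HDFS-verified reads on a checksum failure, and revert after some quiet period. A hedged sketch of that shape — the counter, threshold, and method names are illustrative, not the patch's actual fields:

```java
// Tracks whether reads should verify HBase-level checksums or defer to
// HDFS-verified streams after a failure.
class ChecksumFallback {
    private static final int REVERT_AFTER = 3; // illustrative threshold
    private boolean hbaseChecksumOn = true;
    private int hdfsReadsRemaining = 0;

    // An HBase-level checksum mismatch: defer to HDFS checksums for a while.
    void onHBaseChecksumFailure() {
        hbaseChecksumOn = false;
        hdfsReadsRemaining = REVERT_AFTER;
    }

    // After enough clean HDFS-verified reads, give HBase checksums another chance.
    void onSuccessfulHdfsRead() {
        if (!hbaseChecksumOn && --hdfsReadsRemaining <= 0) {
            hbaseChecksumOn = true;
        }
    }

    boolean usingHBaseChecksum() { return hbaseChecksumOn; }
}
```

          Keeping the fallback temporary matters: a single corrupt chunk should not permanently reintroduce the extra checksum-file I/O this feature exists to avoid.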
          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1646 Since doVerify is an internal boolean variable, we should give it a better name.
          How about 'doVerificationThruHBaseChecksum' ?

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          dhruba borthakur added a comment -

          @Stack: I am running it on a very small cluster, but will deploy it on a larger cluster next week. Please hold off committing this one till my larger-cluster-tests pass.

          I will also address Stack's and Ted's review comments in the next version of my patch.

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:73 Actually, a new checksum object is created by every invocation of ChecksumType.getChecksumObject(), so it should be thread-safe
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:120 doing it

          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:164 will restructure the comment, this feature is switched on by default.

          REVISION DETAIL
          https://reviews.facebook.net/D1521
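
          The thread-safety argument above — a fresh checksum object per invocation means no shared mutable state — can be sketched as follows. This is an illustrative demo, not the actual ChecksumType code; the method and class names here are hypothetical.

          ```java
          import java.util.zip.CRC32;
          import java.util.zip.Checksum;

          public class ChecksumFactoryDemo {
              // Returning a fresh Checksum on every call means callers never
              // share mutable state, so concurrent readers need no locking.
              static Checksum getChecksumObject() {
                  return new CRC32();
              }

              static long checksumOf(byte[] data) {
                  Checksum c = getChecksumObject(); // fresh instance per invocation
                  c.update(data, 0, data.length);
                  return c.getValue();
              }

              public static void main(String[] args) {
                  byte[] block = "hbase block data".getBytes();
                  // Two independent invocations compute identical values and
                  // cannot interfere, even when run from different threads.
                  System.out.println(checksumOf(block) == checksumOf(block)); // prints true
              }
          }
          ```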

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:235 This constructor is used only for V2, hence the major number is not a parameter.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1235 I think there won't be any changes to the number of threads in the datanode. A datanode thread is not tied to a client FileSystem object. Instead, a global pool of threads in the datanode is free to serve read-requests from any client.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1244 The minor version indicates disk-format changes inside an HFileBlock. The major version indicates disk-format changes within an entire HFile. Since the AbstractFSReader only reads HFileBlocks, it is logical that it contains the minorVersion, is it not?

          But I can put the majorVersion in as well, if you so desire.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1584 Yes, the default is to enable hbase-checksum verification. And you are right that if the hfile is of the older type, then we will quickly flip this back to false (in the next line).
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1630 I think we should keep both streams active till the HFile itself is closed.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1646 done

          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java:961 Yes, precisely. Going forward, I would like to see if we can make HLogs go to a filesystem object that is different from the filesystem used for hfiles.
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java:86 I agree with you completely. This is an interface that should not change often.

          REVISION DETAIL
          https://reviews.facebook.net/D1521
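
          The read path discussed above — verify via HBase-level checksums by default, and fall back to HDFS checksum verification when that fails — can be sketched roughly as below. All names here (BlockReader, readWithFallback, verifyHBaseChecksum) are hypothetical stand-ins, not the actual HFileBlock API, and the verification itself is a placeholder.

          ```java
          public class ChecksumFallbackDemo {
              // Hypothetical reader: the flag selects whether HDFS-level
              // checksum verification is enabled for this read.
              interface BlockReader { byte[] read(boolean hdfsChecksumOn); }

              // Placeholder for recomputing and comparing per-chunk checksums;
              // the real code walks the block's checksum chunks.
              static boolean verifyHBaseChecksum(byte[] block) {
                  return block.length % 2 == 0;
              }

              static byte[] readWithFallback(BlockReader reader) {
                  byte[] block = reader.read(false); // HDFS checksums switched off
                  if (verifyHBaseChecksum(block)) {
                      return block;                  // fast path: one disk iop
                  }
                  // HBase checksum failed: re-read with HDFS verification on
                  return reader.read(true);
              }
          }
          ```

          The point of keeping both streams open until the HFile is closed, as dhruba notes, is that this fallback can happen at any time during the file's lifetime.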

          Phabricator added a comment -

          dhruba updated the revision "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          Pulled in review comments from Stack and Ted.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
          src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
          src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/fs
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          Phabricator added a comment -

          mbautin has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          @dhruba: going through the diff once again. Since you've updated the revision, submitting existing comments against the previous version, and continuing with the new version.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:131 Misspelling: "Minimun" -> Minimum
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:44-45 Can these two be made final too?
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:145 s/chuck/chunk/
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:48 Fix javadoc: do do -> do
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:38 Make this final, rename to DUMMY_VALUE, because this is a constant, and make the length a factor of 16 to take advantage of alignment.
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:532 s/manor/major/
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:157 This comment is misleading. This is not something that defaults to the 16 K, but the default value itself. I think this should say something about how a non-default value is specified.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:265-271 The additional constructor should not be needed when https://issues.apache.org/jira/browse/HBASE-5442 goes in.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:409 Is it possible to obtain the filesystem from the input stream rather than pass it as an additional parameter? Or is the underlying filesystem of the input stream a regular one, as opposed to an HFileSystem?

          REVISION DETAIL
          https://reviews.facebook.net/D1521
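
          The 16 K default mentioned for HFile.java:157 is the per-chunk checksum granularity (hbase.hstore.bytes.per.checksum in the release note). As a hedged illustration of the arithmetic involved — assuming 4-byte CRC32 checksums, with hypothetical names — the chunk count and overhead for a block work out as:

          ```java
          public class ChecksumChunkDemo {
              static final int DEFAULT_BYTES_PER_CHECKSUM = 16 * 1024; // the 16 K default discussed above
              static final int CHECKSUM_SIZE = 4;                      // a CRC32 value occupies 4 bytes

              // Number of checksum chunks needed to cover dataSize bytes (ceiling division).
              static int numChunks(int dataSize, int bytesPerChecksum) {
                  return (dataSize + bytesPerChecksum - 1) / bytesPerChecksum;
              }

              public static void main(String[] args) {
                  int blockSize = 64 * 1024; // an example HFile block size
                  int chunks = numChunks(blockSize, DEFAULT_BYTES_PER_CHECKSUM);
                  System.out.println(chunks);                 // prints 4
                  System.out.println(chunks * CHECKSUM_SIZE); // prints 16 (bytes of checksum overhead)
              }
          }
          ```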

          Show
          Phabricator added a comment - mbautin has commented on the revision " [jira] HBASE-5074 Support checksums in HBase block cache". @dhruba: going through the diff once again. Since you've updated the revision, submitting existing comments against the previous version, and continuing with the new version. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:131 Misspelling: "Minimun" -> Minimum src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:44-45 Can these two be made final too? src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:145 s/chuck/chunk/ src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:48 Fix javadoc: do do -> do src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:38 Make this final, rename to DUMMY_VALUE, because this is a constant, and make the length a factor of 16 to take advantage of alignment. src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:532 s/manor/major/ src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:157 This comment is misleading. This is not something that defaults to the 16 K, but the default value itself. I think this should say something about how a non-default value is specified. src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:265-271 The additional constructor should not be needed when https://issues.apache.org/jira/browse/HBASE-5442 goes in. src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:409 Is it possible to obtain the filesystem from the input stream rather than pass it as an additional parameter? Or is the underlying filesystem of the input stream a regular one, as opposed to an HFileSystem? REVISION DETAIL https://reviews.facebook.net/D1521
          Hide
          Phabricator added a comment -

          mbautin has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          @dhruba: going through the diff once again. Since you've updated the revision, submitting existing comments against the previous version, and continuing with the new version.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:131 Misspelling: "Minimun" -> Minimum
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:44-45 Can these two be made final too?
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:145 s/chuck/chunk/
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:48 Fix javadoc: do do -> do
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:38 Make this final, rename to DUMMY_VALUE, because this is a constant, and make the length a factor of 16 to take advantage of alignment.
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:532 s/manor/major/
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:157 This comment is misleading. This is not something that defaults to the 16 K, but the default value itself. I think this should say something about how a non-default value is specified.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:265-271 The additional constructor should not be needed when https://issues.apache.org/jira/browse/HBASE-5442 goes in.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:409 Is it possible to obtain the filesystem from the input stream rather than pass it as an additional parameter? Or is the underlying filesystem of the input stream a regular one, as opposed to an HFileSystem?

          REVISION DETAIL
          https://reviews.facebook.net/D1521
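Several of the comments above (the ChecksumUtil.java notes and the 16 K default discussed at HFile.java:157) concern per-chunk checksums: the block data is split into fixed-size chunks (hbase.hstore.bytes.per.checksum) and each chunk carries its own CRC. A minimal, self-contained sketch of that idea — class and method names here are illustrative, not the actual ChecksumUtil code or HFile block layout:

```java
// Sketch of per-chunk checksumming in the spirit of ChecksumUtil: data is
// split into bytesPerChecksum-sized chunks (16 K default in the patch) and
// each chunk gets a CRC32 value. Illustrative only, not the real format.
import java.util.Arrays;
import java.util.zip.CRC32;

public class ChunkChecksums {
    static int numChunks(int dataLen, int bytesPerChecksum) {
        // ceiling division: a partial final chunk still needs a checksum
        return (dataLen + bytesPerChecksum - 1) / bytesPerChecksum;
    }

    static long[] computeChecksums(byte[] data, int bytesPerChecksum) {
        long[] sums = new long[numChunks(data.length, bytesPerChecksum)];
        for (int i = 0; i < sums.length; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    static boolean verify(byte[] data, long[] expected, int bytesPerChecksum) {
        return Arrays.equals(computeChecksums(data, bytesPerChecksum), expected);
    }

    public static void main(String[] args) {
        byte[] block = new byte[40000];                 // pretend HFile block data
        long[] sums = computeChecksums(block, 16384);
        System.out.println(sums.length);                // 3 chunks for 40000 bytes at 16 K
        System.out.println(verify(block, sums, 16384)); // true
        block[12345] ^= 1;                              // simulate on-disk corruption
        System.out.println(verify(block, sums, 16384)); // false
    }
}
```

When verification fails in the real patch, the reader falls back to HDFS-level checksums, as described in the release note.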

          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:235 What will happen after HFileV3 is introduced ?
          I would expect HFileV3 starts with minorVersion of 0.
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java:961 HLog goes to fs on SSD ?
          Nice.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          stack has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          Good w/ your comebacks Dhruba... just minor one below for your next rev.

          Let us know how the cluster testing goes. This patch applies fine. Might try it out over here too..

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:235 I don't understand. I think this means the fact that we have a minor version unaccompanied by a major needs docing here in a comment? No hurry.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          mbautin has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          @dhruba: some more comments inline.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:451-452 Assign headerSize() to a local variable instead of calling it twice.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:529-530 Call headerSize() once and store in a local variable.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1232 do do -> do
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1642-1644 Store and reuse part of the previous error message.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1636 Check if WARN level messages are enabled and only generate the message string in that case.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1848 double semicolon (does not matter)
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:424 What if istream != istreamNoFsChecksum but istreamNoFsChecksum == null?
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3610-3612 Not sure how this is related to HBase-level checksum checking
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:265 Make this conf key a constant in HConstants
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:275 conf key -> HConstants
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:40-43 This is unnecessary because the default toString would do the same.
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:57-60 This is unnecessary because the default toString would do the same.
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:103-106 This is unnecessary because the default toString would do the same.
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:143-144 It looks like toString would do this.
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:179 Would not the built-in enum method valueOf do what this function is doing?
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:1 This file still seems to contain a lot of copy-and-paste from TestHFileBlock. Are you planning to address that?

          REVISION DETAIL
          https://reviews.facebook.net/D1521
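The point raised at HFileBlock.java:1636 — build the warning string only when WARN is actually enabled — is the standard guarded-logging pattern. A self-contained sketch (the Log interface below stands in for the commons-logging API HBase used at the time; field and method names are illustrative, not the actual patch):

```java
// Sketch of the lazy-message pattern suggested for HFileBlock.java:1636.
// The Log interface is a stand-in for org.apache.commons.logging.Log.
public class GuardedLogging {
    interface Log {
        boolean isWarnEnabled();
        void warn(Object msg);
    }

    static int messageBuilds = 0;  // counts how often the expensive string is built

    static String expensiveChecksumMessage(String path, long offset) {
        messageBuilds++;
        return "checksum mismatch in " + path + " at offset " + offset;
    }

    static void reportMismatch(Log log, String path, long offset) {
        // Only pay for string construction when WARN is actually enabled.
        if (log.isWarnEnabled()) {
            log.warn(expensiveChecksumMessage(path, offset));
        }
    }

    public static void main(String[] args) {
        Log silent = new Log() {
            public boolean isWarnEnabled() { return false; }
            public void warn(Object msg) { }
        };
        reportMismatch(silent, "hfile1", 1024);
        System.out.println(messageBuilds);  // 0: message never built when WARN is off
    }
}
```

Dhruba's counter-argument below is also reasonable: since a checksum mismatch is exceptional, the message is rarely built anyway, so the guard buys little on the common path.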

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12515829/D1521.9.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 55 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -132 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 157 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestAtomicOperation
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1032//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1032//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1032//console

          This message is automatically generated.

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:409 as far as I know, it is not possible to obtain a FileSystem object from a FSDataInputStream
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:235 Yes, if we bump the major version to V3, then we can restart minorVersions from 0.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:451-452 I think it is better to not add another 4 bytes to the HFileBlock (increases heapSize), instead just compute it when needed, especially since this method is used only for debugging.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:529-530 shall we avoid increasing the HeapSize vs computing headerSize? It should be really cheap to compute headerSize(), especially since it is likely to be inlined.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1636 I think we should always print this. This follows the precedent in other parts of the HBase code, and this code path is the exception, not the norm.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1642-1644 I am pretty sure that it is better to construct this message only if there is a checksum mismatch.
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3610-3612 The secret is to pass in a HFileSystem to HRegion.newHRegion(). This HFileSystem is extracted from the RegionServerServices, if it is not-null. Otherwise, a default file system object is created and passed into HRegion.newHRegion
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:57-60 getName() is better because it allows annotating the name differently from what Java does via toString (especially if we add new crc algorithms in the future)
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:143-144 I would like to keep getName() because it allows us to not change the API if we decide to override java's toString convention, especially if we add new checksum algorithms in the future. (Similar to why there are two separate methods Enum.name and Enum.toString)
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:179 That's right. But the existence of this API allows us to use our own names in the future. (Also, when there are only two or three values, this might be better than looking into a map)
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:1 I am not planning to change that, this code is what was there in HFileBlock, so it is good to carry it over in a unit test to be able to generate files in the older format. This is used by unit tests alone.

          Just replacing it with a pre-created file(s) is not very cool, especially because the pre-created file(s) will test only that file, whereas if we keep this code here, we can write more and more unit tests in the future that generate different files in the older format and test backward compatibility.

          REVISION DETAIL
          https://reviews.facebook.net/D1521
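Dhruba's getName()/nameToType() argument above can be sketched as follows: a custom name accessor and lookup keep the externally visible name decoupled from the Java enum constant, which Enum.name()/valueOf() cannot do if the two ever diverge. The enum below is illustrative, not the actual ChecksumType code:

```java
// Sketch of the getName()-vs-valueOf() point on ChecksumType. Today
// nameToType() behaves like valueOf(), but it keeps the lookup under our
// control if external names stop matching the enum constants later.
public class ChecksumNames {
    enum ChecksumType {
        NULL("NULL"),
        CRC32("CRC32"),
        CRC32C("CRC32C");  // external name could later differ from the constant

        private final String name;
        ChecksumType(String name) { this.name = name; }
        String getName() { return name; }

        static ChecksumType nameToType(String name) {
            // Linear scan is fine for two or three values, as noted above.
            for (ChecksumType t : values()) {
                if (t.getName().equals(name)) return t;
            }
            throw new IllegalArgumentException("Unknown checksum type: " + name);
        }
    }

    public static void main(String[] args) {
        // While names match the constants, both lookups agree.
        System.out.println(
            ChecksumType.nameToType("CRC32C") == ChecksumType.valueOf("CRC32C"));  // true
    }
}
```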

          Phabricator added a comment -

          dhruba updated the revision "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          Addressed most of Stack/Ted/Mikails' comments.

          Mikhail: I did not change the interfaces of ChecksumType, just because I think
          what we got is more generic and flexible.

          Stack: I have been running it successfully with load on a 5 node test cluster for
          more than 72 hours. Will it be possible for you to take it for a basic sanity test?

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
          src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
          src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/fs
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:451-452 I think it is better not to add another 4 bytes to the HFileBlock (it increases heapSize); instead, just compute it when needed, especially since this method is used only for debugging.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:529-530 shall we avoid increasing the heapSize by computing headerSize() on demand? It should be really cheap to compute headerSize(), especially since it is likely to be inlined.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1636 I think we should always print this. This follows the precedent in other parts of the HBase code, and this code path is the exception, not the norm.
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:1642-1644 I am pretty sure that it is better to construct this message only if there is a checksum mismatch.
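          The two points above can be sketched as follows. This is a simplified illustration, not HBase's actual HFileBlock code: the class and method names (SimpleBlock, verify) and the header-size constants are hypothetical. It shows a header size derived from the block's fields instead of cached in an extra field, and a mismatch message that is only built on the failure path.

```java
import java.util.zip.CRC32;

public class SimpleBlock {
  static final int BASE_HEADER_SIZE = 24;   // magic + lengths (illustrative values)
  static final int CHECKSUM_META_SIZE = 9;  // checksum type + bytesPerChecksum + dataSize

  final boolean hasChecksums;

  SimpleBlock(boolean hasChecksums) { this.hasChecksums = hasChecksums; }

  // Computed on demand; cheap enough to be inlined, so no extra field on the heap.
  int headerSize() {
    return BASE_HEADER_SIZE + (hasChecksums ? CHECKSUM_META_SIZE : 0);
  }

  // Returns null on success. The detailed message is constructed only after a
  // checksum actually fails, so the common (matching) case allocates nothing.
  static String verify(byte[] data, long expectedCrc, String fileName, long offset) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    long actual = crc.getValue();
    if (actual == expectedCrc) {
      return null;
    }
    return "checksum mismatch in " + fileName + " at offset " + offset
        + ": expected " + expectedCrc + " got " + actual;
  }
}
```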
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3610-3612 The secret is to pass an HFileSystem to HRegion.newHRegion(). This HFileSystem is extracted from the RegionServerServices if it is non-null; otherwise, a default file system object is created and passed into HRegion.newHRegion.
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:57-60 getName() is better because it allows annotating the name differently from what Java does via toString() (especially if we add new CRC algorithms in the future)
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:143-144 I would like to keep getName() because it allows us to not change the API if we decide to override java's toString convention, especially if we add new checksum algorithms in the future. (Similar to why there are two separate methods Enum.name and Enum.toString)
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:179 That's right. But the existence of this API allows us to use our own names in the future. (Also, when there are only two or three values, this might be better than looking into a map)
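          The getName()-vs-toString() argument can be sketched with a small enum. This is an illustrative stand-in, not HBase's actual ChecksumType: the constants and the "CRC32C-hw" external name are hypothetical. The point is that the config/wire name can diverge from the Java identifier without touching callers, mirroring the Enum.name() vs Enum.toString() split, and that a linear scan over a handful of values avoids a lookup map.

```java
public enum DemoChecksumType {
  NULL("NULL"),
  CRC32("CRC32"),
  // A future algorithm whose external name differs from its Java constant:
  CRC32C("CRC32C-hw");

  private final String externalName;

  DemoChecksumType(String externalName) { this.externalName = externalName; }

  // Stable external name, independent of the enum constant's identifier.
  public String getName() { return externalName; }

  // With only two or three values, a linear scan is fine; no map needed.
  public static DemoChecksumType nameToType(String name) {
    for (DemoChecksumType t : values()) {
      if (t.getName().equals(name)) {
        return t;
      }
    }
    throw new IllegalArgumentException("Unknown checksum type " + name);
  }
}
```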
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:1 I am not planning to change that; this code is what was in HFileBlock, so it is good to carry it over into a unit test to be able to generate files in the older format. It is used by unit tests alone.

          Just replacing it with a pre-created file(s) is not ideal: a pre-created file tests only that one file, whereas if we keep this code here, we can write more unit tests in the future that generate different files in the older format and test backward compatibility.

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          dhruba updated the revision "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          Addressed most of Stack/Ted/Mikhail's comments.

          Mikhail: I did not change the interfaces of ChecksumType, just because I think
          what we got is more generic and flexible.

          Stack: I have been running it successfully with load on a 5 node test cluster for
          more than 72 hours. Will it be possible for you to take it for a basic sanity test?

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
          src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
          src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/fs
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516146/D1521.10.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 55 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -127 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 159 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestAtomicOperation
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1052//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1052//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1052//console

          This message is automatically generated.

          stack added a comment -

          Reattach to rerun via hadoopqa

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516181/D1521.10.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 55 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -127 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 159 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.replication.TestReplication
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1054//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1054//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1054//console

          This message is automatically generated.

          stack added a comment -

          Try again, though a different test apart from the usual three failed this time.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516189/D1521.10.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 55 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -127 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 159 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestAtomicOperation
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1055//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1055//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1055//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516208/D1521.10.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 55 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -127 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 159 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1056//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1056//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1056//console

          This message is automatically generated.

          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:257 Should we consider majorVersion?

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          dhruba has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:257 In my opinion, we do not need a majorVersion in the in-memory HFileBlock object. Adding it will add to heap-space (albeit not much), but we can always add it later when needed... especially because it is only in-memory and not a disk-format change. Ted: do you agree?

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:257 Quoting Dhruba's reply:
          Yes, if we bump the major version to V3, then we can restart minorVersions from 0.

          So how do we support major version 3, minor version 0 with the checksum feature?

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          Phabricator added a comment -

          todd has commented on the revision "[jira] HBASE-5074 Support checksums in HBase block cache".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:257 maybe we can add a static final int majorVersion = 2; in this class, so the version checks are there, but it doesn't take up heap space? Then when/if we add a v3, we can make it non-final non-static without having to hunt down all the places where we might have major-version assumptions? The JIT will happily optimize out any if-statements against the constant.

          REVISION DETAIL
          https://reviews.facebook.net/D1521
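Todd's constant-folding suggestion can be sketched roughly as follows (class and method names here are illustrative, not the actual HFileBlock code):

```java
// Sketch of the idea: while only major version 2 exists, keep the version
// in a static final field. It costs no per-instance heap space, and the
// JIT treats it as a compile-time constant, folding away branches on it.
public class VersionCheckSketch {
    // One copy for the whole class, not one int per block instance.
    static final int MAJOR_VERSION = 2;

    static boolean usesChecksums(int minorVersion) {
        // If a v3 ever appears, MAJOR_VERSION can become a non-final
        // instance field without hunting down every call site.
        return MAJOR_VERSION >= 2 && minorVersion >= 1;
    }

    public static void main(String[] args) {
        System.out.println(usesChecksums(1)); // minor version 1: checksummed blocks
        System.out.println(usesChecksums(0)); // minor version 0: legacy format
    }
}
```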

          stack added a comment -

          I see these in the logs when I run the patch; it's a little odd because it says it's not using PureJavaCrc32 and will use CRC32, but then prints out a stacktrace anyway:

          2012-02-27 23:34:20,911 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: TestTable,0000150828,1330380684339.ebb37d5d0e2c1f4a8b111830a46e7cbc.
          2012-02-27 23:34:20,914 INFO org.apache.hadoop.hbase.regionserver.Store: time to purge deletes set to 0ms in store null
          2012-02-27 23:34:20,930 INFO org.apache.hadoop.hbase.util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 not available.
          2012-02-27 23:34:20,930 INFO org.apache.hadoop.hbase.util.ChecksumType: Checksum using java.util.zip.CRC32
          2012-02-27 23:34:20,931 WARN org.apache.hadoop.hbase.util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32C not available.
          java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.util.PureJavaCrc32C
                  at org.apache.hadoop.hbase.util.ChecksumFactory.newConstructor(ChecksumFactory.java:65)
                  at org.apache.hadoop.hbase.util.ChecksumType$3.initialize(ChecksumType.java:113)
                  at org.apache.hadoop.hbase.util.ChecksumType.<init>(ChecksumType.java:148)
                  at org.apache.hadoop.hbase.util.ChecksumType.<init>(ChecksumType.java:37)
                  at org.apache.hadoop.hbase.util.ChecksumType$3.<init>(ChecksumType.java:100)
                  at org.apache.hadoop.hbase.util.ChecksumType.<clinit>(ChecksumType.java:100)
                  at org.apache.hadoop.hbase.io.hfile.HFile.<clinit>(HFile.java:163)
                  at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1252)
                  at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:516)
                  at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:606)
                  at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:375)
                  at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:370)
                  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
                  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                  at java.lang.Thread.run(Thread.java:662)
          Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PureJavaCrc32C
                  at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
                  at java.security.AccessController.doPrivileged(Native Method)
                  at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
                  at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
                  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
                  at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
                  at java.lang.Class.forName0(Native Method)
                  at java.lang.Class.forName(Class.java:247)
                  at org.apache.hadoop.hbase.util.ChecksumFactory.getClassByName(ChecksumFactory.java:97)
                  at org.apache.hadoop.hbase.util.ChecksumFactory.newConstructor(ChecksumFactory.java:60)
                  ... 19 more
          

          I'm not sure what's happening. It would seem we're using the default CRC32, but then, reading the code, I'm not sure how I get the above exception.

          Also, I'm not sure whether I have this facility turned on. It's on by default, but I don't see anything in the logs saying it's on (and I don't have metrics on this cluster, nor a good before-and-after handle on whether this feature makes a difference).

          I caught this in a thread dump:

          "IPC Server handler 0 on 7003" daemon prio=10 tid=0x00007f4a1410c800 nid=0x24b2 runnable [0x00007f4a20487000]
             java.lang.Thread.State: RUNNABLE
                  at java.util.zip.CRC32.updateBytes(Native Method)
                  at java.util.zip.CRC32.update(CRC32.java:45)
                  at org.apache.hadoop.util.DataChecksum.update(DataChecksum.java:223)
                  at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:240)
                  at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
                  at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
                  - locked <0x00000006fc68e9d8> (a org.apache.hadoop.hdfs.BlockReaderLocal)
                  at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1457)
                  - locked <0x00000006fc68e9d8> (a org.apache.hadoop.hdfs.BlockReaderLocal)
                  at org.apache.hadoop.hdfs.BlockReaderLocal.read(BlockReaderLocal.java:326)
                  - locked <0x00000006fc68e9d8> (a org.apache.hadoop.hdfs.BlockReaderLocal)
                  at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:384)
                  at org.apache.hadoop.hdfs.DFSClient$BlockReader.readAll(DFSClient.java:1760)
                  at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:2330)
                  at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2397)
                  at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:46)
                  at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1333)
                  at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1769)
                  at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1633)
                  at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:328)
                  at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.seekToDataBlock(HFileBlockIndex.java:213)
                  at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:462)
                  at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:482)
                  at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
                  at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
                  at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:351)
                  at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:333)
                  at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:291)
                  at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:256)
                  at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:518)
                  - locked <0x00000006fc67cd70> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
                  at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:401)
                  - locked <0x00000006fc67cd70> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
                  at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
                  at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3388)
                  at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3344)
                  - locked <0x00000006fc67cc50> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
                  at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3361)
                  - locked <0x00000006fc67cc50> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
                  at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4145)
                  at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4035)
                  at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1957)
                  at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
                  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                  at java.lang.reflect.Method.invoke(Method.java:597)
                  at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
                  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1344)
          

          Maybe it's not on? Thanks Dhruba.

          Ted Yu added a comment -

          The exception about org.apache.hadoop.util.PureJavaCrc32C not found should be normal - it was WARN.
          It was produced by ChecksumType ctor for this:

            CRC32C((byte)2) {
          

          Metrics should be collected on the cluster to see the difference.

          stack added a comment -

          Hey Ted. Comment was not for you, it was for the patch author.

          The exception about org.apache.hadoop.util.PureJavaCrc32C not found should be normal - it was WARN.

          The above makes no sense. You have WARN and 'normal' in the same sentence.

          If you look at the log, it says:

          1. 2012-02-27 23:34:20,930 INFO org.apache.hadoop.hbase.util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 not available.
          2. 2012-02-27 23:34:20,930 INFO org.apache.hadoop.hbase.util.ChecksumType: Checksum using java.util.zip.CRC32
          3. It spews a thread dump saying AGAIN that org.apache.hadoop.util.PureJavaCrc32C not available.

          That is going to confuse.

          Metrics should be collected on the cluster to see the difference.

          Go easy on telling folks what they should do. It tends to piss them off.

          Ted Yu added a comment -

          I wish I had been more prudent before making the previous comments.

          dhruba borthakur added a comment -

          @Stack: I am pretty sure that the feature is on by default (but let me check and get back to you). Regarding the exception message about CRC32C, the Enum is trying to create this object but failing to do so because the Hadoop library in Hadoop 1.0 does not have support for this one (Hadoop 2.0 supports CRC32C). The reason I kept it is that people who might already be experimenting with Hadoop 2.0 will get this support out of the box. But I agree that it would be good to get rid of this exception message at startup. Do you have any suggestions on this one?

          @Todd: will take your excellent suggestion and make the majorVersion inside HFileBlock as a "static". Thanks.

          @Ted: Thanks for your comments. Will try to gather metrics in my cluster and post to this JIRA.
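The fallback behavior dhruba describes — try a checksum class that may only exist in newer Hadoop by reflection, and quietly fall back to java.util.zip.CRC32 when it is absent — can be sketched roughly like this (a simplified illustration, not the actual ChecksumFactory/ChecksumType code):

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Sketch: prefer a class that exists only in Hadoop 2.0 (PureJavaCrc32C),
// and fall back to CRC32 on Hadoop 1.0 with a one-line log message
// instead of dumping a stack trace at startup.
public class ChecksumPickSketch {
    static Checksum pickChecksum(String preferredClassName) {
        try {
            Class<?> clazz = Class.forName(preferredClassName);
            return (Checksum) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Expected when running against Hadoop 1.0: log quietly.
            System.out.println(preferredClassName + " not available, using CRC32");
            return new CRC32();
        }
    }

    public static void main(String[] args) {
        Checksum c = pickChecksum("org.apache.hadoop.util.PureJavaCrc32C");
        c.update(new byte[] {1, 2, 3}, 0, 3);
        System.out.println("checksum=" + Long.toHexString(c.getValue()));
    }
}
```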

          stack added a comment -

          @Dhruba It's good trying for PureJavaCrc32 first. Get rid of the WARN w/ thread dump, I'd say, especially since it comes right after reporting we're not going to use PureJavaCrc32. The feature does seem to be on by default, but it would be nice to know it w/o having to go to ganglia graphs to figure out my i/o loading to see whether or not this feature is enabled – going to ganglia would be useless anyway in the case where I've no history w/ an hbase read load – so some kind of log output might be useful? Good on you D.

          Ted Yu added a comment -

          I first mentioned porting PureJavaCrc32C to HBase here: https://issues.apache.org/jira/browse/HBASE-5074?focusedCommentId=13202490&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13202490

          Is that something worth trying?
          Phabricator added a comment -

          dhruba updated the revision "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          1. I modified the ChecksumType code to not dump an exception stack trace to the output if CRC32C is not
          available. Ted's suggestion of pulling CRC32C into hbase code sounds reasonable, but I would like
          to do it as part of another jira. Also, if hbase moves to hadoop 2.0, then it will automatically
          get CRC32C.
          2. I added a "minorVersion=" to the output of HFilePrettyPrinter.
          Stack, will you be able to run "bin/hbase hfile -m -f filename" on your cluster to verify that this
          checksum feature is switched on? If it prints minorVersion=1, then you are using this feature.
          Do you still need a print somewhere saying that this feature is on? The older files that were
          pre-created before the patch was deployed will still use hdfs-checksum verification, so you
          could possibly see hdfs-checksum verification in stack traces on a live regionserver.
          3. I did some thinking (again) on the semantics of major version and minor version. The major version
          represents a new file format, e.g. suppose we add a new field to the file's trailer, then we
          might need to bump up the major version. The minor version indicates the format of data inside an
          HFileBlock.
          In the current code, major versions 1 and 2 share the same HFile format (indicated by a minor version
          of 0). In this patch, we have a new minorVersion 1 because the data contents inside an HFileBlock
          have changed. Technically, both major versions 1 and 2 could have either minorVersion 0 or 1.
          Now, suppose we want to add a new field to the trailer of the HFile. We can bump the majorVersion
          to 3 but not change the minorVersion, because we did not change the internal format of an
          HFileBlock.
          Given the above, does it make sense to say that HFileBlock is independent of the majorVersion?

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
          src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
          src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/fs
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          Phabricator added a comment -

          dhruba updated the revision "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          1. I modified the ChecksumType code to not dump an exception stack trace to the output if CRC32C is not
          available. Ted's suggestion of pulling CRC32C into hbase code sounds reasonable, but I would like
          to do it as part of another jira. Also, if hbase moves to hadoop 2.0, then it will automatically
          get CRC32C.
          2. I added a "minorVersion=" to the output of HFilePrettyPrinter.
          Stack, will you be able to run "bin/hbase hfile -m -f filename" on your cluster to verify that this
          checksum feature is switched on? If it prints minorVersion=1, then you are using this feature.
          Do you still need a print somewhere saying that this feature is on? The older files that were
          pre-created before the patch was deployed will still use hdfs-checksum verification, so you
          could possibly see hdfs-checksum-verification in stack traces on a live regionserver.
          3. I did some thinking (again) on the semantics of major version and minor version. The major version
          represents a new file format, e.g. suppose we add a new thing to the file's trailer, then we
          might need to bump up the major version. The minor version indicates the format of data inside an
          HFileBlock.
          In the current code, major versions 1 and 2 share the same HFile format (indicated by a minor version
          of 0). In this patch, we have a new minorVersion 1 because the data contents inside an HFileBlock
          have changed. Technically, both major versions 1 and 2 could have either minorVersion 0 or 1.
          Now, suppose we want to add a new field to the trailer of the HFile. We can bump the majorVersion
          to 3 but not change the minorVersion, because we did not change the internal format of an
          HFileBlock.
          Given the above, does it make sense to say that HFileBlock is independent of the majorVersion?
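          The major/minor split described above can be sketched in a few lines. This is an illustrative
          sketch only, not the actual HBase classes: the class and method names, and the assumption that
          minor version 1 means "blocks carry checksums", are hypothetical placeholders for the idea that
          block-level decisions key off the minor version alone while the major version governs file-level
          layout such as the trailer.

          ```java
          // Hypothetical sketch (not HBase source): block-format decisions
          // depend only on the minor version; the major version is free to
          // change (e.g. a new trailer field) without touching block layout.
          public class VersionSketch {
              // Assumed convention: minor version 1 = blocks carry checksums.
              static final int MINOR_VERSION_WITH_CHECKSUM = 1;

              /** Block-level decision: the major version is irrelevant here. */
              static boolean blocksHaveChecksums(int major, int minor) {
                  return minor >= MINOR_VERSION_WITH_CHECKSUM;
              }

              public static void main(String[] args) {
                  // Major versions 1 and 2 can each pair with minor 0 or 1,
                  // and a hypothetical major version 3 (new trailer field)
                  // leaves the block format untouched.
                  System.out.println(blocksHaveChecksums(1, 0)); // false
                  System.out.println(blocksHaveChecksums(2, 1)); // true
                  System.out.println(blocksHaveChecksums(3, 0)); // false
              }
          }
          ```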

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
          src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
          src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/fs
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516798/D1521.11.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 55 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -125 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 159 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.io.hfile.TestFixedFileTrailer
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestImportTsv

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1079//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1079//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1079//console

          This message is automatically generated.

          Phabricator added a comment -

          dhruba updated the revision "[jira] HBASE-5074 Support checksums in HBase block cache".
          Reviewers: mbautin

          Fixed failed unit test TestFixedFileTrailer

          REVISION DETAIL
          https://reviews.facebook.net/D1521

          AFFECTED FILES
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
          src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
          src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/fs
          src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
          src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
          src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516807/D1521.12.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 55 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -125 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 159 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.TestDrainingServer

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1080//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1080//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1080//console

          This message is automatically generated.

          Ted Yu added a comment -

          Adding CRC32C in another JIRA is fine. Hadoop 2.0 isn't released. It would be nice to give users CRC32C early.

          The current formulation w.r.t. minor version means that HFileV3 would start with a minor version of 1.

          Todd Lipcon added a comment -

          There's no benefit to CRC32C over CRC32 unless you can use the JNI code. I don't think copy-pasting all of the JNI stuff into HBase is a good idea. And besides, this patch is not yet equipped to do JNI-based checksumming (which requires direct buffers, etc.).

          dhruba borthakur added a comment -

          The reason I kept the definition of CRC32C in the ChecksumType is essentially to reserve an ordinal in the enum for this checksum algorithm in the future. We should just wait for Hadoop 2.0 to be released to get this feature (instead of copying it to hbase).

          > means that HFileV3 would start with minor version of 1.

          I am suggesting that HFileV3 has nothing to do with minorVersions. HFileV3 can decide to support minor version 0 or 1 or both. HFileV3 might not even use the HFileBlock format as we know it, in which case the question is moot.
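          The "reserve an ordinal" idea mentioned above can be illustrated with a small sketch. The class
          and enum names here are hypothetical, not the actual HBase ChecksumType: the point is that the
          enum constant and its persisted code exist now, so the on-disk encoding is fixed even though the
          CRC32C implementation only becomes usable later.

          ```java
          // Hypothetical sketch of reserving an enum ordinal: the persisted
          // code for CRC32C is claimed up front, so files written later with
          // a real implementation remain compatible with today's layout.
          public class ChecksumOrdinalSketch {
              enum Checksum {
                  NULL((byte) 0),      // no checksum
                  CRC32((byte) 1),     // available via java.util.zip.CRC32
                  CRC32C((byte) 2);    // ordinal reserved; impl comes later

                  final byte code;     // persisted on disk, so it must never
                                       // change once assigned
                  Checksum(byte code) { this.code = code; }
              }

              public static void main(String[] args) {
                  // The reserved code is stable before CRC32C is usable.
                  System.out.println(Checksum.CRC32C.code); // prints 2
              }
          }
          ```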

          dhruba borthakur added a comment -

          This has been running successfully for days on end in my clusters. Stack: please let me know if your testing showed anything amiss. Thanks.

          Lars Hofhansl added a comment -

          Marking this for 0.94
