Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.0
    • Component/s: regionserver
    • Labels: None

      Description

      Bigtable provides two in-memory caches: one for row/column data and one for disk blocks.

      The size of each cache should be configurable, data should be loaded lazily, and each cache should be managed by an LRU mechanism.

      One complication of the block cache is that all data is read through a SequenceFile.Reader, which ultimately reads data off disk via an RPC proxy for ClientProtocol. This implies that the block caching would have to be pushed down to either the DFSClient or SequenceFile.Reader.
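      As a rough illustration of the LRU mechanism mentioned above, here is a generic Java sketch (not code from any of the attached patches) built on a LinkedHashMap in access order; a real block cache would key on file and block offset and bound total memory rather than entry count.

          import java.util.LinkedHashMap;
          import java.util.Map;

          /** Minimal LRU cache sketch: evicts the least-recently-accessed entry once capacity is exceeded. */
          public class LruCache<K, V> extends LinkedHashMap<K, V> {
            private final int capacity;

            public LruCache(int capacity) {
              // accessOrder = true makes iteration order reflect recency of access, which LRU needs.
              super(16, 0.75f, true);
              this.capacity = capacity;
            }

            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
              // Called after every put; returning true drops the least-recently-used entry.
              return size() > capacity;
            }
          }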

      Attachments

      1. commons-collections-3.2.jar
        558 kB
        Tom White
      2. hadoop-blockcache.patch
        21 kB
        Tom White
      3. hadoop-blockcache-v2.patch
        22 kB
        Tom White
      4. hadoop-blockcache-v3.patch
        38 kB
        Tom White
      5. hadoop-blockcache-v4.1.patch
        82 kB
        Tom White
      6. hadoop-blockcache-v4.patch
        78 kB
        Tom White
      7. hadoop-blockcache-v5.patch
        82 kB
        Tom White
      8. hadoop-blockcache-v6.patch
        82 kB
        Tom White
      9. hadoop-blockcache-v7.patch
        72 kB
        Tom White


          Activity

          Tom White added a comment -

          Here is an initial implementation - feedback would be much appreciated.

          BlockFSInputStream reads a FSInputStream in a block-oriented manner, and caches blocks. There's also a BlockMapFile.Reader that uses a BlockFSInputStream to read the MapFile data. HStore uses a BlockMapFile.Reader to read the first HStoreFile - at startup and after compaction. New HStoreFiles produced after memcache flushes are read using a regular reader in order to keep memory use fixed. Currently block caching is configured by the hbase properties hbase.hstore.blockCache.maxSize (defaults to 0 - no cache) and hbase.hstore.blockCache.blockSize (defaults to 64k). (It would be desirable to make caches configurable on a per-column family basis - the current way is just a stop gap.)
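          As a quick sketch (not part of the patch; the values are arbitrary examples and the unit of maxSize is not stated here), the two properties could be set programmatically like this:

              import org.apache.hadoop.conf.Configuration;

              public class BlockCacheSettings {
                public static void main(String[] args) {
                  Configuration conf = new Configuration();
                  // Default is 0, i.e. no block cache at all.
                  conf.set("hbase.hstore.blockCache.maxSize", "1000");
                  // Default block size is 64k; the Bigtable paper suggests smaller (8k) blocks when random reads dominate.
                  conf.set("hbase.hstore.blockCache.blockSize", String.valueOf(8 * 1024));
                }
              }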

          I've also had to push details of the block caching implementation up to MapFile.Reader, which is undesirable. The problem is that the streams are opened in the constructor of SequenceFile.Reader, which is called by the constructor of MapFile.Reader, so there is no opportunity to set the final fields blockSize and maxBlockCacheSize on a subclass of MapFile.Reader before the stream is opened. I think the proper solution is to have an explicit open method on SequenceFile.Reader, but I'm not sure about the impact of this since it would be an incompatible change. Perhaps do in conjunction with HADOOP-2604?

          stack added a comment -

          Patch looks great Tom.

          You pass 'length' in the below but it's not used:

          +    protected FSDataInputStream openFile(FileSystem fs, Path file,
          +        int bufferSize, long length) throws IOException {
          +      return fs.open(file, bufferSize);
          +    }
          

          I presume you have plans for it later?

          You have confidence in the LruMap class? You don't have unit tests (though these things are hard to test). I ask because though small, sometimes these kinds of classes can prove a little tricky....

          Do you have any numbers for how it improves throughput when cached blocks are 'hot'? And you talked of a slight 'cost'. Do you have rough numbers for that too? (Playing with the datanode, adjusting the size of the CRC blocks, a similar type of blocking to what you have here, there was no discernible difference across sizes.)

          What do we need to add to make it easy to enable/disable this feature on a per-column basis? Currently, edits to column configuration require taking the column offline. Changing this configuration looks safe to do while the column stays online. Would you agree?

          stack added a comment -

          Tom: Ignore comment above on LruMap. I just reread it.

          Tom White added a comment -

          You pass 'length' in the below but it's not used:

          It is used in the subclass of SequenceFile.Reader by BlockFSInputStream.

          Do you have any numbers for how it improves throughput when cached blocks are 'hot'?

          I haven't got any numbers yet (working on them), but random reads will suffer in general since a whole 64KB block is retrieved to just read a single key/value. The Bigtable paper talks about reducing the block size to 8KB (see section 7).

          What do we need to add to make it easy to enable/disable this feature on a per-column basis? Currently, edits to column configuration require taking the column offline. Changing this configuration looks safe to do while the column stays online. Would you agree?

          Agreed. I think that dynamically editing a column descriptor should go in a separate jira issue. For now, I was planning on just adding the new parameters to HColumnDescriptor. Does the version number need bumping in this case?

          Tom White added a comment -

          A second patch with minimal changes to MapFile.Reader - there is now a protected open() method for subclasses that wish to defer opening the streams until further initialization has been carried out.
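          A rough sketch of that pattern, with made-up class names rather than the actual MapFile code:

              import java.io.IOException;

              class BaseReader {
                protected BaseReader() {
                  // Deferred path: does not open anything; the subclass calls open() itself.
                }

                public BaseReader(String file) throws IOException {
                  open(file);  // normal path: open the streams immediately
                }

                protected void open(String file) throws IOException {
                  // ... open the underlying streams ...
                }
              }

              class BlockCachingReader extends BaseReader {
                private final int blockSize;

                public BlockCachingReader(String file, int blockSize) throws IOException {
                  super();                      // defer opening
                  this.blockSize = blockSize;   // final field is now initialized
                  open(file);                   // safe: open() may rely on blockSize
                }

                @Override
                protected void open(String file) throws IOException {
                  // ... wrap the stream in a block cache of blockSize-sized blocks ...
                  super.open(file);
                }
              }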

          Tom White added a comment -

          I'm trying to add a new parameter to HColumnDescriptor and would appreciate a little guidance. Do I need to worry about the version number? Is the order of the serialized fields important? It would be nice to group together the caching related ones if possible, so the block cache parameter would naturally sit next to the inMemory one. Ditto for the Thrift representation - how does it handle versioning? Thanks.

          Jim Kellerman added a comment -

          Tom,

          Yes, we need to start versioning everything that goes out to disk. And if we make an incompatible change, we either need to correct for it on the fly or augment the migration tool (hbase.util.Migrate.java)
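          A rough sketch of one way to do that (a hypothetical class, not HColumnDescriptor's actual code): write a version byte first and append new fields at the end, so readFields() can branch on the version it finds.

              import java.io.DataInput;
              import java.io.DataOutput;
              import java.io.IOException;
              import org.apache.hadoop.io.Writable;

              // Hypothetical descriptor, used only to illustrate the version-byte idea.
              public class VersionedDescriptor implements Writable {
                private static final byte VERSION = 2;  // bump when a field such as blockCacheEnabled is added
                private boolean inMemory;
                private boolean blockCacheEnabled;

                public void write(DataOutput out) throws IOException {
                  out.writeByte(VERSION);
                  out.writeBoolean(inMemory);
                  out.writeBoolean(blockCacheEnabled);  // new field appended at the end
                }

                public void readFields(DataInput in) throws IOException {
                  byte version = in.readByte();
                  inMemory = in.readBoolean();
                  // Older serialized forms simply lack the new field.
                  blockCacheEnabled = version >= 2 ? in.readBoolean() : false;
                }
              }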

          Tom White added a comment -

          This version (v3) changes the cache to a memory sensitive cache, implemented using SoftReferences (http://commons.apache.org/collections/api-release/org/apache/commons/collections/map/ReferenceMap.html). See HADOOP-2624 for background.
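          As a sketch of the idea (not the patch itself), the attached commons-collections jar lets a memory-sensitive cache be built from ReferenceMap's default constructor, which keeps hard keys and soft values, so the garbage collector may reclaim cached blocks under memory pressure:

              import java.util.Collections;
              import java.util.Map;
              import org.apache.commons.collections.map.ReferenceMap;

              public class SoftBlockCache {
                // Default ReferenceMap: hard keys, SoftReference values. Wrapped because
                // ReferenceMap itself is not synchronized.
                @SuppressWarnings("unchecked")
                private final Map<Long, byte[]> cache =
                    Collections.synchronizedMap(new ReferenceMap());

                public void put(long blockNumber, byte[] block) {
                  cache.put(blockNumber, block);
                }

                public byte[] get(long blockNumber) {
                  // Null if never cached, or if the GC has already reclaimed the soft value.
                  return cache.get(blockNumber);
                }
              }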

          Also, block caching can be enabled on a column-family basis. The size of the block is a system wide setting - this could be adjustable on a per-column basis in the future, if it were deemed necessary.

          I'm still looking at a performance comparison.

          Tom White added a comment -

          New dependency to go in src/contrib/hbase/lib/.

          stack added a comment -

          (... continuing IRC discussion).

          I didn't realize HColumnDescriptor was versioned. It doesn't seem to have been added by either Jim or me. Someone smarter, no doubt. So my comment that this change is incompatible doesn't hold, since I see you have code to make it so HCD migrates itself. Nice.

          In the below from HStoreFile, blockCacheEnabled method argument is not being passed to the MapFile constructors.

          +  public synchronized MapFile.Reader getReader(final FileSystem fs,
          +      final Filter bloomFilter, final boolean blockCacheEnabled)
          +  throws IOException {
          +    
          +    if (isReference()) {
          +      return new HStoreFile.HalfMapFileReader(fs,
          +          getMapFilePath(reference).toString(), conf, 
          +          reference.getFileRegion(), reference.getMidkey(), bloomFilter);
          +    }
          +    return new BloomFilterMapFile.Reader(fs, getMapFilePath().toString(),
          +        conf, bloomFilter);
          +  }
          

          Out of interest, did you regenerate the thrift or hand-edit it? Changes look right – just wondering.

          Default ReferenceMap constructor makes for hard keys and soft values. If value has been let go by the GC, does the corresponding key just stay in the Map?

          Otherwise, patch looks great Tom.

          Tom White added a comment - edited

          In the below from HStoreFile, blockCacheEnabled method argument is not being passed to the MapFile constructors.

          Thanks - this had the effect of never enabling the cache! I've fixed this.

          Out of interest, did you regenerate the thrift or hand-edit it? Changes look right - just wondering.

          I regenerated using the latest thrift trunk.

          Default ReferenceMap constructor makes for hard keys and soft values. If value has been let go by the GC, does the corresponding key just stay in the Map?

          No, both the key and the value are removed from the map - I checked the source.

          This patch also includes changes to HBase Shell so you can alter a table to enable block caching.

          Tom White added a comment -

          Fixing the v4 patch which was corrupt.

          stack added a comment -

          Patch looks good, Tom. I changed my mind since IRC this morning. Now I think hbase should align with the parent and not add new features during the feature freeze, until after we make the 0.16 branch (kick me on IRC if you think differently).

          Tom White added a comment -

          I ran some benchmarks of PerformanceEvaluation with and without block caching enabled. The setup was similar to that described in http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation, with three machines on EC2: one running the namenode and HBase master, one running a datanode and a region server, and one running a datanode and the PerformanceEvaluation program.

          Number of operations per second:

          Experiment         Block cache disabled   Block cache enabled
          sequential reads   119                    182
          random reads       110                    123

          I've seen quite a lot of variation in the results of PerformanceEvaluation, so I'm reluctant to read too much into these figures. But I think we can say that the block cache doesn't seem to slow down the system.

          Tom White added a comment -

          Regenerated patch that applies cleanly.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12374558/hadoop-blockcache-v5.patch
          against trunk revision 616796.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs -1. The patch appears to introduce 4 new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1721/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1721/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1721/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1721/console

          This message is automatically generated.

          Hudson added a comment -

          Integrated in Hadoop-trunk #387 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/387/ )

          Tom White added a comment -

          Fix for findbugs warnings. (Note that this has not been integrated yet - the previous comment by Hudson was generated since the collections jar has been checked in.)

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12374684/hadoop-blockcache-v6.patch
          against trunk revision 616796.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests -1. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1736/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1736/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1736/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1736/console

          This message is automatically generated.

          stack added a comment -

          Test passes on my machine. +1 on applying the patch after quick review of v6.

          Tom White added a comment -

          I've just committed this.

          stack added a comment -

          Tom, I backed out the hbase component of this patch temporarily. Notion is that we get hbase fixed up over in its new svn home, then we branch. We want the branch to go against hadoop-0.16.0. Once branch is done, we'll put this patch back into hbase TRUNK (With this patch in place, hbase requires post 0.16.0 hadoop).

          Jim Kellerman added a comment -

          Reopening issue. Patch was not fully backed out of HBase.

          Tom White added a comment -

          A new patch (v7) with just the HBase parts in it. I successfully ran the HBase unit tests with this patch by using a Hadoop Core 0.16 jar that had been patched with the MapFile and SequenceFile changes in core trunk.

          This can be applied to trunk after the branch is created.

          Jim/Stack/Bryan: Sorry about the extra work I caused you by committing too early!

          stack added a comment -

          Committed. Thanks for the patch Tom.

          Joonbok Lee added a comment -

          I'm trying to apply this patch to Hbase 1.3. The v7 patch has only the HBase parts and doesn't seem to work with the earlier patches' HDFS parts (the SequenceFile and MapFile changes). The v7 patch uses MapFile.Reader(..., blockCacheEnabled) rather than MapFile.Reader(..., blockSize, maxBlockCacheSize). Could you upload a v7 patch file that includes the HDFS parts?

          stack added a comment -

          Joonbok: I thought this patch had already been applied (to hbase since 0.1, and the hadoop portion to hadoop since 0.16.x?). Is that why it's not applying? (It could be hard to figure out because a lot has changed in hbase since 02/07/08.)


            People

            • Assignee: Tom White
            • Reporter: Jim Kellerman
            • Votes: 1
            • Watchers: 1
