HBASE-80
[hbase] Add a cache of 'hot' cells

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: regionserver
    • Labels: None

    Attachments

    1. WR_PE.java (8 kB, Erik Holstad)
    2. MemoryTester.java (6 kB, Erik Holstad)
    3. LruHashMapTester.java (10 kB, Erik Holstad)
    4. LruHashMap.java (39 kB, Erik Holstad)
    5. HeapSize.java (2 kB, Erik Holstad)
    6. hbase-80-v1.patch (2 kB, Erik Holstad)
    7. Cell.java (6 kB, Erik Holstad)
    8. cache.patch (7 kB, stack)

      Issue Links

        Activity

        stack added a comment -

        Linking to HADOOP-1398.

        Chatting w/ Tom White, should probably be size-based or made of References.

        stack added a comment -

        Very basic LRU cell cache. Caches configurable count of latest version. No accounting of cache size. Not for production use.

        One thing learned is that cell caching is awkward; cells generally don't specify their timestamp yet cell keys, HStoreKey, have an explicit timestamp.
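
        As a rough illustration of the count-based approach (a hypothetical sketch, not the attached cache.patch), an access-ordered java.util.LinkedHashMap is enough for a cache that keeps only the most recently used entries, with no accounting of their memory footprint; the key and value type parameters below stand in for HStoreKey and Cell:

        import java.util.LinkedHashMap;
        import java.util.Map;

        // Minimal count-bounded LRU cache: keeps at most maxEntries of the most
        // recently used entries and evicts the least recently used one on overflow.
        // No accounting of the memory the cached entries occupy.
        public class SimpleCellCache<K, V> {
          private final Map<K, V> cache;

          public SimpleCellCache(final int maxEntries) {
            // Access-ordered LinkedHashMap; removeEldestEntry drives the eviction.
            this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
              @Override
              protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
              }
            };
          }

          public synchronized V get(K key) { return cache.get(key); }
          public synchronized void put(K key, V value) { cache.put(key, value); }
          public synchronized void remove(K key) { cache.remove(key); }
        }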

        Jonathan Gray added a comment -

        Erik has been working on this for a couple weeks already, patches and numbers coming soon

        Erik Holstad added a comment - edited

        Sorry for not posting on this issue, even though I have been assigned to it and everything.
        The basic idea I have been working on is to make a key/value cache to speed up random
        reads.

        Test setup:
        Used the same test parameters that are used in the Bigtable paper, so it would be easy to compare.
        Tests have currently only been done on a single-machine cluster with one HRegionServer. That setup
        uses 1 column/family and every value is 1000B.

        Some numbers for testing this extremely simple cache, with each test done over 10000 reads:

        1 column/family, values of 1000B:
        Random reads without cache: 481 r/s (481 KB/s)
        Random reads with cache: 4019 r/s (4019 KB/s)

        Some other tests were run to compare the difference when using multiple columns/family:

        5 columns/family, everything else the same as above:
        Random reads without cache: 445 r/s (2223 KB/s)
        Random reads with cache: 3588 r/s (17940 KB/s)

        10 columns/family, everything else the same as above:
        Random reads without cache: 24 r/s (240 KB/s)
        Random reads with cache: 25 r/s (250 KB/s)

        For the rest of the tests only 100 rows were used, to avoid out-of-memory errors.

        Like the first test but with fewer rows:
        Random reads without cache: 284 r/s (284 KB/s)
        Random reads with cache: 2083 r/s (2083 KB/s)

        Same as above but with 1000 columns/family:
        Random reads without cache: 23 r/s (23000 KB/s)
        Random reads with cache: 76 r/s (76000 KB/s)

        Erik Holstad added a comment -

        The planned layout for the cache is to keep the cache at the HRegionServer
        level, so that you set the amount of memory you want to use for each HRS.

        We are planning to use a special HashMap that is memory sensitive, keeps track
        of the memory used by the cache at every point, and evicts entries whenever needed.

        The harder thing to discuss is what kind of data would be best to store: for example,
        whether a query -> result mapping or an HStoreKey -> Cell schema would make better
        use of the memory.

        All thoughts and comments are welcome!
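
        For illustration only (a sketch of the idea, not the attached LruHashMap.java), such a memory-sensitive map can charge each entry an approximate heap cost and evict least-recently-used entries once the byte budget configured for the region server is exceeded:

        import java.util.Iterator;
        import java.util.LinkedHashMap;
        import java.util.Map;

        // Sketch of a size-bounded LRU map: every entry carries an approximate
        // heap cost, and least-recently-used entries are evicted as soon as the
        // running total goes over the configured byte budget.
        public class MemoryBoundedLruMap<K, V> {

          private static final class Sized<T> {
            final T value;
            final long bytes;
            Sized(T value, long bytes) { this.value = value; this.bytes = bytes; }
          }

          private final long maxBytes;   // memory budget for this cache
          private long usedBytes = 0;    // running total of entry costs
          private final LinkedHashMap<K, Sized<V>> map =
              new LinkedHashMap<K, Sized<V>>(16, 0.75f, true);  // access order

          public MemoryBoundedLruMap(long maxBytes) { this.maxBytes = maxBytes; }

          public synchronized void put(K key, V value, long approxBytes) {
            Sized<V> old = map.put(key, new Sized<V>(value, approxBytes));
            if (old != null) usedBytes -= old.bytes;
            usedBytes += approxBytes;
            // Evict from the least-recently-used end until back under budget.
            Iterator<Map.Entry<K, Sized<V>>> it = map.entrySet().iterator();
            while (usedBytes > maxBytes && it.hasNext()) {
              usedBytes -= it.next().getValue().bytes;
              it.remove();
            }
          }

          public synchronized V get(K key) {
            Sized<V> e = map.get(key);
            return e == null ? null : e.value;
          }
        }

        Whether the keys are whole queries or HStoreKeys mapping to Cells only changes what the approximate per-entry cost has to account for.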

        Jonathan Gray added a comment -

        Very odd result...

        Random writes: Timer for 10000 and cols 7 = 4062 ms
        Random reads: Timer for 100 and size 7 = 105 ms

        Random writes: Timer for 10000 and cols 8 = 3995 ms
        Random reads: Timer for 100 and size 8 = 4024 ms

        Going from 7 columns in the family to 8 columns leads to 40x slowdown. Rebalancing of some sortedmap or hashmap or something on the way out?

        Jonathan Gray added a comment -

        The above issue going from 7 to 8 cols appears to happen only with the caching code in place. It does not seem to occur with the 0.19 release. Investigating further... patch will be posted shortly.

        Erik Holstad added a comment - edited

        Attaching the patch and the tester used for the test.
        The test file will be investigated further, as Jonathan said.

        Erik Holstad added a comment -

        So it turns out that the weird issue that we have seen for random reads only applies
        when the client is on the same machine as the master and the HRS. Leaving this
        weird side issue for now and focusing on other issues.

        Erik Holstad added a comment -

        Ran tests for the current implementation of the memory-sensitive LruHashMap and compared them to HashMap and LinkedHashMap:

        Adding to LruHashMap
        timer 83 for 100000 adds
        Size for map 100000
        Reading from LRU
        timer 133 for 100000 reads
        Deleting from LRU
        timer 112 for 100000 deletes

        Adding to HashMap
        timer 39 for 100000 adds
        Size for map 100000
        Reading from Hash
        timer 110 for 100000 reads
        Deleting from Hash
        timer 79 for 100000 deletes

        Adding to LinkedHashMap
        timer 55 for 100000 adds
        Size for map 100000
        Reading from LinkedHash
        timer 148 for 100000 reads
        Deleting from LinkedHash
        timer 102 for 100000 deletes

        I think the numbers for reads and deletes look OK, but there seems to be something weird going on for puts.
        Will dig deeper to find out what.
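
        For context, a timing loop along these lines presumably produces figures like the ones above (a hypothetical harness, not the attached LruHashMapTester.java; the reported numbers are assumed to be elapsed milliseconds):

        import java.util.HashMap;
        import java.util.Map;

        // Hypothetical micro-benchmark loop: time N puts, gets and removes against
        // a Map implementation and print the elapsed time for each phase.
        public class MapTimer {
          public static void main(String[] args) {
            final int n = 100000;
            Map<Integer, byte[]> map = new HashMap<Integer, byte[]>();

            long start = System.currentTimeMillis();
            for (int i = 0; i < n; i++) map.put(i, new byte[100]);
            System.out.println("timer " + (System.currentTimeMillis() - start) + " for " + n + " adds");

            start = System.currentTimeMillis();
            for (int i = 0; i < n; i++) map.get(i);
            System.out.println("timer " + (System.currentTimeMillis() - start) + " for " + n + " reads");

            start = System.currentTimeMillis();
            for (int i = 0; i < n; i++) map.remove(i);
            System.out.println("timer " + (System.currentTimeMillis() - start) + " for " + n + " deletes");
          }
        }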

        Erik Holstad added a comment -

        Made some small changes to the test program to get a fairer comparison; the new numbers look like this:

        Adding to LRU, timer 79 for 100000 adds
        Reading from LRU, timer 135 for 100000 reads
        Deleting from LRU, timer 123 for 100000 deletes

        Adding to LinkedHash, timer 61 for 100000 adds
        Reading from LinkedHash, timer 119 for 100000 reads
        Deleting from LinkedHash, timer 87 for 100000 deletes

        Adding to Hash, timer 49 for 100000 adds
        Reading from Hash, timer 105 for 100000 reads
        Deleting from Hash, timer 77 for 100000 deletes

        So the big difference between the LinkedHashMap and the LruHashMap seen in the previous test is smaller now and looks reasonable.

        Erik Holstad added a comment -

        Attaching the memory-sensitive LRU and the tester class. Still working on some details about the
        memory used by different classes, and will update as soon as I'm done.

        Erik Holstad added a comment -

        Updated the HeapSize interface with some global constants that will be used by classes that
        implement the interface.
        Also updated Cell.java so that it calculates the correct heap size value.
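
        A sketch of what such an interface can look like; the constant names and byte values below are illustrative assumptions for a 64-bit JVM, not the exact contents of the attached HeapSize.java:

        // Classes that know their own approximate heap footprint implement this.
        // The constants are shared building blocks for the estimates (assumed values).
        public interface HeapSize {
          long OBJECT_OVERHEAD = 16;  // object header
          long REFERENCE = 8;         // one object reference
          long ARRAY_OVERHEAD = 24;   // array header
          long LONG = 8;
          long INT = 4;

          /** @return approximate heap usage of this object in bytes. */
          long heapSize();
        }

        A Cell holding a byte[] value and a long timestamp could then estimate itself roughly as OBJECT_OVERHEAD + REFERENCE + ARRAY_OVERHEAD + value.length + LONG.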

        Erik Holstad added a comment -

        Adding the memory usage tester, the class that is used to find out the sizes
        of other classes.
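
        One common way to build such a tester (a sketch of that technique, not the attached MemoryTester.java) is to allocate many instances of the class under test and compare heap usage before and after:

        // Hypothetical memory-usage tester: estimate the per-instance footprint of
        // a class by allocating many instances and comparing heap readings.
        public class MemoryUsageEstimator {
          private static long usedHeap() {
            Runtime rt = Runtime.getRuntime();
            System.gc();  // encourage a collection so the reading is less noisy (best effort)
            return rt.totalMemory() - rt.freeMemory();
          }

          public static void main(String[] args) {
            final int count = 100000;
            Object[] keep = new Object[count];  // hold references so nothing is collected

            long before = usedHeap();
            for (int i = 0; i < count; i++) {
              keep[i] = new byte[100];          // replace with the class under test
            }
            long after = usedHeap();

            System.out.println("approx bytes/instance: " + (after - before) / count);
          }
        }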

        Erik Holstad added a comment -

        It is not really clear to me how the new setup, where a class is aware of its own size by
        implementing HeapSize, is going to be used in the rest of the project. So I propose that for now
        we have 2 different versions of Cell and HbaseMapWritable, one that keeps track of the size and one
        that doesn't, so the overhead of tracking the size doesn't slow down other use cases.
        Another question is where and when to do the calculation of the size. I have chosen to put
        it in the puts, adds and removes because I think it is going to be a read-many scenario for these
        objects.
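
        A sketch of the "pay on write" idea, assuming keys and values expose a heapSize() method as in the HeapSize interface above (the class and method names here are illustrative, not the attached code):

        import java.util.HashMap;
        import java.util.Map;

        // Running heap-size total maintained only in put/remove, so that the
        // read-many path stays free of size bookkeeping.
        public class SizeTrackingMap<K extends HeapSize, V extends HeapSize> {
          private final Map<K, V> map = new HashMap<K, V>();
          private long heapSize = 0;

          public synchronized void put(K key, V value) {
            V old = map.put(key, value);
            if (old != null) {
              // Key already accounted for; swap the old value's cost for the new one.
              heapSize += value.heapSize() - old.heapSize();
            } else {
              heapSize += key.heapSize() + value.heapSize();
            }
          }

          public synchronized void remove(K key) {
            V old = map.remove(key);
            if (old != null) heapSize -= key.heapSize() + old.heapSize();
          }

          public synchronized V get(K key) { return map.get(key); }  // no size work on reads

          public synchronized long heapSize() { return heapSize; }
        }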

        stack added a comment -

        After chatting w/ Jon and Erik, moving out of 0.20.0. Blockcache will be what we have in 0.20.0.

        stack added a comment -

        Moved from 0.21 to 0.22 just after merge of old 0.20 branch into TRUNK.

        stack added a comment -

        Moving out of 0.92. Move it back in if you think differently.


          People

          • Assignee: Erik Holstad
          • Reporter: stack
          • Votes: 1
          • Watchers: 3
