Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.6, 4.0-ALPHA
    • Fix Version/s: 4.9, 5.0
    • Component/s: search
    • Labels: None

      Description

      SOLR-2906 gave us an inefficient LFU cache modeled on FastLRUCache/ConcurrentLRUCache. It could use some serious improvement. The following project includes an Apache 2.0 licensed O(1) implementation. The second link is the paper (PDF warning) it was based on:

      https://github.com/chirino/hawtdb
      http://dhruvbird.com/lfu.pdf

      Using this project and paper, I will attempt to make a new O(1) cache called FastLFUCache that is modeled on LRUCache.java. This will (for now) leave the existing LFUCache/ConcurrentLFUCache implementation in place.
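
      For orientation, here is a minimal, unsynchronized sketch of the O(1) structure the paper describes – a key map plus one access-ordered bucket per frequency. The names are illustrative only; this is not code from hawtdb or from any patch attached to this issue:

{code:java}
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

/** Toy O(1)-style LFU: constant-time get/put, eviction cost bounded by maxFreq. */
class LfuSketch<K, V> {
  private static final class Entry<V> { V value; int freq; Entry(V v) { value = v; } }

  private final int maxSize;
  private final int maxFreq;
  private final Map<K, Entry<V>> map = new HashMap<>();
  private final LinkedHashSet<K>[] buckets; // buckets[f] = keys at frequency f, access-ordered

  @SuppressWarnings("unchecked")
  LfuSketch(int maxSize, int maxFreq) {
    this.maxSize = maxSize;
    this.maxFreq = maxFreq;
    buckets = new LinkedHashSet[maxFreq + 1];
    for (int i = 0; i <= maxFreq; i++) buckets[i] = new LinkedHashSet<>();
  }

  V get(K key) {
    Entry<V> e = map.get(key);
    if (e == null) return null;
    buckets[e.freq].remove(key);           // O(1): detach from the current bucket
    if (e.freq < maxFreq) e.freq++;
    buckets[e.freq].add(key);              // O(1): append to the next bucket
    return e.value;
  }

  V put(K key, V value) {
    Entry<V> existing = map.get(key);
    if (existing != null) {                // replace only; frequency is untouched
      V previous = existing.value;
      existing.value = value;
      return previous;
    }
    if (map.size() >= maxSize) evictOne();
    map.put(key, new Entry<>(value));
    buckets[0].add(key);                   // new entries start at frequency 0
    return null;
  }

  private void evictOne() {
    for (LinkedHashSet<K> bucket : buckets) {   // bounded by maxFreq, not by cache size
      if (!bucket.isEmpty()) {
        K victim = bucket.iterator().next();    // oldest key in the lowest non-empty bucket
        bucket.remove(victim);
        map.remove(victim);
        return;
      }
    }
  }
}
{code}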

      1. SOLR-3393-trunk-withdecay.patch
        96 kB
        Shawn Heisey
      2. SOLR-3393-4x-withdecay.patch
        96 kB
        Shawn Heisey
      3. SOLR-3393.patch
        60 kB
        Shawn Heisey
      4. SOLR-3393.patch
        59 kB
        Shawn Heisey
      5. SOLR-3393.patch
        64 kB
        Shawn Heisey
      6. SOLR-3393.patch
        68 kB
        Shawn Heisey
      7. SOLR-3393.patch
        67 kB
        Hoss Man
      8. SOLR-3393.patch
        77 kB
        Adrien Grand
      9. SOLR-3393.patch
        93 kB
        Adrien Grand

        Issue Links

          Activity

          Shawn Heisey created issue -
          Shawn Heisey added a comment -

          After all the bugs are worked out of the implementation covered by this issue, I can see a future scenario where the existing (slow) LFUCache is scrapped, FastLFUCache is renamed LFUCache, and a new FastLFUCache implementation that uses a Concurrent class is created. If the committers would prefer that I go ahead and scrap LFUCache now and just use that name, let me know.

          I have no plans to work on SOLR-2889, so it may be best to just close that issue. I don't know if SOLR-2906 should be changed from a subtask to a full task.

          Shawn Heisey made changes -
          Description edited: the final sentence originally read "leave the existing FastLFUCache/ConcurrentLFUCache implementation in place" and was corrected to "LFUCache/ConcurrentLFUCache".
          Shawn Heisey added a comment -

           I've been working on this. I've come to realize that I don't completely understand how CacheRegenerator works. I suspect that it is geared around LRU caches and that the new cache won't have any of the frequency information from the old one; it will just put the entries into the cache as if they were new. Can anyone confirm this? If I am right, I think my SOLR-2906 implementation is incorrect at warm time.

          After the new cache is regenerated, should I go through the new cache, grab the frequency information from the old cache with each key, and fix the new cache up?

          Yonik, you were the one that came up with the timeDecay option for SOLR-2906. It was added to the markAndSweep routine (which evicts old entries). Should it also be in the warm routine, or possibly only exist in the warm routine?

          Shawn Heisey added a comment -

          Patch for new LFUCache implementation. It should be pretty close to O(1). ConcurrentLFUCache is removed. TestLFUCache and SolrInfoMBeanTest have been updated as well. All tests pass.

          Shawn Heisey made changes -
          Attachment SOLR-3393.patch [ 12524015 ]
          Shawn Heisey added a comment - edited

          New patch that does not touch SolrInfoMBeanTest. I figured out what FastLRUCache does to avoid failing this test and did the same. The getStatistics method now immediately returns an empty NamedList if the cache has not been initialized. This could be done for LRUCache as well.

          Shawn Heisey made changes -
          Attachment SOLR-3393.patch [ 12524028 ]
          Shawn Heisey made changes -
          Link This issue supercedes SOLR-2906 [ SOLR-2906 ]
          Shawn Heisey added a comment -

          Further work, but no patch yet because it requires a bunch of test tweaks: I have split timeDecay into evictDecay and warmDecay. The timeDecay option still works, and sets both to true. In the absence of the timeDecay option, I've made evictDecay default to false and warmDecay default to true.

          In order to cause elements to decay over time, every entry in the cache must be removed from one LinkedHashSet and added to another. This is probably pretty fast for this implementation, but it does still require time. My use case scenario will have commits relatively often, up to once a minute. For me, doing the decay at warm time is enough, and results in fewer items to touch. To have it happen at eviction time is overhead I don't need, but it would be correct for some use cases. What are some other people's opinions about the default settings?
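
           For illustration, here is a minimal sketch of what such a decay pass looks like over per-frequency buckets (assumed structure: buckets[f] holds the keys at frequency f in access order; the names are illustrative, not taken from the patch):

{code:java}
import java.util.LinkedHashSet;

/** One decay step: every key at frequency f > 0 moves down to frequency f - 1. */
final class DecayPass {
  static <K> void decayByOne(LinkedHashSet<K>[] buckets) {
    // Walk upward so no key is moved more than once in a single pass.
    for (int f = 1; f < buckets.length; f++) {
      for (K key : buckets[f]) {
        buckets[f - 1].add(key);   // preserves relative access order within the bucket
      }
      buckets[f].clear();
      // In a real cache each entry's stored frequency field would be decremented here too.
    }
  }
}
{code}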

          Hoss Man added a comment -

          I will attempt to make a new O(1) cache called FastLFUCache

          #OhDearGodPleaseNotAnotherClassWithFastInTheName

           Please, please, please let's end the madness of subjective adjectives in class names ... if it's an LFU cache wrapped around a "hawtdb", why don't we just call it "HawtDbLFUCache"?

          I've been working on this. I've come to realize that I don't completely understand how CacheRegenerator works. I suspect that it is geared around LRU caches and that the new cache won't have any of the frequency information from the old one, it will just put the entries into the cache as if they were new. Can anyone confirm this?

          The idea behind the CacheRegenerator API is to be as simple as possible and agnostic to:

          • the Cache Impl (ie: LRUCache vs LFUCache vs HawtDbLFUCache)
          • the cache usage (ie: Query->DocSets vs Query->DocList vs String->MyCustomClass)
          • the means of generating values from keys (ie: how do you know which MyCustomClass should be cached for which String)

           ... so you can have a custom (named) cache instance declared in your solrconfig.xml with your own MySpecialCacheRegenerator that knows about your use case and might do something special with the keys/values (like: short-circuit part of the generation if it can see the data hasn't changed, or read from authoritative data files outside of Solr, etc...) and then use any Cache impl class that your heart desires, and things will still work right.

          After the new cache is regenerated, should I go through the new cache, grab the frequency information from the old cache with each key, and fix the new cache up?

           You certainly could – when (new HawtDbLFUCache(...)).warm(...) is called, it needs to delegate to the regenerator for pulling values from the "old" cache, but that doesn't mean it can't also directly ask the "old" cache instance for stats about each of the old keys as it loops over them – remember: the "new" cache is the one inspecting the "old" cache and deciding what things to ask the regenerator to generate.

           But I question whether you really want any sort of stats from the "old" cache copied over to the "new" cache. It is, after all, a completely new cache – with new usage. Should the stats really be preserved forever? Regardless of how popular an object was in the "old" cache instance, should we automatically assume it's equally popular in the "new" cache instance?
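
           As a rough illustration of that division of labour – using simplified stand-in types, not the real SolrCache/CacheRegenerator interfaces – the new cache drives warming, delegates value generation to the regenerator, and may optionally copy frequency stats straight from the old instance:

{code:java}
import java.util.HashMap;
import java.util.Map;

final class WarmSketch {
  /** Simplified stand-in for a regenerator: produces a value for one old key. */
  interface Regenerator<K, V> {
    /** Regenerate one item into newCache; return false to stop warming early. */
    boolean regenerateItem(MiniLfu<K, V> newCache, MiniLfu<K, V> oldCache, K oldKey, V oldValue);
  }

  /** Simplified stand-in for an LFU cache that tracks a per-key frequency. */
  static final class MiniLfu<K, V> {
    final Map<K, V> values = new HashMap<>();
    final Map<K, Integer> frequency = new HashMap<>();

    void put(K key, V value) {
      values.put(key, value);
      frequency.putIfAbsent(key, 0);
    }

    int frequencyOf(K key) {
      return frequency.getOrDefault(key, 0);
    }

    /** The new cache loops over the old one, delegating value generation. */
    void warm(MiniLfu<K, V> old, Regenerator<K, V> regenerator) {
      for (Map.Entry<K, V> e : old.values.entrySet()) {
        if (!regenerator.regenerateItem(this, old, e.getKey(), e.getValue())) break;
        // Optional: carry the old frequency over instead of starting at zero.
        if (values.containsKey(e.getKey())) {
          frequency.put(e.getKey(), old.frequencyOf(e.getKey()));
        }
      }
    }
  }
}
{code}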

          Shawn Heisey added a comment -

          Hoss, thanks for your comments.

           Since I was the one who wrote the previous version of this, I just opted to completely replace LFUCache rather than go with a "Fast" prefix. I hadn't considered the naming problem in quite the same light as "yet another fast*" name, but the "Fast" name did seem like a bad idea.

           Yonik had the same concern about stats being preserved forever on SOLR-2906, and he helped with a decay option to deal with that. I think the decay is a good idea. There was only one kind of decay before, applied to all elements any time there were evictions, and it defaulted to on.

           In a new version of the patch for this issue (which I have not yet uploaded) I have now included two kinds of decay. There is the kind applied at eviction, now defaulting to off, and one applied at warming, defaulting to on. I will expand the documentation on the Wiki, making it clear that turning off the decay option will probably lead to an undesirable cache state. Currently the decay is implemented with a bit shift (>>> 1); I may make another option available that just subtracts one, and we can bikeshed about which option should be the default.
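
           A tiny illustration of the two decay operators under discussion (nothing here is from the patch):

{code:java}
/** Compares the aggressive halving decay with the gentler subtract-one variant. */
final class DecayOps {
  static int halve(int freq)       { return freq >>> 1; }            // current behaviour: divide by two
  static int subtractOne(int freq) { return Math.max(0, freq - 1); } // possible alternative

  public static void main(String[] args) {
    for (int f : new int[] {1, 4, 16, 64}) {
      System.out.printf("freq=%d  halve=%d  subtractOne=%d%n", f, halve(f), subtractOne(f));
    }
  }
}
{code}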

          Shawn Heisey added a comment -

          Patch with separate options for evictDecay and warmDecay. The timeDecay option sets the other two options to true, for compatibility with 3.6.0. The tests aren't yet updated, so one of them fails. I need to change the tests so they verify operation of both the new options.

          Shawn Heisey made changes -
          Attachment SOLR-3393.patch [ 12524276 ]
          Shawn Heisey added a comment -

           Updated patch. Tests pass and should cover all current functionality. It defaults to a keepPercentage of 80. I'm not sure I like 'keepPercentage' as an option name; feel free to bikeshed over that and all the defaults I have chosen.

          It might be ready to commit now, if nobody finds fault with it.

          Shawn Heisey made changes -
          Attachment SOLR-3393.patch [ 12524822 ]
          Shawn Heisey added a comment - edited

          Is this too big a change to be included in 3.6.1? I will likely be trying this out with 3.6, once I clear my plate a little bit. Possibly futile justification: It's a bug fix! The bug it fixes is "poor performance."

          Shawn Heisey added a comment -

          The commit for SOLR-1893 means that my patch needs more work. I have a business trip next week that takes priority, though.

          Hoss Man added a comment -

           Bulk fixing the version info for 4.0-ALPHA and 4.0; all affected issues have "hoss20120711-bulk-40-change" in a comment.

          Hoss Man made changes -
          Fix Version/s 4.0 [ 12322455 ]
          Fix Version/s 4.0-ALPHA [ 12314992 ]
          Robert Muir added a comment -

          rmuir20120906-bulk-40-change

          Robert Muir made changes -
          Fix Version/s 4.0 [ 12322551 ]
          Fix Version/s 4.0-BETA [ 12322455 ]
          Hoss Man added a comment -

           I manually wrangled Shawn's latest patch so that it works on trunk - tests pass, but I don't know if I'm comfortable committing for 4.0 without...

          1) a second pair of eyeballs sanity checking that my manual beating on the patch resulted in the correct final code.

          2) some timing data indicating that this change really is faster – particularly since it replaces the internals of LFUCache, so it will affect upgrading users even if they don't change their configs.

          Hoss Man made changes -
          Attachment SOLR-3393.patch [ 12543797 ]
          Adrien Grand added a comment -

           Hi Hoss, thanks for bringing this up!

           • put should return the previous value instead of returning the value that has just been put into the cache.
           • I don't like the evictDecay option. I assume it is there to prevent the most frequently used entry from being evicted when all entries have a frequency >= maxFreq, but it makes every put operation run in O(n), so maybe we should just remove it and add a note to the class javadocs warning users that entries with a frequency >= maxFreq are evicted according to an LRU policy.
           • Maybe we should remove warmDecay as well. I understand it is there to try to prevent cache pollution, but it makes the cache behave differently across commits: if an entry is retrieved 5 times before a commit and 5 times after, it will be considered less frequently used than an entry that has been retrieved 8 times after the commit, which is counterintuitive.
           • I think Entry.value and Entry.frequency don't need to be volatile?
           • maxCacheSize - 1 is probably too high a default value for maxFreq. It can make doEviction (and consequently put) run in O(n). Maybe we should make it configurable with something like log(n) or 10 as a default value? Moreover, lower values of maxFreq are less prone to cache pollution.

           Regarding your (2), I personally don't mind if it is a little slower on average. I would expect the get method to be slower with this impl, but on the other hand ConcurrentLFUCache seems to provide no guarantee that it will be able to evict entries as fast as they get inserted into the cache, so I think this new impl is better.

          Adrien Grand added a comment -

          Chris, I tried to rework your patch in order to remove the decay options, make maxFreq configurable (with 10 as a default value) and share the statistics code with LRUCache (by adding a common MapCacheBase superclass).

          What do you think?

          Adrien Grand made changes -
          Attachment SOLR-3393.patch [ 12544082 ]
          Shawn Heisey added a comment -

          Adrien, thanks for looking at it and making it better. This is early in my experience with Java - I can still count the number of projects I've built myself on one hand. Also, there have been a number of changes to the entire cache system since I wrote the first patch, changes that I have not had a chance to review.

           I definitely like doing the decay only at warm time. I'm perfectly happy to have evictDecay yanked out. I didn't think of the decay at all; that was Yonik on SOLR-2906, and I agreed with his reasons. I wonder if there might be a way to have the decay happen much less often – say after a configurable number of commits rather than for every commit. Also, I can't remember whether I kept the bitshift decay (dividing by two) or changed it to subtract one from the frequency. IMHO subtracting one would be better.

          I don't understand your first note about the put, and I can't take the time to re-read the code right now. On whether things should be volatile or not – I based all that on SOLR-2906, and I based SOLR-2906 on existing stuff. I don't completely understand what the implications are. If you do, awesome.

          On the default for maxFreq and how it might affect performance – again, I expect you've got more experience and can make a better determination.

           Hoss, I would be very surprised to learn that anyone was actually using the current implementation in 3.6.0 or the 4.0 alpha/beta. I still haven't had a chance to give it a serious trial in my own setup, and I wrote it! I think of that first attempt as similar to the first sort algorithm you ever get introduced to in a programming class, before they introduce recursion and tell you about quicksort.

          Shawn Heisey added a comment -

          Adrien,

          I've been looking at your patch, especially the warming code. I can't see anything in there that maintains the frequency values from the old cache to the new cache.

           With a maxFreq of 10 and a cache size much larger (200, 1000, etc.), there's no difference from the cache's perspective between something that has been requested 50 times and something that has been requested 100 times. How did tying maxFreq to the cache size make it slower?

          Adrien Grand added a comment -

          I agreed with his reasons. I wonder if there might be a way to have the decay happen much less often – say after a configurable number of commits rather than for every commit.

           I'm not comfortable with applying evictDecay based on the commit rate while cache usage depends on the query rate: a read-only index would never benefit from it.

          I don't understand your first note about the put

           The SolrCache.put(key, value) method should return the previous value associated with key, or null if key was not in the cache. Instead, LFUCache.put returned the value that had just been added to the cache.
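
           In other words, the expected behaviour matches java.util.Map.put. A minimal illustration (this is the Map contract, not code from the patch or the real SolrCache interface):

{code:java}
import java.util.HashMap;
import java.util.Map;

/** put returns what the key previously mapped to, or null if it was absent. */
final class PutContractDemo {
  public static void main(String[] args) {
    Map<String, String> cache = new HashMap<>();
    System.out.println(cache.put("q1", "docset-A")); // null: the key was not present
    System.out.println(cache.put("q1", "docset-B")); // docset-A: the previous value
  }
}
{code}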

          On whether things should be volatile or not – I based all that on SOLR-2906, and I based SOLR-2906 on existing stuff. I don't completely understand what the implications are. If you do, awesome.

          http://en.wikipedia.org/wiki/Volatile_variable#In_Java

           One of the use cases of the volatile keyword is to make sure that you are actually reading the most up-to-date value of a variable. Without this keyword, two CPUs could disagree about the current value of a variable (because of their caches). Brian Goetz has written a nice article on the volatile keyword in case you are interested in the details: http://www.ibm.com/developerworks/java/library/j-jtp06197/index.html

          We don't need it here because everything happens in synchronized blocks, which already ensure that you are reading the most up-to-date value.
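
           A small illustration of that distinction (not code from the patch):

{code:java}
/** Field accessed only inside synchronized blocks: the lock already guarantees visibility. */
final class GuardedCounter {
  private int frequency;                       // no volatile needed

  synchronized void increment() { frequency++; }
  synchronized int get()        { return frequency; }
}

/** Field read and written without a common lock: volatile is what makes the write visible. */
final class StopFlag {
  private volatile boolean stopped;

  void stop()         { stopped = true; }
  boolean isStopped() { return stopped; }
}
{code}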

          Adrien Grand added a comment -

          I can't see anything in there that maintains the frequency values from the old cache to the new cache.

          You're right, I forgot to add a nocommit about it! This should be fixed with this new patch.

          With maxFreq of 10 and a cache size much larger (200, 1000, etc), there's no difference from the cache's perspective between something that has been requested 50 times versus something that has been requested 100 times.

          Their frequency is considered equal but they are sorted according to their access order, so only the least recently used has a chance to be evicted.

          How did the maxFreq being related to the cache size make it slower?

           This was because of the call to findLowestFrequency in doEviction. If one element has frequency=0 and all other ones have frequency=maxFreq, findLowestFrequency must inspect every slot. But this should be avoidable: since doEviction is only called inside put, there is no need to compute lowestFreq inside doEviction; put will always set lowestFreq=0 in the end.
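
           As an illustration of that point (names are made up for this sketch, and get, which promotes keys to higher-frequency buckets, is omitted; this is not the patch code):

{code:java}
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

/** Eviction scans upward from a tracked lowestFreq; put resets it to 0 for the new entry. */
final class EvictionSketch<K, V> {
  private final int maxSize;
  private final Map<K, V> map = new HashMap<>();
  private final LinkedHashSet<K>[] buckets;   // buckets[f] = keys at frequency f
  private int lowestFreq = 0;

  @SuppressWarnings("unchecked")
  EvictionSketch(int maxSize, int maxFreq) {
    this.maxSize = maxSize;
    buckets = new LinkedHashSet[maxFreq + 1];
    for (int i = 0; i < buckets.length; i++) buckets[i] = new LinkedHashSet<>();
  }

  void put(K key, V value) {
    if (map.containsKey(key)) {               // existing key: replace the value only
      map.put(key, value);
      return;
    }
    if (map.size() >= maxSize) evictOne();
    map.put(key, value);
    buckets[0].add(key);
    lowestFreq = 0;                           // the entry just added is the lowest-frequency one
  }

  private void evictOne() {
    while (buckets[lowestFreq].isEmpty()) {
      lowestFreq++;                           // never rescans buckets already known to be empty
    }
    K victim = buckets[lowestFreq].iterator().next();  // LRU within the lowest non-empty bucket
    buckets[lowestFreq].remove(victim);
    map.remove(victim);
  }
}
{code}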

          I also modified the patch to rename the previous LFUCache impl to ConcurrentLFUCache (similarly to FastLRUCache, I just didn't want to prefix with "Fast" since I think this is a bit misleading).

          Adrien Grand made changes -
          Attachment SOLR-3393.patch [ 12544211 ]
          Yonik Seeley added a comment -

           ConcurrentLFUCache seems to provide no guarantee that it will be able to evict entries as fast as they get inserted into the cache

           Correct, but the context in which this cache is used (things like filter caches, query caches, etc.) ensures that this isn't really a problem in practice. And since it's a cache, holding an extra entry briefly really doesn't have any negative effects.

          I do think some sort of decay option is important, regardless of how it's implemented (and I never got a chance to look at the implementation here).

          Shawn Heisey added a comment -

           This was because of the call to findLowestFrequency in doEviction. If one element has frequency=0 and all other ones have frequency=maxFreq, findLowestFrequency must inspect every slot. But this should be avoidable: since doEviction is only called inside put, there is no need to compute lowestFreq inside doEviction; put will always set lowestFreq=0 in the end.

          If indeed I only used doEviction inside put, then lowestFreq would always be 0. Thinking generally, in case doEviction did get called anywhere else, perhaps that value should be tracked rather than computed every time. There's probably a way to quickly figure out the new value if the data structure connected to that frequency gets reduced to empty.

          If we eliminate the need to cycle through every unused frequency one by one, we should be able to keep maxFreq at the max cache size minus one, allowing for the greatest granularity under the current implementation.

          Like Yonik, I think having decay is important, but the last implementation was way too aggressive. My current thought: Only do it at warm time, and only do it after a configurable time period has passed from the previous decay or initial cache creation. Offer an additional option where it would happen after X commits. Default the decay to true and the time period to one hour. Apply the decay by subtracting one rather than doing a bitshift. This should keep things fairly predictable in the short term while also keeping the cache clean over the long term.

          Unit tests for the decay would probably use values <= 5000 milliseconds for the time period.
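
           A sketch of the time-gating part of that proposal, with hypothetical names; this is not code from any attached patch:

{code:java}
/** Decay is attempted from warm(), but actually runs only if enough time has passed. */
final class TimedDecayGate {
  private final long minIntervalMs;
  private long lastDecayMs = System.currentTimeMillis();   // treat cache creation as the first decay

  TimedDecayGate(long minIntervalMs) { this.minIntervalMs = minIntervalMs; }

  /** Returns true when the caller should subtract 1 from every entry's frequency. */
  boolean shouldDecayNow() {
    long now = System.currentTimeMillis();
    if (now - lastDecayMs < minIntervalMs) {
      return false;                 // warmed again too soon; skip this cycle
    }
    lastDecayMs = now;
    return true;
  }
}
{code}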

          Hoss Man added a comment -

           Assigning to myself to follow up with this for 4.1 – if someone wants to review/profile in time for 4.0 we can certainly still consider it.

          Hoss Man made changes -
          Assignee Hoss Man [ hossman ]
          Fix Version/s 4.1 [ 12321141 ]
          Fix Version/s 4.0 [ 12322551 ]
          Adrien Grand made changes -
          Link This issue depends on SOLR-3830 [ SOLR-3830 ]
          Shawn Heisey added a comment -

          Followup while going through all my old issues:

          I like Adrien's changes to my patch in general, but I still think a slow default decay (subtract 1 from each frequency) on autowarm is a good idea. In the interests of making sure it doesn't affect performance much, require a minimum time period to elapse before decaying again.

          IMHO my previous LFU implementation (the one that actually got committed) is total crap and this should just completely replace it.

          Shawn Heisey added a comment -

          A recent commit adding a large number of @Override annotations resulted in a lot of manual work to apply this patch to a 4x checkout. I have done this work, and I have also added a "minDecayIntervalMs" option, defaulting to 300000, or five minutes.

          Shawn Heisey added a comment - edited

           New patches against recent trunk and 4x checkouts that also implement a slow decay. Because two of the files have a getSource method that SVN changes on checkout, applying the patch with standard Linux patch tools is problematic.

           SVN-aware patch utilities (I tried with TortoiseSVN) seem to apply it with no problems.

          Shawn Heisey made changes -
          Attachment SOLR-3393-4x-withdecay.patch [ 12562724 ]
          Attachment SOLR-3393-trunk-withdecay.patch [ 12562725 ]
          Shawn Heisey added a comment -

          I would like to deprecate ConcurrentLFUCache (patch renames old LFUCache implementation to ConcurrentLFUCache) in this patch for 4.x and eliminate it entirely for trunk, but I will leave that decision up to the committer who takes this on. The existing patches do not do this. I also realized that I have not included CHANGES.TXT. I have the following suggestion for that:

          4x:
          SOLR-3393: New LFUCache implementation with much better performance. Deprecate old implementation and rename it to ConcurrentLFUCache.

          trunk:
          SOLR-3393: New LFUCache implementation with much better performance. Removed old implementation.

          Shawn Heisey added a comment -

          When I built the 4x patch, I accidentally checked out a specific old revision instead of the newest. The patch will apply successfully to the most recent revision as long as the SVN URL glitch is dealt with first, or you use svn-aware tools.

          Shawn Heisey added a comment -

           N.B.: Subversion appears to have gotten the 'patch' subcommand in version 1.7 - CentOS 6 has v1.6. I always find that Red Hat's stable offerings are quite outdated.

          Steve Rowe made changes -
          Fix Version/s 4.2 [ 12323893 ]
          Fix Version/s 4.1 [ 12321141 ]
          Robert Muir made changes -
          Fix Version/s 4.3 [ 12324128 ]
          Fix Version/s 4.2 [ 12323893 ]
          Shawn Heisey added a comment -

           I'd like to see this committed; my prior implementation is just terrible. Hoss, what do you think about my suggestion to put a slow decay back in? Would you like to continue on this issue, or should I take it over?

          Hoss Man added a comment -

           Shit, I'm sorry man, I totally dropped the ball on this.

           I haven't looked at your latest patch, but catching up on the comments, your general API compatibility approach sounds fine to me – you may want to follow up with Adrien or Yonik to sanity check your decay code, because I didn't fully understand what you guys were talking about the first time it came up.

          Hoss Man made changes -
          Assignee Hoss Man [ hossman ] Shawn Heisey [ elyograg ]
          Gavin made changes -
          Link This issue depends on SOLR-3830 [ SOLR-3830 ]
          Gavin made changes -
          Link This issue depends upon SOLR-3830 [ SOLR-3830 ]
          Uwe Schindler made changes -
          Fix Version/s 4.4 [ 12324324 ]
          Fix Version/s 4.3 [ 12324128 ]
          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Steve Rowe made changes -
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.5 [ 12324743 ]
          Fix Version/s 4.4 [ 12324324 ]
          Adrien Grand made changes -
          Fix Version/s 4.6 [ 12325000 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.5 [ 12324743 ]
          Uwe Schindler made changes -
          Fix Version/s 4.7 [ 12325573 ]
          Fix Version/s 4.6 [ 12325000 ]
          David Smiley made changes -
          Fix Version/s 4.8 [ 12326254 ]
          Fix Version/s 4.7 [ 12325573 ]
          Uwe Schindler added a comment -

          Move issue to Solr 4.9.

          Uwe Schindler made changes -
          Fix Version/s 4.9 [ 12326731 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.8 [ 12326254 ]

            People

            • Assignee:
              Shawn Heisey
              Reporter:
              Shawn Heisey
             • Votes: 1
               Watchers: 6
