Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: search
    • Labels:
      None

      Description

      Implement an LFU (Least Frequently Used) cache as the first step towards a full ARC cache

      1. ConcurrentLFUCache.java
        14 kB
        Shawn Heisey
      2. LFUCache.java
        9 kB
        Shawn Heisey
      3. TestLFUCache.java
        9 kB
        Shawn Heisey
      4. SOLR-2906.patch
        34 kB
        Shawn Heisey
      5. SOLR-2906.patch
        34 kB
        Shawn Heisey
      6. SOLR-2906.patch
        35 kB
        Shawn Heisey
      7. SOLR-2906.patch
        35 kB
        Erick Erickson
      8. SOLR-2906.patch
        37 kB
        Erick Erickson
      9. SOLR-2906.patch
        37 kB
        Erick Erickson
      10. SOLR-2906.patch
        38 kB
        Shawn Heisey
      11. SOLR-2906.patch
        45 kB
        Erick Erickson

        Issue Links

          Activity

          Shawn Heisey added a comment -

          Here is the first crack at an LFU implementation. There's some weirdness with negative numbers in the statistics, and I'm not sure that eviction and warming are working right, but I am having trouble getting a test environment fully operational.

          Shawn Heisey added a comment -

          I used branch_3x to create the above files. I haven't even looked at trunk.

          Shawn Heisey added a comment -

          Evictions definitely don't seem to be working right. I finally got a benchmark script going. I watched as the size of the filterCache climbed to the maximum size of 64. On the next insert, it was suddenly only 7 entries, and the eviction counter had incremented from 290 to 348. That seems really aggressive. I seem to have done something wrong.

          Once I got through with the benchmark script, I did a commit; at that moment the size was 50 out of 64. After warming (autowarmCount 16), the cache size was 12, and both the hits and lookups were -12 (negative).

          Shawn Heisey added a comment -

          I will fully admit that I built the new cache type from the old code without really understanding what the code was doing, and now I am out of my depth.

          Shawn Heisey added a comment -

          I've been trying to find my bug. Looking back at the original LRU implementation, I have no idea how it's working.

          When a CacheEntry is created in the LRU code, one of the values sent in is an incremented stats.accessCounter, which gets called lastAccessed in the new object. When it is later used in markAndSweep, it is used in simple math along with the number of items that we want to keep/remove. This is very confusing, and I can't see how it could ever work. It might be that part of the code is simply skipped because of the way the math happens to work out.

          When I changed it around to use hitcounts, those counts are also used in the previously mentioned simple math, and I believe that results in some very weird behavior, such as removing most (or possibly all) of the cache entries.

          It appears that this idea is a lot more complicated than I originally thought, and that the current code needs to be at least partially rewritten.

          Yonik Seeley added a comment -

          I've been trying to find my bug. Looking back at the original LRU implementation, I have no idea how it's working.

          Heh... it is pretty complex, and I did try to add a ton of comments because of that.
          The basic idea is that I wanted to avoid an O(n log n) solution of sorting everything and then discarding the lowest.
          In my testing, it seems to work, and we often just need a single O(n) pass to evict all the entries we need.

          The comment at the top is the most important to understanding how it works:

              // if we want to keep at least 1000 entries, then timestamps of
              // current through current-1000 are guaranteed not to be the oldest (but that does
              // not mean there are 1000 entries in that group... it's actually anywhere between
              // 1 and 1000).
              // Also, if we want to remove 500 entries, then
              // oldestEntry through oldestEntry+500 are guaranteed to be
              // removed (however many there are there).
          

          But really, it seems like you should disregard all the algorithmic stuff in LRU when implementing LFU.
          If you think you see a bug in the existing LRU stuff, you're going to have to spell it out for me a bit more.

          Shawn Heisey added a comment -

          But really, it seems like you should disregard all the algorithmic stuff in LRU when implementing LFU. If you think you see a bug in the existing LRU stuff, you're going to have to spell it out for me a bit more.

          I can't actually say that there is a bug, but I have to say that I'm really confused (in the LRU code) by what lastAccessed(Copy) actually is and how it works, and what the following code pieces from markAndSweep are doing with it, since wantToKeep and wantToRemove are entry counts:

          long thisEntry = ce.lastAccessedCopy;
            <snip>
          if (thisEntry > newestEntry - wantToKeep) {
            <snip>
          } else if (thisEntry < oldestEntry + wantToRemove) { // entry in bottom group?
          
          Yonik Seeley added a comment -

          since wantToKeep and wantToRemove are entry counts

          Yes, it seems like mixing units. But the key to understanding it is that lastAccessed isn't a real timestamp, just a counter that is incremented on every access. This means that if the latest "timestamp" is 5000 and we know we want to keep at least 1000 entries, then any entry with a timestamp greater than 4000 is guaranteed to be in the top 1000 entries, and we don't need to consider it further. One can reverse that logic to conclude that an entry is definitely in the bottom group and should be discarded immediately.
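          The counter trick described above can be sketched as follows. This is a hypothetical illustration — SweepSketch and classify are made-up names, not code from the Solr source — but it shows how entry counts bound the "definitely keep" and "definitely evict" groups in one pass:

```java
// Sketch of the access-counter trick: lastAccessed is a monotonically
// increasing counter, not a clock, so "newest - wantToKeep" and
// "oldest + wantToRemove" bound the top and bottom groups in a single
// O(n) pass without sorting.
class SweepSketch {
    static int classify(long entryCounter, long newestEntry, long oldestEntry,
                        int wantToKeep, int wantToRemove) {
        if (entryCounter > newestEntry - wantToKeep) {
            return 1;   // definitely in the top group: keep, no further work
        } else if (entryCounter < oldestEntry + wantToRemove) {
            return -1;  // definitely in the bottom group: evict immediately
        }
        return 0;       // undecided: must be handled in a later pass
    }
}
```

          With newest = 5000 and wantToKeep = 1000, any counter above 4000 is classified as a keeper; with oldest = 1 and wantToRemove = 500, any counter below 501 is an immediate eviction; everything in between stays undecided.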

          Shawn Heisey added a comment -

          Would it be possible to adapt the lastAccessed shortcut to LFU, or would it be simply best to remove that section of code?

          Shawn Heisey added a comment -

          Now markAndSweep builds a treeset of the least used items, then uses the treeset to evict entries.

          This is almost guaranteed to be an inefficient way to do things. If it works, I leave optimization to the experts.
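          A bounded TreeSet pass like the one described could look roughly like this. This is a hypothetical sketch — Entry and leastUsed are illustrative names, not the actual patch code — showing why a tiebreaker on the access counter matters when hit counts collide:

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Sketch of a TreeSet-based sweep: keep a bounded set of the entries with
// the lowest hit counts; those are the eviction candidates.
class LeastUsedSketch {
    record Entry(String key, long hits, long lastAccessed) {}

    static TreeSet<Entry> leastUsed(List<Entry> all, int n) {
        // Order by hits, breaking ties with the monotonic access counter so
        // no two distinct entries ever compare equal (a TreeSet silently
        // drops an entry that compares equal to an existing one).
        Comparator<Entry> byHits = Comparator.comparingLong(Entry::hits)
                .thenComparingLong(Entry::lastAccessed);
        TreeSet<Entry> least = new TreeSet<>(byHits);
        for (Entry e : all) {
            least.add(e);
            if (least.size() > n) {
                least.pollLast(); // discard the most-used candidate
            }
        }
        return least;
    }
}
```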

          Shawn Heisey added a comment -

          I've renamed the user-facing class to LFUCache and created a test program based on the LRU version. The tests are failing, though. So far I can't figure out why.

          Shawn Heisey added a comment -

          I've re-added lastAccessed to the class, as a tiebreaker when hitcount is equal.

          The test method prints out leastUsedItems and mostUsedItems. Somehow, item number 50 is included in both.

          Shawn Heisey added a comment -

          All known bugs found and fixed, unit test looks correct and passes. This was created against branch_3x, but trunk probably won't be much different.

          IMHO, ready for review and possible inclusion. The javadoc and other comments were reviewed and modified, but not closely.

          Simon Willnauer added a comment -

          IMHO, ready for review and possible inclusion. The javadoc and other comments were reviewed and modified, but not closely.

          Shawn, is it possible to upload a diff file (patch)? If you are using an svn checkout, make sure to add all new files (svn add) and then run svn diff > SOLR-2906.patch from the top-level directory. This makes it easier to see what changed, and we only have to apply a single file to test and review your changes. There is also a wiki page for this: http://wiki.apache.org/lucene-java/HowToContribute (see "How to create a patch").

          Shawn Heisey added a comment -

          shawn, is it possible to upload a diff file (patch).

          These are all new files, no files changed. "svn diff" returns nothing.

          Shawn Heisey added a comment -

          I figured out what I did wrong. You have to 'svn add' the files before you can 'svn diff'.

          Shawn Heisey added a comment -

          Something odd happens with the filterCache. When things are first starting off, the cache size and the number of inserts don't match up. It's usually off by 10, with more in the inserts. This doesn't seem to happen with the other cache types, also using LFU.

          Yonik Seeley added a comment -

          The cache size and the number of inserts don't match up. It's usually off by 10

          Do you have any type of warming (autowarming or statically configured)? Statistics are counted differently during warming.

          Shawn Heisey added a comment - edited

          The only static warming I have is an all docs search with a sort parameter (and no filter query), which precaches my sort. I do have autowarming configured, but this behavior happens from initial solr startup. Things seem to behave correctly with warming - size is autowarmCount, inserts are zero.

          FastLRUCache behavior:
          queryResultCache: size is one higher than inserts
          documentCache: size is one higher than inserts
          filterCache: size and inserts are identical

          LFUCache behavior:
          queryResultCache: size is one higher than inserts
          documentCache: size is one higher than inserts
          filterCache: inserts is higher than size by a variable amount

          I've seen 10 (on two different runs) and 2 (on the most recent run) as the difference between inserts and size.

          The FastLRUCache behavior is seen on my production servers with production queries, the LFUCache behavior is on my test server with a benchmark script providing the queries. I suppose there might be something weird about my canned queries that makes the filterCache behave differently, but the original source of the queries was a production Solr log at level INFO.

          Shawn Heisey added a comment -

          I might have figured out the problem, and if I have, the cache code is fine. I just checked the log from my most recent run and have found that there are two errors from invalid filter queries. I think this means that when a filter is invalid, inserts gets incremented but size doesn't.

          Shawn Heisey added a comment -

          I can't reproduce it with LFUCache or FastLRUCache by manually sending invalid queries, so that's the wrong idea.

          Shawn Heisey added a comment -

          Possibly false alarm. Although I still do not know what causes the discrepancy between inserts and size on the filter cache, I can confirm that exactly the same thing happens when I change it to FastLRUCache, restart Solr, and fire up the benchmarking script.

          Shawn Heisey added a comment -

          Updated patch. One of the bugs I had to fix was in the least/most items methods: I had to add new items to the TreeSet before removing old ones, because the item just added might have the same hitcount as entries already present. Without checking the new entry, I couldn't know which entry was the right one to remove.

          This change reverts to remove-then-add when the hitCounts differ in the right direction; when they are equal, it still adds before removing. By reducing the size of the set before adding a new member whenever possible, it may go faster.

          Shawn Heisey added a comment -

          I finally got a chance to do some testing in production. I have two distributed index chains, both running 3.5.0 with this patch and the one from SOLR-1972 applied. The chains are updated independently, there is no replication. It's not a truly definitive test, because my queries are load balanced between the two chains and I do have some hardware discrepancies. I cannot create a valid test environment to compare the two caches as they should be compared, with identical queries going to two completely identical servers.

          On average, commits made on chain A where filterCache is set to LFU and the servers have 48GB of RAM happen faster than those on chain B, with FastLRU and 64GB of RAM. One of the two servers on chain A has slightly faster processors than its counterpart on chain B – 2.83 GHz vs. 2.66 GHz. The other two servers on both chains have 2.5 GHz processors.

          I suspect that most of the potential gains that the LFU algorithm might be able to provide are swallowed by the very inefficient implementation.

          If anyone has some thoughts for me to pursue, I will be happy to do so, but I am out of my own ideas. I hope the patch will be committed. It could use a lot of optimization and there's probably cosmetic cleanup to do.

          Shawn Heisey added a comment -

          Some additional info: The index is composed of six large shard cores and a small one, running on two servers. The total index size on each server (CentOS 6, java 1.6.0_29) is about 60GB. Solr/Jetty has an 8GB heap.

          Erick Erickson added a comment -

          What do the collective people who actually know the code think about this patch? From my perspective, the LFU is a classic alternative to LRU and seems like a fine addition. If there are no objections, I'll volunteer to commit this.

          Shawn:
          This is the kind of thing that could use two things:
          1> text for CHANGES.txt describing the new functionality. If you could attach that as a separate patch file (or just e-mail it to me) that would be great. That file changes rapidly enough that I like to add the text at check-in time, it seems easier.

          2> Adding a description to the Wiki since this is new functionality. Perhaps in http://wiki.apache.org/solr/SolrCaching?

          Yonik Seeley added a comment -

          IMO, this is appropriate for trunk. If we want to commit to 3x, we should mark as experimental so we can change the default functionality if desired.

          One concern I have with straight LFU is the lack of any kind of time sensitivity. For example, say I ran some batch job that accessed the same filters a million times; now they are stuck in the cache even though they haven't been used for hours or days. Perhaps one way to handle this would be to apply something like count >>= 2 to every entry when cleaning out old entries.

          I also wonder if it's worth keeping lastAccessed. It's only valuable for breaking ties and I don't know how much that's actually needed.

          Anyway, +1 to commit since this won't be used in the example solrconfig by default (and hence we can speed things up and tweak the algorithm before it possibly does get used by default).

          Shawn Heisey added a comment -

          I have added some text to CHANGES.TXT under 3.6. Like before, my patch is against branch_3x.

          Yonik, you may be right about lastAccessed. I was striving for correctness on this first pass, but perhaps it's not worthwhile to care about that too much, just let Java's default functionality figure out which ones to evict.

          Erick Erickson added a comment -

          We need to keep lastAccessed, removing it introduces a bug. The CacheEntry.compareTo method looks like this:

          public int compareTo(CacheEntry<K, V> that) {
            if (this.hitsCopy == that.hitsCopy) {
              if (this.lastAccessedCopy == that.lastAccessedCopy) {
                return 0;
              }
              return this.lastAccessedCopy < that.lastAccessedCopy ? 1 : -1;
            }
            return this.hitsCopy < that.hitsCopy ? 1 : -1;
          }

          which is guaranteed to return non-zero unless the exact same object is being compared, since lastAccessed(Copy) is monotonically increasing.

          If we remove lastAccessed, then any two elements having the same number of hits compare as equal, which also means that tree insertions overwrite each other in methods like getMost/LeastUsedItems. I don't know of any cheaper piece of overhead to carry along to prevent this.

          I've made a few, mostly cosmetic changes while trying to understand the code, I'll check it in shortly.

          Erick Erickson added a comment -

          Mostly cosmetic changes:

          Changed acceptableLimit to acceptableSize to keep it named consistently

          Formatted all the files

          Implemented Yonik's aging suggestion (but no tests, there doesn't seem to be a clean way to implement a test here without creating debug-only code).

          I'm not wholly convinced that dividing by 4 is the right thing to do here; it'll tend to flatten all the entries, making removal somewhat arbitrary, since after a few passes anything with hits in the low range will collapse to zero. That said, since the little adventure with lastAccessed, all entries with the same number of hits will be treated as LRU, so I guess it works.

          Marked code as experimental

          Commented out some debugging code

          Yonik Seeley added a comment - - edited

          Actually, I was thinking count >>= 1 (divide by 2).
          We could make it optional (timeDecay=true), but my gut feeling says it should be on by default.

          Erick Erickson added a comment -

          Updated patch that divides by 2 and adds a unit test for aging out.

          Shawn:

          Could you add in the optional time decay as Yonik suggests? I agree that it seems like the right thing is to have this on by default. At that point, I think it'll be ready to check in. We can add documentation as we can.

          We could also check it in as is and raise another JIRA.

          Shawn Heisey added a comment -

          Could you add in the optional time decay as Yonik suggests? I agree that it seems like the right thing is to have this on by default. At that point, I think it'll be ready to check in. We can add documentation as we can.

          I've looked at what Yonik has said and cannot figure out what I'd have to do. I'm not completely ignorant, but there is a lot that I don't know. I am amazed I was able to get this put together at all.

          Erick Erickson added a comment - - edited

          Right, there's a lot of code to wrap your head around and it can be bewildering. And compared to what you've already done, this is easy....

          But lots of things are surprisingly easy in Solr, except when they're really hard <G>.

          This is just about allowing another parameter to be specified in the cache declaration in solrconfig.xml, similar to initialSize, autowarmCount, etc. cleanupThread is probably a good exemplar.

          Take a look at LFUCache.init() for how all of the other parameters are parsed.

          Then just pass that through to the ConcurrentLFUCache, conditionally doing the "ce.hits.set(ce.hitsCopy >>> 1);" line. Default to timeDecay=true if the parameter is not present.

          So this should be relatively few lines of actual code, making some automated tests will actually take more time I suspect, as well as any Wiki documentation if you're feeling ambitious....
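Under those assumptions, the change Erick describes can be sketched as below. The method names and the args map are illustrative, not the exact Solr source; only the "timeDecay" parameter name and the >>> 1 decay come from the discussion:

```java
import java.util.HashMap;
import java.util.Map;

public class TimeDecayInitDemo {
    // Parse an optional "timeDecay" init arg, defaulting to true when absent,
    // in the spirit of how LFUCache.init() parses its other parameters.
    static boolean parseTimeDecay(Map<String, String> args) {
        String str = args.get("timeDecay");
        return str == null || Boolean.parseBoolean(str);
    }

    // Conditionally apply the equivalent of ce.hits.set(ce.hitsCopy >>> 1).
    static long maybeDecay(long hitsCopy, boolean timeDecay) {
        return timeDecay ? hitsCopy >>> 1 : hitsCopy;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(parseTimeDecay(conf)); // true (default when absent)
        conf.put("timeDecay", "false");
        System.out.println(parseTimeDecay(conf)); // false (explicitly disabled)
        System.out.println(maybeDecay(9, true));  // 4
        System.out.println(maybeDecay(9, false)); // 9
    }
}
```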

          Shawn Heisey added a comment -

          I must be dense. I can figure out how to add the timeDecay option, but I can't figure out what section of code to enable/disable based on the value of timeDecay. I've gone as far as doing a diff on my Nov 24th patch and the Dec 20th patch from Erick. (doing diffs on diffs ... the world is going to explode!) The only differences I can see between the two is in whitespace/formatting.

          Erick Erickson added a comment -

          Here's what I had in mind; at least I think this will do, but all I've done is ensured that the code compiles and the current LFU test suite runs.

          Look in the diff for "timeDecay".

          This still needs some proof that the new parameter comes through from a schema file. Let me know if that presents a problem or if you can't get 'round to it, I might have some time over Christmas.

          I think maybe you were under the impression that this had already been done and were looking for it to be in the code already?

          Shawn Heisey added a comment -

          You're right, I had thought it was already done. Now that I see what you've done, and looked up what >>> does, it makes sense, but when I read what Yonik was saying I had no idea.

          Added a note to CHANGES.txt, created what is hopefully the final diff.

          Erick Erickson added a comment -

          This should be the final patch. Added the code to actually read the "timeDecay" parameter from solrconfig, which ages out the cache entries as we've discussed. Added tests to ensure that it gets through from the config file.

          Shawn: If you'd add some data to the Wiki about this new parameter, that would be a good thing.

          If nobody objects, I'll probably check this in in the next couple of days. Since they're all new files, the patch will apply to both trunk and 3x cleanly.
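For reference, the resulting cache declaration would look something like the sketch below. This is a hedged example based on the discussion, not copied from the patch; the timeDecay attribute name comes from the comments above, while the other attributes follow the standard solrconfig.xml cache declarations (verify against the SolrCaching wiki page):

```xml
<!-- LFU filter cache with time decay enabled (the discussed default) -->
<filterCache
    class="solr.LFUCache"
    size="512"
    initialSize="512"
    autowarmCount="128"
    timeDecay="true"/>
```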

          Erick Erickson added a comment - - edited

          Fixed in 3x at r: 1230744
          Fixed on trunk at r: 1230790 NOTE: r 1230741 was messed up.

          Shawn:
          Did you ever update the Wiki with this new functionality? That'd be awesome....

          Shawn Heisey added a comment -

          Did you ever update the Wiki with this new functionality? That'd be awesome....

          Yes, I added LFUCache and the timeDecay option to the SolrCaching Wiki page.

          Erick Erickson added a comment -

          Cool!


            People

            • Assignee:
              Erick Erickson
              Reporter:
              Shawn Heisey