
SOLR-667: Alternate LRUCache implementation

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Component/s: search
    • Labels:
      None

      Description

      The only available SolrCache implementation, i.e. LRUCache, is based on LinkedHashMap, whose get() is also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered.

      1. SOLR-667-updates.patch
        3 kB
        Yonik Seeley
      2. SOLR-667-alternate.patch
        16 kB
        Yonik Seeley
      3. SOLR-667-alternate.patch
        21 kB
        Yonik Seeley
      4. SOLR-667.patch
        13 kB
        Noble Paul
      5. SOLR-667.patch
        13 kB
        Noble Paul
      6. SOLR-667.patch
        16 kB
        Noble Paul
      7. SOLR-667.patch
        17 kB
        Noble Paul
      8. SOLR-667.patch
        17 kB
        Noble Paul
      9. SOLR-667.patch
        19 kB
        Shalin Shekhar Mangar
      10. SOLR-667.patch
        1 kB
        Noble Paul
      11. SOLR-667.patch
        6 kB
        Noble Paul
      12. SOLR-667.patch
        6 kB
        Noble Paul
      13. SOLR-667.patch
        1 kB
        Noble Paul
      14. SOLR-667.patch
        10 kB
        Noble Paul
      15. SOLR-667.patch
        7 kB
        Noble Paul
      16. SOLR-667.patch
        9 kB
        Noble Paul
      17. ConcurrentLRUCache.java
        4 kB
        Noble Paul
      18. ConcurrentLRUCache.java
        4 kB
        Noble Paul
      19. ConcurrentLRUCache.java
        9 kB
        Yonik Seeley

        Activity

        Noble Paul added a comment - edited

        A POC implementation based on ConcurrentHashMap:

        • Gets are free.
        • Puts are free until the cache touches the high-water mark; they become very expensive after that.
        • To lighten the load on put(), an extra thread is employed to do a concurrent mark and sweep.
        • There is a high-water mark and a low-water mark. The extra thread cleans entries once the low-water mark is crossed; put() must evict entries itself if the level crosses the high-water mark. (A minimal sketch of the idea follows.)
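        A minimal sketch of this approach, assuming a logical clock for recency; the class and method names (WaterMarkLRUCache, markAndSweep) are hypothetical and this is not the attached patch:

            import java.util.ArrayList;
            import java.util.List;
            import java.util.concurrent.ConcurrentHashMap;
            import java.util.concurrent.atomic.AtomicLong;
            import java.util.concurrent.locks.ReentrantLock;

            // Hypothetical sketch of the POC described above, not the attached patch.
            public class WaterMarkLRUCache<K, V> {
              private static final class Entry<V> {
                final V value;
                volatile long lastAccessed;             // logical timestamp of last access
                Entry(V value, long t) { this.value = value; this.lastAccessed = t; }
              }

              private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
              private final AtomicLong clock = new AtomicLong();   // cheap logical clock
              private final ReentrantLock sweepLock = new ReentrantLock();
              private final int highWaterMark, lowWaterMark;

              public WaterMarkLRUCache(int lowWaterMark, int highWaterMark) {
                this.lowWaterMark = lowWaterMark;
                this.highWaterMark = highWaterMark;
              }

              // Lock-free lookup: only bumps the entry's timestamp.
              public V get(K key) {
                Entry<V> e = map.get(key);
                if (e == null) return null;
                e.lastAccessed = clock.incrementAndGet();
                return e.value;
              }

              // Cheap until the high-water mark is crossed; then the caller pays for a sweep.
              public V put(K key, V value) {
                Entry<V> old = map.put(key, new Entry<>(value, clock.incrementAndGet()));
                if (map.size() > highWaterMark) markAndSweep();
                return old == null ? null : old.value;
              }

              // Evict least-recently-used entries until the size drops to the low-water mark.
              private void markAndSweep() {
                if (!sweepLock.tryLock()) return;       // another thread is already sweeping
                try {
                  List<java.util.Map.Entry<K, Entry<V>>> all = new ArrayList<>(map.entrySet());
                  all.sort((a, b) -> Long.compare(a.getValue().lastAccessed,
                                                  b.getValue().lastAccessed));
                  for (int i = 0; i < all.size() && map.size() > lowWaterMark; i++) {
                    map.remove(all.get(i).getKey());
                  }
                } finally {
                  sweepLock.unlock();
                }
              }
            }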
        Noble Paul added a comment -

        If somebody can review the implementation we can add another cache implementation to Solr.

        Yonik Seeley added a comment -

        Gets are free

        Not entirely... there are a few memory barriers that make things thread safe, so I wouldn't call it "free" (since this branched off of another issue where some had the idea that one could get away without any sort of locks or memory barriers).

        It's a good approach in general, and should scale better with many CPUs under very high lookup load. But I'm not sure that it should use a separate cleaner thread... and if it does, I don't think it should be scheduled.

        After we get those details worked out, then we'd need a SolrCache implementation that uses it. Given where we are in the release cycle (and that the cache contention issue is only affecting 1 person that I've seen), I think this should wait until after 1.3.

        Fuad Efendi added a comment -

        ...safety, where nothing bad ever happens to an object.

        When SOLR adds an object to the cache or removes it from the cache, it does not change it; it manipulates internal arrays of pointers to objects (which are probably atomic, but I don't know such JVM & GC internals in-depth...)

        Looks heavy with TreeSet...

        Noble Paul added a comment -

        It's a good approach in general, and should scale better with many CPUs under very high lookup load. But I'm not sure that it should use a separate cleaner thread... and if it does, I don't think it should be scheduled.

        Thanks for the comments. I agree with you. I devised this approach because some user reported that heavy cache lookups were slowing things down for him. The cost-benefit analysis is yet to be studied. The separate cleaner thread is optional in the implementation. I am yet to study the cost of sorting over a hundred thousand entries. Do you recommend that the cleaner thread just keep running forever? That is fine, but then should there be a sleep for the thread?

        BTW, is the ExecutorService very expensive?

        I did not provide a SolrCache implementation because that is going to drown the approach in details.

        I think this should wait until after 1.3

        True. This is not marked for 1.3. But this can definitely live as a patch, and anyone who needs it would benefit from it.

        Noble Paul added a comment -

        Fuad, you cannot trivialize concurrent programming so easily. Whatever we have commented comes from our experience (and wisdom). There is a price to pay for it. Java could have easily eliminated the java.util.concurrent package by using 'volatile' everywhere, with no need for AtomicInteger, AtomicLong, etc. So they are there for a reason.

        BTW, using TreeSet is not 'heavy'. It is the right tool for the right purpose. If you need a sorted set, that is best.


        Fuad Efendi added a comment -

        Paul,

        I have never ever suggested using 'volatile' 'to avoid synchronization' for concurrent programming. I only noticed some extremely stupid code where SOLR uses double synchronization and an AtomicLong inside:

          public synchronized Object put(Object key, Object value) {
            if (state == State.LIVE) {
              stats.inserts.incrementAndGet();
            }
        
            synchronized (map) {
              // increment local inserts regardless of state???
              // it does make it more consistent with the current size...
              inserts++;
              return map.put(key,value);
            }
          }
        

        Each tool has an area of applicability, and even ConcurrentHashMap only slightly intersects with SOLR's needs; SOLR does not need a 'consistent view at a point in time' of cached objects.

        'volatile' is part of the Java spec, and is implemented differently by different vendors. I use volatile (instead of the more expensive AtomicLong) only and only to prevent the JVM HotSpot optimizer from doing some non-applicable stuff...

        Yonik Seeley added a comment -

        I only noticed some extremely stupid code where SOLR uses _double_synchronization and AtomicLong inside:

        A simple typo I think... a remnant from way back when changing what object was being synchronized on. That's why I like explicit synchronization rather than adding it to a method signature (easier to miss). I just fixed this to be

          public Object put(Object key, Object value) {
            synchronized (map) {
              if (state == State.LIVE) {
                stats.inserts.incrementAndGet();
              }
        
              // increment local inserts regardless of state???
              // it does make it more consistent with the current size...
              inserts++;
              return map.put(key,value);
            }
          }
        
        Fuad Efendi added a comment -

        Thanks Yonik, I even guess that in some cases synchronization is faster than sun.misc.Unsafe.compareAndSwapLong(this, valueOffset, expect, update);

            public final long incrementAndGet() {
                for (;;) {
                    long current = get();
                    long next = current + 1;
                    if (compareAndSet(current, next))
                        return next;
                }
            }
        
        • extreme level of safety with some level of concurrency... Do we need an exact value for 'stats.inserts' (if it is not synchronized)?

        It could be a plain 'long' inside the synchronized block...

        Yonik Seeley added a comment -

        If you don't need the synchronized block, then an atomic variable for "inserts" (for example) would be a big win.
        But if you have the synchronized block anyway, it's probably faster to just expand its scope if the operations to be done are simple.
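        A toy illustration of this trade-off; the class and field names are hypothetical:

            import java.util.HashMap;
            import java.util.Map;
            import java.util.concurrent.atomic.AtomicLong;

            // Toy illustration of the trade-off described above (names are hypothetical).
            class InsertCounters {
              private final Map<Object, Object> map = new HashMap<>();
              private final AtomicLong inserts = new AtomicLong();
              private long insertsPlain;

              // If no lock is otherwise needed, an atomic increment avoids blocking entirely.
              void recordLockFree() {
                inserts.incrementAndGet();
              }

              // If the map lock must be held anyway, folding a plain increment into the
              // existing synchronized block is usually cheaper than an extra atomic op.
              Object putAndCount(Object key, Object value) {
                synchronized (map) {
                  insertsPlain++;
                  return map.put(key, value);
                }
              }
            }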

        Noble Paul added a comment -

        my findings on a simple perf test with no contention (single thread)

        The code is there in the main()

        cache size 1 million

        with HashMap

        • time taken for 1 million inserts = 2019ms
        • time taken for 1 million gets = 625ms
        • time taken for cleanup = 345ms

        with ConcurrentHashMap

        • time taken for 1 million inserts = 2437ms (roughly 20% slower than HashMap, but small in absolute terms)
        • time taken for 1 million gets = 393ms (actually faster than plain HashMap)
        • time taken for cleanup = 298ms (actually faster)

        Other observations:
        The extra thread may not be necessary. An unlucky put() may take around 0.25s to 0.5s for a cache size of 1 million.
        If we keep the (highWaterMark - lowWaterMark) gap very large, cleanups will be infrequent. (A hypothetical reconstruction of the timing loop follows.)
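        The actual code is in the attachment's main(); a hypothetical reconstruction of that kind of single-threaded timing loop might look like:

            import java.util.HashMap;
            import java.util.Map;
            import java.util.concurrent.ConcurrentHashMap;

            // Hypothetical reconstruction of the single-threaded timing loop described
            // above; the real code is in the attachment's main().
            public class CachePerf {
              public static void main(String[] args) {
                time("HashMap", new HashMap<>());
                time("ConcurrentHashMap", new ConcurrentHashMap<>());
              }

              static void time(String name, Map<Integer, String> map) {
                int n = 1000000;
                long t0 = System.currentTimeMillis();
                for (int i = 0; i < n; i++) map.put(i, "v" + i);
                System.out.println(name + " inserts: " + (System.currentTimeMillis() - t0) + "ms");
                t0 = System.currentTimeMillis();
                for (int i = 0; i < n; i++) map.get(i);
                System.out.println(name + " gets: " + (System.currentTimeMillis() - t0) + "ms");
                t0 = System.currentTimeMillis();
                map.clear();
                System.out.println(name + " cleanup: " + (System.currentTimeMillis() - t0) + "ms");
              }
            }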

        Noble Paul added a comment -

        Full SolrCache implementation.
        Not tested.

        The initialization parameters are the same as the current LRUCache's.

        Noble Paul added a comment -

        This is the right patch.

        Noble Paul added a comment -

        with testcase

        Antony Bowesman added a comment -

        I have also been considering a concurrent LRU cache for my own application, and seeing this issue made me think about it again. Wouldn't one option be to use a ReentrantReadWriteLock to synchronise the map, rather than complete synchronisation on the map for both readers and writers? Although that does not give a free get(), it would at least allow concurrent gets, still be able to use the LinkedHashMap, and would not require the extra thread (a sketch of the idea follows). Not sure if SOLR is Java 1.5, but if not you could still use Doug Lea's concurrent package for pre-1.5 Java.
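        A minimal sketch of this suggestion (the class name is hypothetical). One caveat worth noting: a LinkedHashMap in access order mutates its internal linked list on get(), so even reads must take the write lock for LRU ordering to stay safe, which limits how much a read/write lock can help here:

            import java.util.LinkedHashMap;
            import java.util.Map;
            import java.util.concurrent.locks.ReentrantReadWriteLock;

            // Sketch of the ReentrantReadWriteLock suggestion above (illustrative).
            public class RWLockLRUCache<K, V> {
              private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
              private final int maxSize;
              private final LinkedHashMap<K, V> map;

              public RWLockLRUCache(final int maxSize) {
                this.maxSize = maxSize;
                this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {  // accessOrder=true
                  @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > RWLockLRUCache.this.maxSize;
                  }
                };
              }

              public V get(K key) {
                lock.writeLock().lock();   // caveat: get() reorders entries in access order,
                try {                      // so it is a structural write, not a pure read
                  return map.get(key);
                } finally { lock.writeLock().unlock(); }
              }

              public V put(K key, V value) {
                lock.writeLock().lock();
                try {
                  return map.put(key, value);
                } finally { lock.writeLock().unlock(); }
              }
            }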

        Noble Paul added a comment - edited

        The patch contains an implementation which uses Java 5 features (ConcurrentHashMap). It is better than using a separate lock.

        Noble Paul added a comment -

        name change and some refactoring

        Yonik Seeley added a comment -

        Here is a prototype of an idea I've had for a while for an efficient concurrent LRU cache.
        It is completely untested... consider it more "code as design". It should feature faster cleaning - O(n) when it works well.

        In addition to low and high water marks, it adds the concept of an "acceptable" water mark. A cleaning phase will try to go to the low water mark, but will be considered successful if it hits the acceptable water mark.

        This is coupled with a multi-pass cleaning phase. From the comments:

            // if we want to keep at least 1000 entries, then timestamps of
            // current through current-1000 are guaranteed not to be the oldest!
            // Also, if we want to remove 500 entries, then
            // oldestEntry through oldestEntry+500 are guaranteed to be
            // removed.
        

        The oldestEntry and newestEntry in the set of entries currently being considered are recorded for each phase. Entries that are new enough that they are guaranteed to be kept are immediately removed from consideration, and entries that are old enough that they are guaranteed to be removed are immediately evicted (no sorting necessary). If after 2 passes of this (configurable) we still haven't removed enough entries, a priority queue is used to find the oldest entries out of those remaining.

        There are undoubtedly some other tricks we can use, but this was the best I could come up with for now.
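        A rough sketch of this multi-pass idea, assuming each live entry has a distinct, monotonically increasing lastAccessed timestamp; the names and structure are illustrative, not the attached prototype:

            import java.util.ArrayList;
            import java.util.List;

            // Rough sketch of the multi-pass cleaning described above (illustrative).
            class MultiPassSweeper {
              static class Entry { long lastAccessed; }

              void evict(Entry e) { /* remove from the backing map */ }

              void markAndSweep(List<Entry> entries, long oldest, long newest,
                                int wantToKeep, int wantToRemove) {
                for (int pass = 0; pass < 2 && wantToRemove > 0; pass++) {
                  // Timestamps are distinct, so at most wantToKeep entries can be newer
                  // than (newest - wantToKeep): they are guaranteed survivors. Likewise,
                  // entries older than (oldest + wantToRemove) are guaranteed evictions.
                  long keepThreshold = newest - wantToKeep;
                  long removeThreshold = oldest + wantToRemove;
                  List<Entry> unknown = new ArrayList<>();
                  long nextOldest = Long.MAX_VALUE, nextNewest = Long.MIN_VALUE;
                  for (Entry e : entries) {
                    if (e.lastAccessed > keepThreshold) {
                      wantToKeep--;                   // guaranteed kept: done considering it
                    } else if (e.lastAccessed < removeThreshold) {
                      evict(e);
                      wantToRemove--;                 // guaranteed among the oldest: evict now
                    } else {
                      unknown.add(e);                 // can't tell yet: keep for the next pass
                      nextOldest = Math.min(nextOldest, e.lastAccessed);
                      nextNewest = Math.max(nextNewest, e.lastAccessed);
                    }
                  }
                  entries = unknown;
                  oldest = nextOldest;
                  newest = nextNewest;
                }
                // If wantToRemove > 0 after the passes, fall back to a priority queue
                // over the remaining entries to find the true oldest ones.
              }
            }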

        Noble Paul added a comment -

        Looks good. This has a lot in common with my approach. The doClean() is done far more efficiently in your implementation. I can improve mine with your cleanup code (if you think it is fine).

        Yonik Seeley added a comment -

        I can improve mine with your cleanup code (if you think it is fine)

        +1

        I'd also include the manual tracking of size() that mine did... ConcurrentHashMap.size() doesn't look fast.

        Another thing to think about: pass an optional Executor in the constructor instead of creating a cleaning thread... and if it's null, it means "do it in the foreground". That would add flexibility and the ability to avoid one thread per cache if desired. (A sketch follows.)
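        For example, under the assumption that cleaning is a single Runnable (names illustrative):

            import java.util.concurrent.Executor;

            // Sketch of the optional-Executor idea above (names are illustrative).
            class CleaningPolicy {
              private final Executor cleanExecutor;    // null means "clean in the caller's thread"

              CleaningPolicy(Executor cleanExecutor) {
                this.cleanExecutor = cleanExecutor;
              }

              void requestClean(Runnable markAndSweep) {
                if (cleanExecutor != null) {
                  cleanExecutor.execute(markAndSweep); // background: the caller's put() returns fast
                } else {
                  markAndSweep.run();                  // foreground: no extra thread per cache
                }
              }
            }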

        Fuad Efendi added a comment -

        Paul, Yonik, thanks for your efforts. BTW, ConcurrentHashMap uses spin loops for 'safe' updates in order to avoid synchronization (instead of giving up CPU cycles); there are always cases when it is not faster than a simple HashMap with synchronization.

        LingPipe uses a different approach; see the last comment at SOLR-665.

        Also, why are you stuck on LRU? LFU is logically better.

        +1 and thanks for sharing.

        Noble Paul added a comment -

        My implementation just uses a Map<K,V> internally. If we can get a Map implementation that is faster than ConcurrentHashMap (and concurrent), we can replace it (after measuring the performance).

        Noble Paul added a comment -

        Borrowed some ideas from Yonik's impl.
        Probably this is a good enough first cut.

        Shalin Shekhar Mangar added a comment -

        Yonik – do you think this is good enough to go in now? Probably some users can try it out and report their experiences if we commit it early. I can take this up if you want.

        Shalin Shekhar Mangar added a comment -
        1. Added comments in the code
        2. Fixed a few concurrency issues

        I'll commit this shortly.

        Shalin Shekhar Mangar added a comment -

        Committed revision 708656.

        Thanks Fuad, Noble and Yonik!

        Todd Feak added a comment -

        Huge thanks on this one. This was one of the bottlenecks I've seen previously. For apples-to-apples load tests, this more than doubled my overall throughput.

        I do notice a sort of "pulsing" in the responses. It appears that everything flies along, but on occasion everything piles up for a second, then starts going again. This leads to a few response times that are over 1 second, but the average is way down closer to 20ms. Is this the cleanup involved when hitting a high-water mark?

        Overall, it's a huge improvement.

        Noble Paul added a comment -

        This leads to a few response times that are over 1 second, but the average is way down closer to 20ms.

        Yeah, you are right.
        There is one more feature in ConcurrentLRUCache which enables cleanups to be done in a new thread. FastLRUCache is not using it yet. I'll give a patch soon. This will take care of that 1 sec delay.

        Shalin Shekhar Mangar added a comment -

        For apples-to-apples load tests, this more than doubled my overall throughput.

        Very good to hear that!

        Is this the cleanup involved when hitting a high-water mark?

        Yes, a cleanup is attempted when the size crosses the high watermark ('size' config parameter). It is done in two stages. In the first stage, least recently used items are evicted. If, after the first stage, the cache size is still greater than 'acceptableSize' config parameter (default is 0.95*maxSize), the second stage takes over. The second stage is more intensive and tries to bring down the cache size to the 'minSize' config parameter (default is 0.9*maxSize).

        Note that the cleanup is done in the same thread which calls a put on the cache, hence the 'pulsing' that you are seeing. The cache implementation supports using a separate cleanup thread too, however it is not used currently. We still need to evaluate the best way to use it and how much it can help.

        Noble Paul added a comment -

        Run cleanup in a new thread.

        Todd Feak added a comment -

        I'm not sure if that helped out. I haven't run a profiler yet, but I think the "pulsing" (for lack of a better term) is caused by something else.

        Here's why... I used the new FastLRUCache only for my document cache in my latest test. The document cache size is large enough to hold all of the documents in the test set, helping focus on cache behavior. The warming query is enough to get all documents into the cache on startup. So, the cache is essentially as full as it's gonna get: 100% hit rate, and not growing. Yet, it still exhibits this pulsing. Could it be associated with the overhead of maintaining the least recently used entries?

        The introduction of the background thread didn't address this. It also didn't appear to speed things up; in fact it dropped overall throughput a bit, though still better than the old LRUCache.

        Noble Paul added a comment -

        Could it be associated with the overhead of maintaining the least recently used entries?

        The overhead is ~0. It just has to increment an AtomicLong every time you do a get(). I suspect the 'pulsing' may be because of GC pauses. Enable GC logging and you will know.

        Yonik Seeley added a comment -

        Todd, what's the used size of your Document cache?
        I'll review this latest incarnation of ConcurrentLRUCache to see if there's anything that might cause this.

        Yonik Seeley added a comment -

        Some minor updates:

        • fix thread safety issue in finalizer (the background thread may never see stop being set)
        • fix tracking of size in put() to only increment if oldValue != null
        • optimize the cleaning check in put(), since it will be called for every put until the size is back down
        Yonik Seeley added a comment -

        Todd: have you verified that the hit rate stays at 100% (no new inserts, no new evictions, etc)? If so, it might just be GC as Noble suggests. Bigger heaps often increase GC pauses.

        Todd Feak added a comment -

        I'll hook up profiling and see if my GC overhead has changed, or is seeing big peaks. Should be later today.

        My document cache is only 4000 for that test, but only about 3300 documents are in it. I wanted to focus on the overhead of getting things out of LRUCache, as in production we get >97% hit rates. I verified it's at 100% hit rate (warming fills it).

        Shalin Shekhar Mangar added a comment -

        Committed revision 709188.

        Thanks Yonik!

        Yonik Seeley added a comment -

        The markAndSweep logic in the current code didn't replicate the logic I gave in my example ConcurrentLRUCache, and it can remove a lot more than it should.

        Specifically, my algorithm broke the results into 3 groups:

        • those documents that are guaranteed to be in the top group (most recently accessed)
        • those documents guaranteed to be in the bottom group (immediately discard)
        • those documents where you can't tell.

        The current code reversed this logic, assuming that one can remove everything that is not in the top group. This isn't valid though, as lastAccess isn't uniform (and thus the size of the top group could be as small as 1).

        Todd Feak added a comment -

        I ran with a profiler and I'm not seeing any bursts of garbage collection. It's at a steady ~2%, with no major collections occurring (which is great!). However, the use of the profiler also slows things down about 10-20%, which seems to be enough that the pulsing goes away. I believe the pulsing may be some sort of limit of the testing I'm doing in a limited local environment. I'll include the patch in my current Solr app and do a more accurate comparison with our production-level load tests to see if it still exists. Though it may be a while before I can provide feedback from that one, as getting machines allocated for the heavy load testing can take a bit.

        As I said before, this is a huge improvement. Thanks for all the work on this one.

        Yonik Seeley added a comment -

        I'll take a crack at updating the patch such that too many entries aren't removed.

        Noble Paul added a comment -

        The current code reversed this logic,

        I missed the point. I guess this post made it clear. As you mentioned, it should be a 3-step cleanup.

        Shalin Shekhar Mangar added a comment -

        I forgot to add license headers to the three source files for this issue. I'll hold off adding them lest it break patches that you guys are working on.

        Noble Paul added a comment -

        Yonik's suggestions implemented.

        Noble Paul added a comment -

        Creating an array of map.size() elements for every markAndSweep() is expensive. An Iterator should be better.

        Yonik Seeley added a comment -

        Thanks Noble, we had a mid-air implementation collision.
        I'm doing some quick performance testing on the version I wrote now... I'll try it against your version afterwards and then we can go from there.

        Yonik Seeley added a comment -

        OK, here's the implementation based on my previous pseudo code, along with a very quick performance test.
        The test uses random keys over a slightly bigger range than the table.
        It also uses an upper water mark 10% higher than the lower water mark, and an acceptable water mark halfway in between. I haven't experimented to see what the best acceptable water mark is for either impl. If anyone wants to do a more realistic test with real queries, they are welcome to it.

        I did 4 runs and took the lowest number for each sub-test. Java6 -server, WinXP, P4. Times in milliseconds.

            doPerfTest(2000000, 100000, 200000); // big cache
        noble=17063  yonik=9157
            doPerfTest(2000000, 100000, 120000); // smaller key space increases distance between oldest, newest and makes the first passes less effective
        noble=8391  yonik=5812
            doPerfTest(6000000, 1000, 2000);    // small cache, smaller hit rate
        noble=17578  yonik=12515
            doPerfTest(6000000, 1000, 1200);    // small cache, bigger hit rate
        noble=11500  yonik=8219
        Yonik Seeley added a comment -

        Noble, your latest patch contains code like this:

            if (!markAndSweepLock.tryLock()) return;
            long oldestItem = this.oldestItem;
            [...]
            markAndSweepLock.unlock();
            this.oldestItem = oldestItem;
        

        oldestItem isn't volatile (and doesn't need to be if accessed correctly).
        The lock will also serve as a read barrier, so the first part is OK.
        The unlock will serve as a write barrier, but the set of oldestItem comes after it (not OK... another thread may not see the value).
        Changing the order (the write to oldestItem before the unlock) will ensure that the next thread that crosses a read barrier will see the new value.
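        In other words, something like this (a sketch mirroring the fragment above, with the sweep body elided):

            // Sketch of the corrected ordering (sweep body elided). Under the Java
            // memory model, writes made before unlock() happen-before everything a
            // later tryLock()/lock() on the same lock observes.
            if (!markAndSweepLock.tryLock()) return;
            try {
              long oldestItem = this.oldestItem;
              // [... sweep, updating the local oldestItem ...]
              this.oldestItem = oldestItem;   // write BEFORE the unlock below
            } finally {
              markAndSweepLock.unlock();      // publishes the write to the next sweeper
            }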

        Yonik Seeley added a comment -

        Added some minor changes, making sure that minLimit >= 1 and limit > minLimit (needed for rounding with small cache sizes).
        Also added test code for LRUCache vs FastLRUCache.
        It appears that LRUCache is faster (at least on my single-proc PC) when the hit ratio is low, and FastLRUCache is faster when the hit ratio is high.
        Should FastLRUCache be made the default in the example schema for the filterCache?

        time=2937 impl=LRUCache nThreads= 1 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981608
        time=2266 impl=FastLRUCache nThreads= 1 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981608
        time=3594 impl=LRUCache nThreads= 2 size=100000 maxKey=100000 gets=2000000 hitRatio=0.9816075
        time=1484 impl=FastLRUCache nThreads= 2 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981608
        time=3203 impl=LRUCache nThreads= 1 size=100000 maxKey=120000 gets=2000000 hitRatio=0.835225
        time=4593 impl=FastLRUCache nThreads= 1 size=100000 maxKey=120000 gets=2000000 hitRatio=0.751506
        time=3781 impl=LRUCache nThreads= 2 size=100000 maxKey=120000 gets=2000000 hitRatio=0.834685
        time=2656 impl=FastLRUCache nThreads= 2 size=100000 maxKey=120000 gets=2000000 hitRatio=0.8232835000000001
        time=3234 impl=LRUCache nThreads= 1 size=100000 maxKey=200000 gets=2000000 hitRatio=0.523398
        time=5047 impl=FastLRUCache nThreads= 1 size=100000 maxKey=200000 gets=2000000 hitRatio=0.3831675
        time=4125 impl=LRUCache nThreads= 2 size=100000 maxKey=200000 gets=2000000 hitRatio=0.511871
        time=3969 impl=FastLRUCache nThreads= 2 size=100000 maxKey=200000 gets=2000000 hitRatio=0.6665975
        time=3390 impl=LRUCache nThreads= 1 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.1445725
        time=5687 impl=FastLRUCache nThreads= 1 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.10041049999999996
        time=4750 impl=LRUCache nThreads= 2 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.10340150000000004
        time=6875 impl=FastLRUCache nThreads= 2 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.22233749999999997
        time=1343 impl=LRUCache nThreads= 1 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
        time=860 impl=FastLRUCache nThreads= 1 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
        time=1547 impl=LRUCache nThreads= 2 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
        time=703 impl=FastLRUCache nThreads= 2 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
        time=1610 impl=LRUCache nThreads= 1 size=1000 maxKey=1200 gets=2000000 hitRatio=0.833648
        time=2406 impl=FastLRUCache nThreads= 1 size=1000 maxKey=1200 gets=2000000 hitRatio=0.7404839999999999
        time=2078 impl=LRUCache nThreads= 2 size=1000 maxKey=1200 gets=2000000 hitRatio=0.8334255
        time=859 impl=FastLRUCache nThreads= 2 size=1000 maxKey=1200 gets=2000000 hitRatio=0.998974
        time=1922 impl=LRUCache nThreads= 1 size=1000 maxKey=2000 gets=2000000 hitRatio=0.5003285
        time=2875 impl=FastLRUCache nThreads= 1 size=1000 maxKey=2000 gets=2000000 hitRatio=0.3516785
        time=2422 impl=LRUCache nThreads= 2 size=1000 maxKey=2000 gets=2000000 hitRatio=0.5002055000000001
        time=1203 impl=FastLRUCache nThreads= 2 size=1000 maxKey=2000 gets=2000000 hitRatio=0.821195
        time=2297 impl=LRUCache nThreads= 1 size=1000 maxKey=10000 gets=2000000 hitRatio=0.10054949999999996
        time=2969 impl=FastLRUCache nThreads= 1 size=1000 maxKey=10000 gets=2000000 hitRatio=0.05416350000000003
        time=3078 impl=LRUCache nThreads= 2 size=1000 maxKey=10000 gets=2000000 hitRatio=0.10003499999999999
        time=3000 impl=FastLRUCache nThreads= 2 size=1000 maxKey=10000 gets=2000000 hitRatio=0.10475299999999999
        
        Yonik Seeley added a comment -

        Latest patch:

        • fixes an off-by-one (it was possible to go below minSize)
        • fixes tests to not expect a specific number of evictions.
        • makes FastLRUCache the default for filterCache in the example schema and main test schema.

        I'll commit soon if there are no objections.

        Noble Paul added a comment -
        CacheEntry[] eset = new CacheEntry[sz];
        int eSize = 0;
        

        Isn't it too expensive to create a potentially huge array every time we do a clean? (Too much work for the GC.) Maybe we do not even need it if the first loop is enough. Moreover, this one thing has added more code.

        I didn't use Lucene's PriorityQueue so that if somebody wished to lift the code they could easily do so, with no other dependency. When I posted the requirement in Google Collections there was a lot of interest in such a component. Can TreeSet do the trick?

        Yonik Seeley added a comment - edited

        Isn't it too expensive to create a potentially huge array every time we do a clean? (too much work for GC)

        That's what benchmarking is for

        It's a single short-lived allocation that allows us to greatly reduce the number of elements we need to evaluate on successive passes. Inserts into a TreeSet may have a higher GC cost given it's an allocation per insert.

        May be we do not even need it if the first loop is enough.

        Right... although in my testing, it seemed like the first loop was rarely sufficient (although the second often was).

        Shalin Shekhar Mangar added a comment -

        Here's the performance test from the patch on a more recent machine – Intel Quad Core, RHEL 64-bit, Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_11-b03, mixed mode):

        time=1456 impl=LRUCache nThreads= 1 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981608
        time=1041 impl=FastLRUCache nThreads= 1 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981608
        time=3256 impl=LRUCache nThreads= 2 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981608
        time=754 impl=FastLRUCache nThreads= 2 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981608
        time=1234 impl=LRUCache nThreads= 1 size=100000 maxKey=120000 gets=2000000 hitRatio=0.835225
        time=1564 impl=FastLRUCache nThreads= 1 size=100000 maxKey=120000 gets=2000000 hitRatio=0.751506
        time=3728 impl=LRUCache nThreads= 2 size=100000 maxKey=120000 gets=2000000 hitRatio=0.835006
        time=1384 impl=FastLRUCache nThreads= 2 size=100000 maxKey=120000 gets=2000000 hitRatio=0.798109
        time=1357 impl=LRUCache nThreads= 1 size=100000 maxKey=200000 gets=2000000 hitRatio=0.523398
        time=1894 impl=FastLRUCache nThreads= 1 size=100000 maxKey=200000 gets=2000000 hitRatio=0.3831675
        time=4556 impl=LRUCache nThreads= 2 size=100000 maxKey=200000 gets=2000000 hitRatio=0.512785
        time=1514 impl=FastLRUCache nThreads= 2 size=100000 maxKey=200000 gets=2000000 hitRatio=0.4682115
        time=1614 impl=LRUCache nThreads= 1 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.1445725
        time=1837 impl=FastLRUCache nThreads= 1 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.10041049999999996
        time=4710 impl=LRUCache nThreads= 2 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.10963999999999996
        time=1816 impl=FastLRUCache nThreads= 2 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.11144399999999999
        time=339 impl=LRUCache nThreads= 1 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
        time=292 impl=FastLRUCache nThreads= 1 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
        time=2511 impl=LRUCache nThreads= 2 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
        time=351 impl=FastLRUCache nThreads= 2 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
        time=383 impl=LRUCache nThreads= 1 size=1000 maxKey=1200 gets=2000000 hitRatio=0.833648
        time=580 impl=FastLRUCache nThreads= 1 size=1000 maxKey=1200 gets=2000000 hitRatio=0.7404839999999999
        time=2716 impl=LRUCache nThreads= 2 size=1000 maxKey=1200 gets=2000000 hitRatio=0.8337875
        time=805 impl=FastLRUCache nThreads= 2 size=1000 maxKey=1200 gets=2000000 hitRatio=0.79799
        time=570 impl=LRUCache nThreads= 1 size=1000 maxKey=2000 gets=2000000 hitRatio=0.5003285
        time=794 impl=FastLRUCache nThreads= 1 size=1000 maxKey=2000 gets=2000000 hitRatio=0.3516785
        time=3676 impl=LRUCache nThreads= 2 size=1000 maxKey=2000 gets=2000000 hitRatio=0.49959549999999997
        time=1685 impl=FastLRUCache nThreads= 2 size=1000 maxKey=2000 gets=2000000 hitRatio=0.436728
        time=712 impl=LRUCache nThreads= 1 size=1000 maxKey=10000 gets=2000000 hitRatio=0.10054949999999996
        time=1022 impl=FastLRUCache nThreads= 1 size=1000 maxKey=10000 gets=2000000 hitRatio=0.05416350000000003
        time=4395 impl=LRUCache nThreads= 2 size=1000 maxKey=10000 gets=2000000 hitRatio=0.100526
        time=2562 impl=FastLRUCache nThreads= 2 size=1000 maxKey=10000 gets=2000000 hitRatio=0.08556600000000003
        

        With more threads this time (4 and 16):

        time=1794 impl=LRUCache nThreads= 4 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981608
        time=594 impl=FastLRUCache nThreads= 4 size=100000 maxKey=100000 gets=2000000 hitRatio=0.9816075
        time=1737 impl=LRUCache nThreads= 16 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981607
        time=602 impl=FastLRUCache nThreads= 16 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981602
        time=2387 impl=LRUCache nThreads= 4 size=100000 maxKey=120000 gets=2000000 hitRatio=0.830956
        time=866 impl=FastLRUCache nThreads= 4 size=100000 maxKey=120000 gets=2000000 hitRatio=0.8892465
        time=1793 impl=LRUCache nThreads= 16 size=100000 maxKey=120000 gets=2000000 hitRatio=0.8274485
        time=706 impl=FastLRUCache nThreads= 16 size=100000 maxKey=120000 gets=2000000 hitRatio=0.9586865
        time=2233 impl=LRUCache nThreads= 4 size=100000 maxKey=200000 gets=2000000 hitRatio=0.5025255
        time=1228 impl=FastLRUCache nThreads= 4 size=100000 maxKey=200000 gets=2000000 hitRatio=0.654153
        time=1905 impl=LRUCache nThreads= 16 size=100000 maxKey=200000 gets=2000000 hitRatio=0.500583
        time=883 impl=FastLRUCache nThreads= 16 size=100000 maxKey=200000 gets=2000000 hitRatio=0.9067965
        time=5336 impl=LRUCache nThreads= 4 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.10182199999999997
        time=1780 impl=FastLRUCache nThreads= 4 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.25870800000000005
        time=2911 impl=LRUCache nThreads= 16 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.10132300000000005
        time=1941 impl=FastLRUCache nThreads= 16 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.508488
        time=687 impl=LRUCache nThreads= 4 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
        time=421 impl=FastLRUCache nThreads= 4 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
        time=782 impl=LRUCache nThreads= 16 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
        time=452 impl=FastLRUCache nThreads= 16 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
        time=813 impl=LRUCache nThreads= 4 size=1000 maxKey=1200 gets=2000000 hitRatio=0.8333735
        time=678 impl=FastLRUCache nThreads= 4 size=1000 maxKey=1200 gets=2000000 hitRatio=0.9988885
        time=794 impl=LRUCache nThreads= 16 size=1000 maxKey=1200 gets=2000000 hitRatio=0.8331635
        time=503 impl=FastLRUCache nThreads= 16 size=1000 maxKey=1200 gets=2000000 hitRatio=0.977526
        time=1554 impl=LRUCache nThreads= 4 size=1000 maxKey=2000 gets=2000000 hitRatio=0.500093
        time=928 impl=FastLRUCache nThreads= 4 size=1000 maxKey=2000 gets=2000000 hitRatio=0.802332
        time=1102 impl=LRUCache nThreads= 16 size=1000 maxKey=2000 gets=2000000 hitRatio=0.5002759999999999
        time=566 impl=FastLRUCache nThreads= 16 size=1000 maxKey=2000 gets=2000000 hitRatio=0.954131
        time=1543 impl=LRUCache nThreads= 4 size=1000 maxKey=10000 gets=2000000 hitRatio=0.10062899999999997
        time=1039 impl=FastLRUCache nThreads= 4 size=1000 maxKey=10000 gets=2000000 hitRatio=0.7582409999999999
        time=1372 impl=LRUCache nThreads= 16 size=1000 maxKey=10000 gets=2000000 hitRatio=0.10031000000000001
        time=604 impl=FastLRUCache nThreads= 16 size=1000 maxKey=10000 gets=2000000 hitRatio=0.935282
        

        Now with 8 and 32 threads:

        time=2109 impl=LRUCache nThreads= 8 size=100000 maxKey=100000 gets=2000000 hitRatio=0.9816075
        time=608 impl=FastLRUCache nThreads= 8 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981606
        time=1502 impl=LRUCache nThreads= 32 size=100000 maxKey=100000 gets=2000000 hitRatio=0.9816045
        time=648 impl=FastLRUCache nThreads= 32 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981592
        time=3876 impl=LRUCache nThreads= 8 size=100000 maxKey=120000 gets=2000000 hitRatio=0.8267995
        time=748 impl=FastLRUCache nThreads= 8 size=100000 maxKey=120000 gets=2000000 hitRatio=0.915961
        time=2176 impl=LRUCache nThreads= 32 size=100000 maxKey=120000 gets=2000000 hitRatio=0.8271935
        time=694 impl=FastLRUCache nThreads= 32 size=100000 maxKey=120000 gets=2000000 hitRatio=0.9652565
        time=2038 impl=LRUCache nThreads= 8 size=100000 maxKey=200000 gets=2000000 hitRatio=0.5005305
        time=1088 impl=FastLRUCache nThreads= 8 size=100000 maxKey=200000 gets=2000000 hitRatio=0.789179
        time=2147 impl=LRUCache nThreads= 32 size=100000 maxKey=200000 gets=2000000 hitRatio=0.4997505
        time=884 impl=FastLRUCache nThreads= 32 size=100000 maxKey=200000 gets=2000000 hitRatio=0.926915
        time=2343 impl=LRUCache nThreads= 8 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.10397699999999999
        time=2207 impl=FastLRUCache nThreads= 8 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.34063
        time=3440 impl=LRUCache nThreads= 32 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.10123850000000001
        time=2087 impl=FastLRUCache nThreads= 32 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.5367375
        time=909 impl=LRUCache nThreads= 8 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
        time=443 impl=FastLRUCache nThreads= 8 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
        time=682 impl=LRUCache nThreads= 32 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
        time=447 impl=FastLRUCache nThreads= 32 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
        time=1189 impl=LRUCache nThreads= 8 size=1000 maxKey=1200 gets=2000000 hitRatio=0.832726
        time=605 impl=FastLRUCache nThreads= 8 size=1000 maxKey=1200 gets=2000000 hitRatio=0.919104
        time=1463 impl=LRUCache nThreads= 32 size=1000 maxKey=1200 gets=2000000 hitRatio=0.8337005
        time=489 impl=FastLRUCache nThreads= 32 size=1000 maxKey=1200 gets=2000000 hitRatio=0.9845845
        time=1256 impl=LRUCache nThreads= 8 size=1000 maxKey=2000 gets=2000000 hitRatio=0.500149
        time=678 impl=FastLRUCache nThreads= 8 size=1000 maxKey=2000 gets=2000000 hitRatio=0.907774
        time=1013 impl=LRUCache nThreads= 32 size=1000 maxKey=2000 gets=2000000 hitRatio=0.49962399999999996
        time=503 impl=FastLRUCache nThreads= 32 size=1000 maxKey=2000 gets=2000000 hitRatio=0.976796
        time=1504 impl=LRUCache nThreads= 8 size=1000 maxKey=10000 gets=2000000 hitRatio=0.10030550000000005
        time=754 impl=FastLRUCache nThreads= 8 size=1000 maxKey=10000 gets=2000000 hitRatio=0.9151345
        time=1245 impl=LRUCache nThreads= 32 size=1000 maxKey=10000 gets=2000000 hitRatio=0.10028899999999996
        time=499 impl=FastLRUCache nThreads= 32 size=1000 maxKey=10000 gets=2000000 hitRatio=0.978823
        

        When the number of threads is increased, FastLRUCache is true to its name.

        Yonik Seeley added a comment -

        That benchmark really isn't valid for a high number of threads though: notice the difference in hitRatio.
        If you have many threads quickly adding items and only one thread at a time removing items, the FastLRUCache goes over its target size and thus increases its hitRatio, making it look artificially faster.

        This isn't a concern for its use in Solr though, since generating a cache value will be much slower than clearing the cache.

        Todd Feak added a comment -

        Is there an easy way to get this patched into 1.3.0?

        Right now, I think I have to grab 7 patches and apply them in order. Will that give me the correct content? Is there an easier way to do this from the repository?

        Noble Paul added a comment -

        There is another number which we have ignored: if the cleanup is done in a separate thread, FastLRUCache consistently outperforms the legacy one (Shalin forgot to include those numbers). For a very large cache size, the cleanup takes ~200-300 ms, which means a request can end up paying a huge price.

        We need to add a new 'newCleanupThread' option to FastLRUCache (it is there in my old patch). I guess with that we can make FastLRUCache the default, with newCleanupThread=true.

        Is there an easy way to get this patched into 1.3.0?

        If you apply Yonik's latest patch on trunk you get two extra files. You can copy those two files straight to 1.3 and use them.

        Shalin Shekhar Mangar added a comment -

        I ran it again with a new thread for each cleanup. Time taken for markAndSweep is printed:

        time=1550 impl=LRUCache nThreads= 1 size=100000 maxKey=120000 gets=2000000 hitRatio=0.835225
        MarkAndSweepTime = 138
        MarkAndSweepTime = 35
        MarkAndSweepTime = 36
        MarkAndSweepTime = 39
        MarkAndSweepTime = 41
        MarkAndSweepTime = 42
        MarkAndSweepTime = 42
        MarkAndSweepTime = 44
        MarkAndSweepTime = 43
        MarkAndSweepTime = 43
        MarkAndSweepTime = 44
        MarkAndSweepTime = 43
        MarkAndSweepTime = 43
        MarkAndSweepTime = 44
        MarkAndSweepTime = 43
        MarkAndSweepTime = 44
        MarkAndSweepTime = 44
        MarkAndSweepTime = 43
        time=1378 impl=FastLRUCache nThreads= 1 size=100000 maxKey=120000 gets=2000000 hitRatio=0.8130459999999999
        
        time=3942 impl=LRUCache nThreads= 2 size=100000 maxKey=120000 gets=2000000 hitRatio=0.835045
        MarkAndSweepTime = 58
        MarkAndSweepTime = 165
        MarkAndSweepTime = 32
        MarkAndSweepTime = 34
        MarkAndSweepTime = 37
        MarkAndSweepTime = 37
        MarkAndSweepTime = 46
        MarkAndSweepTime = 40
        MarkAndSweepTime = 61
        MarkAndSweepTime = 53
        MarkAndSweepTime = 51
        MarkAndSweepTime = 44
        MarkAndSweepTime = 47
        MarkAndSweepTime = 47
        MarkAndSweepTime = 48
        MarkAndSweepTime = 48
        MarkAndSweepTime = 46
        time=1062 impl=FastLRUCache nThreads= 2 size=100000 maxKey=120000 gets=2000000 hitRatio=0.8560415
        
        Yonik Seeley added a comment -

        I just committed a fix to the setting of acceptableSize - it was always being set to maxSize, which would normally cause the cleaning routine to return after the first phase (and thus be called more often than normal).
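        To make the effect concrete, here is a condensed sketch of the exit check involved (field names and the midpoint formula are assumptions for illustration, not the committed code):

            // With the bug, acceptableSize == maxSize, so the very first
            // cleaning phase satisfies the exit check; the next put() over
            // the limit then triggers cleaning all over again.
            class AcceptableSizeSketch {
                final int maxSize = 1000;
                final int minSize = 900;
                // fixed: target a size comfortably below maxSize
                final int acceptableSize = minSize + (maxSize - minSize) / 2;

                boolean doneCleaning(int currentSize) {
                    return currentSize <= acceptableSize; // checked after each phase
                }
            }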

        Noble Paul added a comment -

        Added a new boolean attribute, newCleanThread. The default is set to false.

        Shalin Shekhar Mangar added a comment -

        Yonik, what do you think about using a new cleanup thread?

        Yonik Seeley added a comment -

        The ability to use a separate cleanup thread is interesting... but I'm not sure that spinning off a new thread for each cleanup is something one would ever want to do. The cleanup thread logic should probably be fixed too (no sleeping and polling... it should wait until notified that a cleanup is needed).
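        A minimal sketch of that wait-until-notified pattern (hypothetical names, not the patch itself):

            // Sketch: the cleaner blocks in wait() and put() wakes it when the
            // cache crosses the high-water mark; no sleep/poll loop.
            class CleanerSketch extends Thread {
                private final Object lock = new Object();
                private boolean cleanRequested = false;

                public void run() {
                    while (true) {
                        synchronized (lock) {
                            while (!cleanRequested) {
                                try { lock.wait(); } catch (InterruptedException e) { return; }
                            }
                            cleanRequested = false;
                        }
                        markAndSweep(); // the actual eviction work
                    }
                }

                void requestClean() { // called by put() past the high-water mark
                    synchronized (lock) {
                        cleanRequested = true;
                        lock.notify();
                    }
                }

                void markAndSweep() { /* eviction elided */ }
            }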

        Noble Paul added a comment -

        OK, that is a good idea. But this is important functionality.

        Noble Paul added a comment -
        • cleanup thread now does wait() and gets notified when needed
        • ConcurrentLRUCache is generified
        • A new interface called EvictionListener was added to ConcurrentLRUCache. It gets a callback for each entry that is evicted (see the sketch below)
        • FastLRUCache has a new configuration option, 'cleanupThread'. The default is set to 'false'; I believe it should be true by default
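        A sketch of what such a listener might look like (the signature is assumed for illustration, not necessarily the patch's exact interface):

            import java.util.concurrent.atomic.AtomicLong;

            // Assumed shape of the eviction callback.
            interface EvictionListener<K, V> {
                void evictedEntry(K key, V value); // invoked once per evicted entry
            }

            // A trivial listener that counts evictions, e.g. for cache statistics.
            class EvictionCounter implements EvictionListener<String, Object> {
                private final AtomicLong evictions = new AtomicLong();

                public void evictedEntry(String key, Object value) {
                    evictions.incrementAndGet();
                }

                public long count() { return evictions.get(); }
            }
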
        Noble Paul added a comment -

        Made CacheEntry non-static.

        Shalin Shekhar Mangar added a comment -

        Yonik, not trying to be pushy but can this patch be committed?

        I want to create a build for internal use out of trunk with this feature.

        Yonik Seeley added a comment -

        Does this have a thread leak? Where is FastLRUCache.destroy() ever called?
        It's called from the finalizer (yuck), but that finalizer will never be called because the cleaning thread references the cache (the definition of liveness). Issues with having the cache deal with the thread lifecycle are why I previously recommended exploring the use of an Executor that the user passes in.

        Noble Paul added a comment -

        Yonik, nice catch. There was a thread leak.

        I hope this patch fixes that. The cleanup thread now holds a WeakReference to the cache, and the close() of SolrCache ensures that it is destroyed.
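        In outline the fix looks something like this (a sketch with assumed names, not the patch itself):

            import java.lang.ref.WeakReference;

            // Sketch: the cleaner holds only a WeakReference, so a cache that
            // nothing else references can still be collected; the thread then
            // sees the cleared reference and exits instead of leaking.
            class WeakCleanerSketch extends Thread {
                interface Cleanable { void markAndSweep(); }

                private final WeakReference<Cleanable> cacheRef;

                WeakCleanerSketch(Cleanable cache) {
                    this.cacheRef = new WeakReference<Cleanable>(cache);
                    setDaemon(true);
                }

                public void run() {
                    while (true) {
                        synchronized (this) {
                            try { wait(); } catch (InterruptedException e) { return; }
                        }
                        Cleanable cache = cacheRef.get();
                        if (cache == null) return; // cache was GC'd: no leak
                        cache.markAndSweep();
                    }
                }
            }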

        Yonik Seeley added a comment -

        Thanks Noble, looks like that solution should work.

        Funny thing with the latest patch though - I get compile errors with "ant test" from the command line:

        compile-common:
            [mkdir] Created dir: f:\code\solr\build\common
            [javac] Compiling 39 source files to f:\code\solr\build\common
            [javac] f:\code\solr\src\java\org\apache\solr\common\util\ConcurrentLRUCache.java:201: generic array creation
            [javac]       CacheEntry[] eset = new CacheEntry[sz];
            [javac]                           ^
    [javac] f:\code\solr\src\java\org\apache\solr\common\util\ConcurrentLRUCache.java:379: non-static class org.apache.solr.common.util.ConcurrentLRUCache.CacheEntry cannot be referenced from a static context
            [javac]       return ((CacheEntry)b).lastAccessedCopy < ((CacheEntry)a).lastAccessedCopy;
           [...]
        

        It looks like the compiler thinks that the method is static. IntelliJ doesn't flag any errors, and I can't see anything wrong after a quick glance at the code. Does "ant test" from the command line work for you?
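        For reference, the error reproduces with any non-static inner class of a generic class: the inner class implicitly carries the outer type parameters, so creating an array of it counts as generic array creation. A condensed illustration (not Solr code):

            class Outer<K, V> {
                class Inner {}          // implicitly Outer<K,V>.Inner: generic
                static class Nested {}  // just Outer.Nested: not generic

                void demo(int sz) {
                    // Inner[] a = new Inner[sz];  // javac: "generic array creation"
                    Nested[] b = new Nested[sz];   // compiles fine
                }
            }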

        Yonik Seeley added a comment -

        OK, committed latest patch after some minor logic changes (including changing CacheEntry from an inner class to a static inner class, which solved the compilation errors).

        Noble Paul added a comment -
        • Should we keep the cleanupThread=true default for FastLRUCache?
        Yonik Seeley added a comment -

        I think the default should remain cleanupThread=false. It's simpler behavior, and for normal cache sizes, the pause an individual request may see is less than what one would see from a GC pause.
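        For anyone who does want to opt in per cache, the solrconfig.xml entry would look something like this (values illustrative; the attribute is the 'cleanupThread' option discussed above):

            <filterCache
                class="solr.FastLRUCache"
                size="16384"
                initialSize="4096"
                autowarmCount="4096"
                cleanupThread="true"/>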


          People

          • Assignee: Yonik Seeley
          • Reporter: Noble Paul
          • Votes: 2
          • Watchers: 6
