Issue Details (XML | Word | Printable)

Key: SOLR-665
Type: Improvement Improvement
Status: Closed Closed
Resolution: Duplicate
Priority: Major Major
Assignee: Unassigned
Reporter: Fuad Efendi
Votes: 0
Watchers: 6
Operations

If you were logged in you would be able to see more operations.
Solr

FIFO Cache (Unsynchronized): 9x times performance boost

Created: 28/Jul/08 03:24 AM   Updated: 21/Feb/09 03:31 PM
Return to search
Component/s: None
Affects Version/s: 1.3
Fix Version/s: None

Time Tracking:
Original Estimate: 672h
Original Estimate - 672h
Remaining Estimate: 672h
Remaining Estimate - 672h
Time Spent: Not Specified
Remaining Estimate - 672h

File Attachments:
  Size
Java Source File Licensed for inclusion in ASF works ConcurrentFIFOCache.java 2008-07-29 02:42 PM Noble Paul 2 kB
Java Source File Licensed for inclusion in ASF works ConcurrentFIFOCache.java 2008-07-29 06:26 AM Noble Paul 1 kB
Java Source File Licensed for inclusion in ASF works ConcurrentLRUCache.java 2008-07-29 09:34 AM Noble Paul 3 kB
Java Source File Licensed for inclusion in ASF works ConcurrentLRUWeakCache.java 2008-07-30 04:50 AM Fuad Efendi 2 kB
Java Source File Licensed for inclusion in ASF works FIFOCache.java 2008-07-28 03:25 AM Fuad Efendi 8 kB
Java Source File Licensed for inclusion in ASF works SimplestConcurrentLRUCache.java 2008-07-30 03:05 PM Fuad Efendi 2 kB
Environment: JRockit R27 (Java 6)

Resolution Date: 21/Feb/09 03:31 PM


 Description  « Hide
Attached is modified version of LRUCache where
1. map = new LinkedHashMap(initialSize, 0.75f, false) - so that "reordering"/true (performance bottleneck of LRU) is replaced to "insertion-order"/false (so that it became FIFO)
2. Almost all (absolutely unneccessary) synchronized statements commented out

See discussion at http://www.nabble.com/LRUCache---synchronized%21--td16439831.html

Performance metrics (taken from SOLR Admin):

LRU
Requests: 7638
Average Time-Per-Request: 15300
Average Request-per-Second: 0.06

FIFO:
Requests: 3355
Average Time-Per-Request: 1610
Average Request-per-Second: 0.11

Performance increased 9 times which roughly corresponds to a number of CPU in a system, http://www.tokenizer.org/ (Shopping Search Engine at Tokenizer.org)

Current number of documents: 7494689
name: filterCache
class: org.apache.solr.search.LRUCache
version: 1.0
description: LRU Cache(maxSize=10000000, initialSize=1000)
stats: lookups : 15966954582
hits : 16391851546
hitratio : 0.102
inserts : 4246120
evictions : 0
size : 2668705
cumulative_lookups : 16415839763
cumulative_hits : 16411608101
cumulative_hitratio : 0.99
cumulative_inserts : 4246246
cumulative_evictions : 0

Thanks



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Fuad Efendi added a comment - 28/Jul/08 03:29 AM
I renamed to FIFOCache just before opening an issue; in a local system it is (modified) LRUCache so that filterCache has reference to 'old' name...

Fuad Efendi added a comment - 28/Jul/08 04:34 AM
Almost forgot: I am estimating performance gains basing on real application in-production, multithreaded, Tomcat 6.0.16 & JRockit R27 (Java 6) & SLES10 & two quad-core Opteron 2350 (8 CPUs total) & 25Gb for SOLR...
And, first queries run a long, more than a minute (warming up caches with faceted query id:[* TO *]) average Time-Per-Request decreases over time and it is now 1591.5232 giving 10x performance boost.
Facets are highly distributed as you can see from website and filterCache... HTTP caching is supported - to see real timing you should execute real queries...
ConcurrentHashMap is not applicable - we are not modifying cached item indeed... FIFO is without 'Out' if we have enough memory.

Yonik Seeley added a comment - 28/Jul/08 01:29 PM
This implementation isn't thread safe (due the the removed synchronization).

Noble Paul added a comment - 28/Jul/08 01:37 PM
Why do we have almost the entire get() method synchronized on the map in LRUCache .
stats increment can possibly be put out of synchronized

Yonik Seeley added a comment - 28/Jul/08 01:52 PM

Why do we have almost the entire get() method synchronized on the map in LRUCache .

It simplified the code and reduced the number of branches. What's your proposed version?
Here is the current method:

public Object get(Object key) {
    synchronized (map) {
      Object val = map.get(key);
      if (state == State.LIVE) {
        // only increment lookups and hits if we are live.
        lookups++;
        stats.lookups.incrementAndGet();
        if (val!=null) {
          hits++;
          stats.hits.incrementAndGet();
        }
      }
      return val;
    }
  }

Noble Paul added a comment - 28/Jul/08 02:06 PM - edited
I guess this is safe and faster
public Object get(Object key) {
    synchronized (map) {
      Object val = map.get(key);
      if (state == State.LIVE) {
        // only increment lookups and hits if we are live.
        lookups++;
      }
    }
      stats.lookups.incrementAndGet();
      if (val != null) {
        //hits++; this must be removed
        stats.hits.incrementAndGet();
      }
      return val;
  }

let us remove the field hits and use stats.hits wherever we need it


Yonik Seeley added a comment - 28/Jul/08 02:35 PM - edited
Your version changes what the method does in a couple of respects though.

let us remove the field hits and use stats.hits wherever we need it

stats.hits is for all caches of this type (it's shared across searchers). hits is local to this cache instance.


Fuad Efendi added a comment - 28/Jul/08 02:56 PM
Regarding Thread Safety:
  • yes, we need synchronized get() method for LRU cache because each access reorders LinkedHashMap.
  • absolutely no need to synchronize get() method for FIFO!
  • probably we need to deal with insertion, but do not synchronize it: instead, extend LinkedHashMap and make some 'removal' synchronized... With caches large enough to store all object we do not need it. We probably do not need to synchronize 'removal' at all - it removes entry but does not remove/finalize referenced object.

From JavaDoc: "Note that this implementation is not synchronized. If multiple threads access a linked hash map concurrently, and at least
one of the threads modifies the map structurally, it must be synchronized externally."
However, we do not modify cache structurally during iteration loop or any other 'structure' access (we do not use Iterator!) - so, advice from JavaDoc is not applicable.

We should synchronize only removeEntryForKey of HashMap; unfortunately we can't do it without rewriting HashMap. Probably we can use ConcurrentHashMap as a base class of LinkedHashMap, but I don't know answer yet... I am guessing that unsynchronized entry removal won't be significant issue in multithreaded environment.


Yonik Seeley added a comment - 28/Jul/08 03:10 PM

absolutely no need to synchronize get() method for FIFO!

A cache is not a read-only object. Gets need to be synchronized because other threads can be changing the cache.
If anyone wants to learn more about thread safety and concurrency, I'd recommend "Java concurrency in practice"
http://www.javaconcurrencyinpractice.com/


Fuad Efendi added a comment - 28/Jul/08 03:13 PM
  • Classes from java.util.concurrent.atomic designed NOT to be synchronized, per-instance stats should be replaced to AtomicLong instead of long:
    {{ private long lookups;
    private long hits;
    private long inserts;
    private long evictions;}}
  • get() method of FIFO do not need any synchronization
  • get() method of LRU reorers LinkedHashMap, unsynchronized access may cause orphan Entry objects
  • synchronized insertion for FIFO won't cause performance degradation because get() is unsynchronized.

Yonik Seeley added a comment - 28/Jul/08 03:18 PM

per-instance stats should be replaced to AtomicLong instead of long.

Since we need to synchronize anyway, it's more efficient to just stick things like hits++ inside the sync block instead of having a separate AtomicLong.


Fuad Efendi added a comment - 28/Jul/08 03:19 PM

bq. absolutely no need to synchronize get() method for FIFO!

A cache is not a read-only object. Gets need to be synchronized because other threads can be changing the cache.

I am familiar with Doug Lea's findings (he wrote his book in 1996, and it became part of java.util.concurrent after 10(!!!) years).

changing the cache

  • what is 'cache change'?
    Changing the stored Key, changing the referenced object - never ever happens in SOLR. Removal of object - yes. More correctly: removal of key. get(MyKey) will return null OR will return MyValue, and "ConcurentCacheModification" will never be thrown in SOLR (we are not using Iterator!). We can insert concurrently the same (key, value) pairs - not a problem.

Yonik Seeley added a comment - 28/Jul/08 03:30 PM

what is 'cache change'?

Adding a new key/value pair to the cache, or removal of a key/value pair from the cache.

get(MyKey) will return null OR will return MyValue, and "ConcurentCacheModification" will never be thrown in SOLR

This would only be under very specific usage patterns - everything pre-cached, and no adds or removes once the cache is "LIVE" (accessed by more than one thread concurrently).
ConcurentModification is best effort, not guaranteed. You can still get incorrect results or other exceptions from code that isn't thread safe, even when a ConcurentModification isn't thrown.


Sean Timm added a comment - 28/Jul/08 03:33 PM
Fuad--

I agree with Yonik here. From the HashMap Javadoc:

If multiple threads access this map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)


Fuad Efendi added a comment - 28/Jul/08 04:05 PM

If multiple threads access this map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)

I already commented it. Just try to avoid 'books' and 'authors', also try to find meaning in JavaDocs and try to browse JavaSource instead...

  • structural modification, related ConcurrentModificationException, and related Iterator: not applicable to SOLR Cache.

Will never happen. We need to synchronize inserts because each insert may calculate size and remove 'eldest' entry, and we need to avoid OOMs. We need to synchronize retrieves for LRU because 'Linked' HashMap (with Access Order) will change links (will reorder Map). And we do not need to synchronize retrieves from Insertion-Ordered LinkedHashMap (FIFO). It is classic... i'd like to research more java.util.concurrent


Yonik Seeley added a comment - 28/Jul/08 04:22 PM
Closing, since it's pretty much impossible to avoid all forms of synchronization and memory barriers for get() on a shared Map object.

Fuad Efendi added a comment - 28/Jul/08 04:43 PM
Why? We have PoC at least, and we know the bottleneck!
We need improvement: avoid some synchronization; it is extremely easy with FIFO. We may try to improve LRU too.
Not everything in JAVA is extremely good: for instance, synchronization. Even for single-threaded application, it needs additionally 600 CPU cycles (I've read it somewhere for SUN Java 1.3 on Windows)

Yonik, please allow some time to think / to discuss.

I'll try also to provide 'concurrent' LRU; but this issue is FIFO related.


Fuad Efendi added a comment - 28/Jul/08 04:50 PM
get() method do not need to be synchronized for FIFO cache. Unsynchronized object retrieval is not structural modification of Insertion-Ordered LinkedHashMap. Unsynchronized cache improves performance of multithreaded applications linear to number of CPUs.

Noble Paul added a comment - 28/Jul/08 05:25 PM
OK.Let us try to study the failure cases .
get() does no state modification.

So if the get is unsynchronized the worst case scenario is that we may get an Exception (I dunno what ConcurrentModificationException? ArrayIndexOutOfBoundsException? ). How often does it happen? What is the probability?

What is the big deal ?(the system won't crash). We can catch the Exception and return null as if the entry was not found . That means that Solr may have to recompute results where it did not have to (this is a cost).

here we totally eliminated the cost of synchronization (benefit). I guess if you do a cost benefit analysis this does not turn out to be as bad as it is projected to be.

And after all the user is knowingly using this fully aware of the cost. is there anything else I have not considered


Fuad Efendi added a comment - 28/Jul/08 05:27 PM

...and at least one of the threads modifies the map structurally, it <i>must</i> be synchronized externally.

  • so that thread running on insert must be synchronized and not thread running on retrieve. Again, we need synchronize insert only to avoid memory leaks, not more. SOLR does not iterate Map.

Fuad Efendi added a comment - 28/Jul/08 05:30 PM
ConcurrentModificationException is thrown only when we iterate Map and another thread modified structure; with LRU each get() modifies structure, with FIFO - only inserts...

Shalin Shekhar Mangar added a comment - 28/Jul/08 05:42 PM
I think Fuad has a point. In an Insertion ordered LinkedHashMap, get makes no structural modification and if we synchronize put/remove, we should be fine. The cache warming is already thread-safe and we don't have iterators anywhere. Am I missing something here?

Mark Miller added a comment - 28/Jul/08 06:41 PM
Sent to the mailing List:

>>Depends on if you want to go by the javadoc or not - it says that if any of the threads accessing the map concurrently make a structural change, the map must be synchronized. It's clear that get does not >>make a structural change when using insertion order, but can any other threads possibly make a mapping change while calling get? Sounds like yes, and so the contract would seem to indicate you >>synchronize...

>>Put can modify shared variables that get accesses - sounds like dangerous ground to me. If it works, its got to be sneaky enough to warrant code smell...

>>- Mark

Further:

Here is just one of possibly many problems - a put call can resize and make an entirely new table array. A get call can do something like the following with the table array:

table[indexFor(hash, table.length)

Do to execution reordering / memory barriers, would seem to me that the table being indexed into and the table.length may not be the values you were hoping for...


Mike Klaas added a comment - 28/Jul/08 06:58 PM
I haven't looked at the proposed code at all, but it is possible to design this kind of datastructure, with much care:

http://www.ddj.com/hpc-high-performance-computing/208801974


Shalin Shekhar Mangar added a comment - 28/Jul/08 07:00 PM
Yes, I'm sorry for speaking out of turn here.

Noble Paul added a comment - 28/Jul/08 07:03 PM
OK . I take my words back. LinkedHashMap is backed a by a HashMap . The table can get resized during the lifetime of a map (and hence during a get operation also). So the get() may return a wrong value (which is unacceptable) and it may never throw an Exception.

If we ever want to have an LRUCache with a non -synchronized get() we must have it backed by a ConcurrentHashMap


Fuad Efendi added a comment - 28/Jul/08 07:05 PM - edited
Mark, thanks for going deeper. Yes, resize may change ( Entry[ ] ) table, key will disappear from bucket and _get() returns null:
/**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry[] table;
...
    public V get(Object key) {
	if (key == null)
	    return getForNullKey();
        int hash = hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

Might get null instead of MyValue (during resize by another thread); not a big deal! It is still Thread-Safe. Will add new entry (overwrite existing) in such extremely rare cases.

  • get() will never return wrong value (Entry object is immutable in our case):
    if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                    return e.value;

Shalin Shekhar Mangar added a comment - 28/Jul/08 07:13 PM
Fuad, before going further on this, we must also consider users who use alternate JVM implementations. Since the contract is to synchronize, we cannot base our decision on Sun JDK's implementation.

Mark Miller added a comment - 28/Jul/08 07:15 PM
I don't have time to look at this right now, but I'll certainly spend some time tonight Efendi.

A quick to my point though: I mentioned something that might be done in an implementation that could be an issue - you came back with a specific implementation that may (or may not, I don't have time at the moment) not work anyway. We can't program to an implementation though - we must program to the contract.


Yonik Seeley added a comment - 28/Jul/08 07:21 PM
resize() is only the most obvious problem... there are tons of ways this can fail, even if you get around the resize (and your "null" fix for resize() won't work). Some of the failures can be in the form of incorrect results rather than null pointers or exceptions (so you can't just retry).

I'll reiterate:

it's pretty much impossible to avoid all forms of synchronization and memory barriers for get() on a shared Map object.


Fuad Efendi added a comment - 28/Jul/08 07:21 PM
Shalin, we are already using AtomicLong of Java 5; JVM is defferent story... JRockit R27 is JVM from BEA-Oracle, and its JDK 6 powered (rt.jar comes from SUN).
I just tried to compare synchronized with unsynchronized and found it is the main problem for faceting...
Another problem: somehow faceting recalculates each time (using filterCache during repeated recalculations), queryCache is not enough...

Fuad Efendi added a comment - 28/Jul/08 07:31 PM
We need to invite Doug Lea to this discussion...
http://en.wikipedia.org/wiki/Doug_Lea
http://gee.cs.oswego.edu/dl/index.html

We may simply use java.util.concurrent.locks instead of heavy synchronized... we may also use Executor framework instead of single-thread faceting... We may even base SOLR on Apache MINA project.


Shalin Shekhar Mangar added a comment - 28/Jul/08 07:37 PM

I just tried to compare synchronized with unsynchronized and found it is the main problem for faceting...

Fuad, I am completely with you on this. Everybody will agree that a more efficient implementation will be a very useful addition to Solr. However, Yonik's concerns on the current patch are valid and we cannot go ahead with the current one.


Fuad Efendi added a comment - 28/Jul/08 07:45 PM - edited
concerns are probably because of misunderstanding of some contract...
/**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry[] table;

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable);
        table = newTable;
        threshold = (int)(newCapacity * loadFactor);
    }


    public V get(Object key) {
	if (key == null)
	    return getForNullKey();
        int hash = hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }
  • in worst case we will have pointer to old table and even with new one of smaller size we won't get any ArrayIndexOutOfBounds.
  • There is no any contract requiring synchronization on get() of HashMap or LinkedHashMap; it IS application specific.
  • we will never have wrong results because Entry is immutable
/** 
     * Transfer all entries from current table to newTable.
     */
    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            if (e != null) {
                src[j] = null;
                do {
                    Entry<K,V> next = e.next;
                    int i = indexFor(e.hash, newCapacity);  
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }
  • We won't have even any NullPointerException after src[j] = null.

P.S.
Of course, I agree - it is Java internals, and it is not public Map interface-contract - should we avoid to use implementation then? I believe it is specified somewhere in JSR too...

* @author  Doug Lea
 * @author  Josh Bloch
 * @author  Arthur van Hoff
 * @author  Neal Gafter
 * @version 1.65, 03/03/05

P.P.S.
Do not forget to look at the top of this discussion:

description: xxx Cache(maxSize=10000000, initialSize=1000) 
size : 2668705
cumulative_inserts : 4246246
  • cumulative_inserts is almost double of size which shows that double-inserts are real
  • I checked catalina_out: no any NullPointerException, ArrayIndexOutOfBoundsException, and etc.
  • I don't think we should be worried too much about possible change of Map implementation by SUN ... in this case we should use neither java.lang.String nor java.util.Date (some are placed in wrong packages).
  • about thread safety: some participants are wrongly guessing that making object access totally synchronized will make their code thread-safe... deadlock.

Mike Klaas added a comment - 28/Jul/08 09:16 PM
[quote]We may simply use java.util.concurrent.locks instead of heavy synchronized... we may also use Executor framework instead of single-thread faceting... We may even base SOLR on Apache MINA project.[/quote]

Simply replacing synchronized with java.util.concurrent.locks doesn't increase performance. There needs to be a specific strategy for employing these locks in a way that makes sense.

For instance, one idea would be to create a read/write lock with the put()'s covered by write and get()'s covered by read. This would allow multiple parallel reads and will be thread-safe. Another is to create something like ConcurrentLinkedHashMap.

These strategies should be tested before trying to create a lock-free get() version, which if even possible, would rely deeply on the implementation (such a structure would have to be created from scratch, I believe). I'd expect anyone that is able to create such a thing be familiar enough wiht memory barriers and such issues to be able to deeply explain the problems with double-checked locking off the top of their head (and immediately see such problems in other code)


Fuad Efendi added a comment - 28/Jul/08 09:34 PM

Simply replacing synchronized with java.util.concurrent.locks doesn't increase performance. There needs to be a specific strategy for employing these locks in a way that makes sense.

I absolutely agree...

ConcurrentHashMap is based on some level of acceptable safety, for specific tasks only

They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.

We can try to design specific caches directly implementing Map or ConcurrentMap interfaces. We should define 'safety' levels (for instance, null is not a problem if cache already contains this object added by another thread concurrently; cache elements are explicitly immutable objects; and etc.)
FIFO looks simplest and it does work indeed; for LRU we need reordering for each get(), OR we can make it weaker using weak (approximate) reordering somehow...


Fuad Efendi added a comment - 29/Jul/08 01:19 AM - edited
This is extremely simple Concurrent LRU, I spent an hour to create it; it is based on ConcurrentHashMap. I don't use java.util.concurrent.locks, and I am trying to focus on requirements only avoiding implementing unnecessary methods of Map interface (so that I am not following contract very sorry!)
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentLRU<K, V> {

	protected ConcurrentHashMap<K, ValueWrapper<V>> map;
	protected int maxEntries;

	public ConcurrentLRU(int maxEntries) {
		map = new ConcurrentHashMap<K, ValueWrapper<V>>();
		this.maxEntries = maxEntries;
	}

	public V put(K key, V value) {
		ValueWrapper<V> wrapper = map.put(key, new ValueWrapper<V>(value));
		checkRemove();
		return value;
	}

	void checkRemove() {
		if (map.size() <= maxEntries)
			return;
		Map.Entry<K, ValueWrapper<V>> eldestEntry = null;
		for (Map.Entry<K, ValueWrapper<V>> entry : map.entrySet()) {
			long popularity = entry.getValue().popularity;
			if (eldestEntry == null || eldestEntry.getValue().popularity > popularity) {
				eldestEntry = entry;
			}
		}
		map.remove(eldestEntry.getKey(), eldestEntry.getValue());
	}

	public V get(Object key) {
		ValueWrapper<V> wrapper = map.get(key);
		return wrapper == null ? null : wrapper.getValue();
	}

	public final static class ValueWrapper<V> {
		static volatile long popularityCounter;
		volatile long popularity;
		V value;

		ValueWrapper(V value) {
			this.value = value;
			popularity = popularityCounter++;
		}

		public boolean equals(Object o) {
			if (!(o instanceof ValueWrapper)) {
				return false;
			}
			return (value == null ? ((ValueWrapper) o).value == null : value.equals(((ValueWrapper) o).value));
		}

		public int hashCode() {
			return value.hashCode();
		}

		public V getValue() {
			popularity = popularityCounter++;
			return value;
		}

	}

}

P.S.
Do we need to synchronize put() or checkRemove()? The only hypothetical problem is OutOfMemoryException. But it is just first draft, very simplified... we do not need to sort array.


Yonik Seeley added a comment - 29/Jul/08 01:40 AM
Fuad, you are on the right track now by using something that is thread-safe (ConcurrentHashMap). A couple of minor points:
  • I don't see the point of the static popularityCounter... that looks like a bug.
  • increments aren't atomic, but losing an increment once in a while should be fine in this scenario

Anyway, If this works for you, use it!
It's likely to be very specific to your use-case though (with millions of cache entries, millions of lookups per request, and no evictions). The eviction code looks like it would be relatively expensive.


Lars Kotthoff added a comment - 29/Jul/08 01:49 AM

Not everything in JAVA is extremely good: for instance, synchronization. Even for single-threaded application, it needs additionally 600 CPU cycles (I've read it somewhere for SUN Java 1.3 on Windows)

That's probably not true for modern JVMs though – cf. http://www.ibm.com/developerworks/library/j-jtp04223.html.

...9x times performance boost...

How did you measure that exactly? The Solr admin pages will not give you exact measurements. Could you describe the test setup in detail? I'm guessing that you're caching the results of all queries in memory such that no disk access is necessary. Are you using highlighting or anything else that might be CPU-intensive at all? From my personal experience with Solr I wouldn't expect synchronization for the caches to be that big of a performance penalty. In some of my tests with a several GB index where all results where cached and highlighting was turned on I've seen throughputs in excess of 400 searches per second. I think that the performance bottleneck in this case was the network interface for sending the replies.

absolutely no need to synchronize get() method for FIFO!

Consider the following case: thread A performs a synchronized put, thread B performs an unsynchronized get on the same key. B gets scheduled before A completes, the returned value will be undefined. Yes, we could do sanity checks to minimise these cases, but that would probably end up being more expensive than the synchronization.

From JavaDoc: "Note that this implementation is not synchronized. If multiple threads access a linked hash map concurrently, and at least

one of the threads modifies the map structurally, it must be synchronized externally."

That's exactly the case here – the update thread modifies the map structurally! It doesn't do this at all times, probably even never after the cache has been populated, but there's no way to know for sure unless you explicitely remove the put method.

I'm not convinced that we should change the current implementation for the following reasons:

  • Concurrency is traditionally a discipline which is very hard to get right. Furthermore the serious bugs tend to show up only when you really get race conditions and the like, i.e. when the machine is under heavy load and any disruption will hit you seriously.
  • You've already started to amend your implementation with sanity checks and the like – as I've said before, this might end up being more expensive than synchronization.
  • A FIFO cache might become a bottleneck itself – if the cache is very large and the most frequently accessed item is inserted just after the cache is created, all accesses will need to traverse all the other entries before getting that item.

That said, if you can show conclusively (e.g. with a profiler) that the synchronized access is indeed the bottleneck and incurs a heavy penalty on performance, then I'm all for investigating this further.


Fuad Efendi added a comment - 29/Jul/08 02:19 AM - edited

The Solr admin pages will not give you exact measurements.

Yes, and I do not need exact measurements! It gives me averageTimePerRequest which improved almost 10 times on production server. Should I right JUnit tests and execute it in a single-threaded environment? Better is to use The Grinder, but I don't have time and spare CPUs.

I've seen throughputs in excess of 400 searches per second.

But 'searches per second' is not the same as 'average response time'!!!

Are you using highlighting or anything else that might be CPU-intensive at all?

Yes, I am using highlighting. You can see it at http://www.tokenizer.org

I'm guessing that you're caching the results of all queries in memory such that no disk access is necessary.

But this is another bug of SOLR!!! I am using extremely large caches but SOLR still recalculates facet intersections. - SOLR-669

A FIFO cache might become a bottleneck itself - if the cache is very large and the most frequently accessed item is inserted just after the cache is created, all accesses will need to traverse all the other entries before getting that item.

  • sorry, I didn't understand... yes, if cache contains 100000 entries and 'most popular item' is removed... Why 'traverse all the other entries before getting that item'? why 99999 items are less popular (cumulative) than single one (absolute)?

You probably mean 'LinkedList traversal' but this is not the case. This is why we need to browse JavaSource... LinkedHashMap extends HashMap and there is no any 'traversal',

public V get(Object key) {
        Entry<K,V> e = (Entry<K,V>)getEntry(key);
        if (e == null)
            return null;
        e.recordAccess(this);
        return e.value;
    }

Consider the following case: thread A performs a synchronized put, thread B performs an unsynchronized get on the same key. B gets scheduled before A completes, the returned value will be undefined.

the returned value is well defined: it is either null or correct value.

That's exactly the case here - the update thread modifies the map structurally!

Who cares? We are not iterating the map!

That said, if you can show conclusively (e.g. with a profiler) that the synchronized access is indeed the bottleneck and incurs a heavy penalty on performance, then I'm all for investigating this further.

What?!!

Anyway, I believe simplified ConcurrentLRU backed by ConcurrentHashMap is easier to understand and troubleshoot...

I don't see the point of the static popularityCounter... that looks like a bug.

No, it is not a bug. it is virtually "checkpoint", like as a timer, one timer for all instances. We can use System.currentTimeMillis() instead, but static volatile long is faster.

About specific use case: yes... if someone has 0.5 seconds response time for faceted queries I am very happy... I had 15 seconds before going with FIFO.


Fuad Efendi added a comment - 29/Jul/08 02:25 AM

The eviction code looks like it would be relatively expensive

but get() method of LinkedHashMap reorders whole map!!! (Of course, CPU load is evenly distributed between several get() so that we can't see it) Other implementations even use Arrays.sort() or something similar. I don't see easier solution than that... probably some random-access policy with predictable range of "popularity", we can evict anything 'old' and not necessarily 'eldest'...


Yonik Seeley added a comment - 29/Jul/08 02:31 AM

but get() method of LinkedHashMap reorders whole map!!!

No it doesn't... think linked-list. It moves a single item, which is pretty fast.


Fuad Efendi added a comment - 29/Jul/08 02:31 AM - edited
Lars, I used FIFO because it is extremely simple to get unsynchronized get():
map = new LinkedHashMap(initialSize, 0.75f, true)  - LRU Cache
(and we need synchronized get())
map = new LinkedHashMap(initialSize, 0.75f, false) - FIFO
(and we do not need synchronized get())

Yonik, I'll try to improve ConcurrentLRU and to share findings... of course FIFO is not what we need.

No it doesn't... think linked-list. It moves a single item, which is pretty fast.

yes, so I wrote 'evenly distributed between several get() so we can't see it' - it keeps List ordered and we can't unsynchronize it with all subsequences!!!


Lars Kotthoff added a comment - 29/Jul/08 02:31 AM

the returned value is well defined: it is either null or correct value.

It is if nothing is modifying the map during the get. If something is modifying the map you don't know how the implementation handles the insert of a new value. It might copy the object, and you'd end up with half an object or even an invalid memory location. That's why the javadoc says that you must synchronize accesses if anything modifies the map – this is not limited to iterators.


Yonik Seeley added a comment - 29/Jul/08 02:43 AM

the returned value is well defined: it is either null or correct value.

No, it is not! Your analysis seems to ignore the java memory model (partially constructed objects and all that). I don't know how many different ways to say it.... please do yourself a favor and read up on the java memory model (and the book I previously referenced is great for this). This is hard stuff (at the lowest levels).


Fuad Efendi added a comment - 29/Jul/08 02:45 AM

It is if nothing is modifying the map during the get. If something is modifying the map you don't know how the implementation handles the insert of a new value. It might copy the object, and you'd end up with half an object or even an invalid memory location. That's why the javadoc says that you must synchronize accesses if anything modifies the map - this is not limited to iterators.

JavaDoc does not say that. Instead, it says (I am repeating):

..and at least one of the threads modifies the map structurally, it must be synchronized externally.

  • only thread doing structural modification must be synchronized. In case of LinkedHashMap, for instance, we need to synchronize inserts in order to avoid Entry instances referencing themselves (orphans).

you don't know how the implementation handles the insert of a new value

I know exactly: SOLR does not modify 'value' during 'insert', Map.Entry instances are immutable in SOLR, etc. Table resize is main problem - but after analyzing source code I don't see any problem. Consern that 'wrong value will be returned for a key' is not applicable. And JavaDocs does not say anything about that. Collections internally use Map.Entry in an immutable way, do not change it.


Fuad Efendi added a comment - 29/Jul/08 02:55 AM

No, it is not! Your analysis seems to ignore the java memory model (partially constructed objects and all that). I don't know how many different ways to say it.... please do yourself a favor and read up on the java memory model (and the book I previously referenced is great for this). This is hard stuff (at the lowest levels).

Ok. May be I can get reference to wrong object type, or even object scheduled for finalization... But we are not inserting into Map 'partially constructed objects', isn't it?

Simplest scenario: Thread A tries to get variable (4 bytes of address in JVM) pointing to object O. Another thread B concurrently assigns null to that variable. Isn't it solved at CPU level yet? Or, may be on 64bit system thread B assigns zero to first 2 bytes, and then to another 2 bytes?

I need to study this book... BTW, I am running JVM with '-server' option.


Lars Kotthoff added a comment - 29/Jul/08 02:56 AM
The way I understand the javadoc is

..and at least one of the threads modifies the map structurally, it must be synchronized externally.

The it references the map. It says explicitely that the map must be synchronized, i.e. all operations on it. It does not say that only the thread modifying it must be synchronized. Consider the case where only one thread modifies the map. You're saying that in this case synchronization would not be necessary, as only modifications themselves need to be synchronized and there is only one thread doing that. The javadoc explicitely says that it is necessary.


Fuad Efendi added a comment - 29/Jul/08 03:07 AM

The it references the map. It says explicitely that the map must be synchronized.

  • I agree, thanks for pointing to it. Synchronize!

BTW, Joshua Bloch developed Arrays.sort(), and bug was found after 9 years. Nothing is perfect.

ConcurrentLRU looks extremely simple and easy to improve. Should we check SUN's bug database before using ConcurrentHashMap? It has some related...


Craig McClanahan added a comment - 29/Jul/08 03:44 AM
Just a quick comment from the Peanut Gallery.

Instead of arguing about what the semantics of what the JavaDocs for java.util.Map and friends actually say, and worrying about whether particular implementations actually follow the rules but provide reasonable performance, this seems like a good opportunity to design and build an application specific data structure that has the behavioral characteristics you want (Fuad is after fastest-possible reads, everybody is after reasonable behavior in the face of concurrent writes). My only recommendation in this regard would be this: don't make a custom implementation say "implements java.util.LinkedHashMap" (again, or whatever implementation is currently in use) unless it does actually implement the documented semantics for java.util.LinkedHashMap.

It's fine to have a custom FifoCacheMap (or whatever name you like) class that does not implement java.util.Map. It's not fine to say you implement an interface but then break the documented contract.


Yonik Seeley added a comment - 29/Jul/08 03:58 AM

It's fine to have a custom FifoCacheMap (or whatever name you like) class that does not implement java.util.Map.

Right. Any solr cache must currently implement the SolrCache interface, and LinkedHashMap is simply an implementation detail of the LRUCache implementation of SolrCache (which has no relationship to the Map interface).


Fuad Efendi added a comment - 29/Jul/08 04:05 AM - edited

Fuad is after fastest-possible reads, everybody is after reasonable behavior in the face of concurrent writes

Thanks and sorry for runtime errors;

FIFO looks strange at first, but... for large cache (100000 items), most popular item can be mistakenly removed... but I don't think there are any 'most popular facets' etc.; it's evenly distributed in most cases.

Another issue: SOLR always tries recalculate facets even with extremely large filterCache & queryResultCache, even the same faceted query shows always the same long response times.

It is if nothing is modifying the map during the get. If something is modifying the map you don't know how the implementation handles the insert of a new value. It might copy the object, and you'd end up with half an object or even an invalid memory location. That's why the javadoc says that you must synchronize accesses if anything modifies the map - this is not limited to iterators.

I agree of course... However, we are not dealing with unknown implementation of java.util.Map clonig (java.lang.Cloneable) objects somehow or using some weird object introspection etc....


Fuad Efendi added a comment - 29/Jul/08 04:54 AM
BTW there is almost no any functional difference between LRU and FIFO. And there is huge difference between LRU (Least Recently Used) and LFU (Least Frequently Used).
It's easy to implement ConcurrentLFU based on provided ConcurrentLRU template; of course, following the main contract org.apache.solr.search.SolrCache.

Noble Paul added a comment - 29/Jul/08 04:59 AM
I Fuad. This is a worthwhile effort to undertake. Why don't you implement a LRUCache implementation and post it (as a java file) . Let everyone review and see if it can be improved

Fuad Efendi added a comment - 29/Jul/08 05:08 AM
Paul, I want to do it... understand me, since February I am having constant performance problems with faceted queries (15-20 seconds response time), I ordered new server with 32 Gb & 2x quad-core (8x times more power!) but it didn't improve performance; finally I commented sync in LRUCache and made it FIFO... I was very impatient with this post, just tried to share very real staff...

Noble Paul added a comment - 29/Jul/08 06:26 AM
A FIFOCache using ConcurrentHashMap.
Please give feedback

Noble Paul added a comment - 29/Jul/08 09:34 AM
Another one this is LRU. uses ConcurrentHashMap again
  • Gets are free
  • Puts are expensive (very expensive) after the high watermark . Till then it is free
  • To lighten the load on put an extra thread is employed to do a concurrent mark and sweep
  • there is a high-water-mark and a low-water-mark . The extra thread cleans anything if low-water-mark is crossed. Put must removes if level crosses high-water-mark

Noble Paul added a comment - 29/Jul/08 02:42 PM
bug fix

Fuad Efendi added a comment - 30/Jul/08 04:32 AM
Thanks Paul, I am always trying to simplify... AtomicReference is a pointer to (approximately) least recently used 'key'...

Fuad Efendi added a comment - 30/Jul/08 04:40 AM
bug fix

Fuad Efendi added a comment - 30/Jul/08 04:50 AM
another bug... and AtomicReference is generic... never used it before. Could be even 'weaker' if we use 'hashcode' which is long (and atomic) instead of Key which is object (and unsafe), and distribution of hashcode is ok...

Noble Paul added a comment - 30/Jul/08 05:04 AM - edited
There are a few obvious issues w/ your patch.
Effendi your implementation is wrong
if (wrapper.lastAccessed < pointer.lastAccessed) {
			ref.set(new Pointer(key, wrapper.lastAccessed));
}

will always evaluate to false. And the reference will always have one value
We must be removing the entry which was accessed first (not last).. To get that you must maintian a linkedlist the way linkedhashmap maintains. No other shortcut. Keeping one reference will not work.

In that implementation there is always a contention on the head so gets are bound to be slow. That is why i did not go that route
And the static volatile counter is not threadsafe.
static in general is not a good idea

There is no need to use a WeakReference anywhere. This is a cache. So it must maintain strong reference


Noble Paul added a comment - 30/Jul/08 06:21 AM
I have opened SOLR-667 for LRU implementations

Fuad Efendi added a comment - 30/Jul/08 02:01 PM - edited
Noble, thanks for feedback!

Of course my code is buggy but I only wanted to illustrate simplest idea; I am extremely busy with other staff (Liferay) and can't focus on SOLR improvements... may be during weekend.

...will always evaluate to false. And the reference will always have one value

  • yes, this is bug. There are other bugs too...

We must be removing the entry which was accessed first (not last)..

I mean (and code too) the same; probably wrong wording

And the static volatile counter is not threadsafe.

Do we really-really need thread safety here? By using 'volatile' I only prevent some JVMs from trying to optimize some code (and cause problems).

There is no need to use a WeakReference anywhere

Agree...

To get that you must maintian a linkedlist the way linkedhashmap maintains. No other shortcut.

May be... but looks similar to Arrays.sort(), or TreeSet, and etc.... I am trying to avoid this. 'No other shortcut' - may be, but I am unsure.

Thanks!


Fuad Efendi added a comment - 31/Jul/08 05:34 PM
I don't think ConcurrentHashMap will improve performance, and ConcurrentMap is not what SOLR needs:
V putIfAbsent(K key, V value);
    V replace(K key, V value);
    boolean replace(K key, V oldValue, V newValue);

There is also some(...) overhead with oldValue and the state of the hash table at some point; additional memory requirements; etc... can we design something plain-simpler being focused on SOLR specific requirements? Without all functionality of Map etc...


Fuad Efendi added a comment - 01/Aug/08 04:12 PM
Guys at LingPipe (Natural Language Processing) http://alias-i.com/ are using excellent Map implementations with optimistic concurrency strategy:
http://alias-i.com/lingpipe/docs/api/com/aliasi/util/FastCache.html
http://alias-i.com/lingpipe/docs/api/com/aliasi/util/HardFastCache.html

Yonik Seeley added a comment - 21/Feb/09 03:31 PM
closing as duplicate since we have a concurrent cache impl now.