Issue Details (XML | Word | Printable)

Key: DIRSERVER-210
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Emmanuel Lecharny
Reporter: Jacob S. Barrett
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Directory ApacheDS

CachingNormalizer exhibits concurrency flaw under load.

Created: 12/Nov/05 07:55 AM   Updated: 21/Apr/07 11:19 AM
Return to search
Component/s: ldap
Affects Version/s: pre-1.0
Fix Version/s: pre-1.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works CachingNormalizer-Concurrency.patch 2005-11-12 07:55 AM Jacob S. Barrett 1 kB

Resolution Date: 22/Nov/05 01:17 AM


 Description  « Hide
CachingNormalizer doesn't have it's cache protected from concurrent modifications. Under load a state is reached where cache.containsKey(key) is true but the result of cache.get(key) results in an unexpected null value and thus a NullPointerException. This is really only reached when you have more than threads than the max cache size and therefore items in the cache are being purged. The attached path synchronizes the cache. It would be better to use a R/W lock from JSR 166 backport or something similar.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Stephane Bailliez added a comment - 12/Nov/05 09:02 AM
I might be missing something but there's no need to me to do a containsKey() call.
AFAIK you'll never get a null normalized value stored in the cache plus the containsKey() is pretty inefficient as it is iterating over all keys to find if it exists so you'll want to avoid that call.

The Normalizer interface should certainly be documented to make it clear that it does not accept null values to be normalized and that it should not return null.

Emmanuel Lecharny added a comment - 21/Nov/05 07:43 AM
It seems that there is a double flaw in this piece of code :
1) The cache.get( value ) does a cache.containsKey itself. The code should be :

        Object result = cache.get( value ) ;
        
        if ( result != null )
        {
            return result ;
        }
       ...
2) The cache is an instance of LRUMap, which is not synchronized, and which relies on a not synchronized HashMap. This LRUMap must be synchronized using Collection.synchronizedMap(...)

The containsKey method is not iterating over all keys, because the LRUMap is backed by a HashMap.

The Normalizer.normalize() will behave correctly when receiving a null value.



Trustin Lee added a comment - 21/Nov/05 07:39 PM
So do we need to fork Doug Lea's concurrent API, or just depend on it?

Jacob S. Barrett added a comment - 22/Nov/05 01:11 AM
Two things:
1) You can only remove the call to containsKey() if Normalizer.normalize() will never result in a null value. If it can return a null value then you must synchronize the around both calls to get() and containsKey().

2) As I think about it more and look at the implementation of LRUMap the use of a R/W is a bit more complicated. Even on a get() call the internals of LRUMap are modified. To do this with a R/W you will have to internally get a W lock on the LRU list on every get(). Since every read results in a write lock synchronize is sufficient and less complicated for this block.

Emmanuel Lecharny added a comment - 22/Nov/05 01:17 AM
- Some tests have been added to avoid NPE
- the LRUMap is now synchronized
- the cacheNormalizer is now threadsafe.

Emmanuel Lecharny added a comment - 22/Nov/05 01:47 AM
The Normalizers won't return null. If the value is null, the normalizer will not be called, and if the value is empty, the DeepTrimNormalizer (and toLower) have been modified to handle this specific situation, and now return an empty string (before, they returned null)

on the 2nd point, Jacob, I think that the synchronization I added in the LRUMap implementation should protect the server from concurrent access to the lmist. It could be cool to double-check the modification though ! (http://svn.apache.org/viewcvs?rev=345925&view=rev)

Jacob S. Barrett added a comment - 22/Nov/05 02:27 AM
I think I would have left LRUMap alone and just synchronized when it is in use. There might be other places where LRUMap will be used and has no need for synchronization.

With the synchronization you added inside of LRU map there is still a chance for concurrent modification that might result in some "strangeness". Not all modifications are synchronized, see the put() operation where the LRU is not full, the call to super is not synchronized. Also since none of the inherited methods are synchronized it is possible to have one thread modifying the structure while another is reading it. Something like containsKey() could barf when looking at the underlying HashMap if it changes.


Emmanuel Lecharny added a comment - 22/Nov/05 04:22 AM
Well, the LRUMap has been taken from commons.collections. This is a non-synchronized class, and I don't know what were the reasons for not using the native version, so I felt quite not guilty modifying it.

We have two places in the code where we use this class :
- org.apache.ldap.server.partition.impl.btree.jdbm.JdbmIndex
and
- org.apache.ldap.common.schema.CachingNormalizer

I bet that those two class will experience race condition when the server will be overloaded, so this is urgent to fix this issue.

I think that we have three possibilities :
1) transform this LRUMap to a synchronized version of the class (subclassing the LRUMap class);
2) synchronize the access to this class in the two previously names classes (JdbmIndex and CachingNormalizer)
3) rename the class to SynchronizedLRUMap, and correctly implement synchronization.

I personnaly think that the first solution is the best, for many reasons, but mainly because I don't see why we should have our own version of LRUMap when you already have 2 in commons-collection (yes, two : org.apache.commons.collections.LRUMap and org.apache.commons.collections.map.LRUMap :| ), so surcharging one of the two existing seems simple to me. Second, a cache is something that will often used in a multi-threaded environment, so having to check that it is correctly synchronized once is better than to do the work twice.

Of course, this is just MHO.

I have modified the LRUMap in order to fix the put() method (I forgot to synchronized one part of the code), and also changed the protected method to private and add a 'final' keyword to the class.

Those modifications are just done in order to have a working server, but I think we must agree on the best solution before freezing the code. Any idea is welcome !

Emmanuel Lecharny added a comment - 21/Apr/07 11:19 AM
Closing all issues created in 2005 and before which are marked resolved