|
It seems that there is a double flaw in this piece of code :
1) The cache.get( value ) does a cache.containsKey itself. The code should be : Object result = cache.get( value ) ; if ( result != null ) { return result ; } ... 2) The cache is an instance of LRUMap, which is not synchronized, and which relies on a not synchronized HashMap. This LRUMap must be synchronized using Collection.synchronizedMap(...) The containsKey method is not iterating over all keys, because the LRUMap is backed by a HashMap. The Normalizer.normalize() will behave correctly when receiving a null value. So do we need to fork Doug Lea's concurrent API, or just depend on it?
Two things:
1) You can only remove the call to containsKey() if Normalizer.normalize() will never result in a null value. If it can return a null value then you must synchronize the around both calls to get() and containsKey(). 2) As I think about it more and look at the implementation of LRUMap the use of a R/W is a bit more complicated. Even on a get() call the internals of LRUMap are modified. To do this with a R/W you will have to internally get a W lock on the LRU list on every get(). Since every read results in a write lock synchronize is sufficient and less complicated for this block. - Some tests have been added to avoid NPE
- the LRUMap is now synchronized - the cacheNormalizer is now threadsafe. The Normalizers won't return null. If the value is null, the normalizer will not be called, and if the value is empty, the DeepTrimNormalizer (and toLower) have been modified to handle this specific situation, and now return an empty string (before, they returned null)
on the 2nd point, Jacob, I think that the synchronization I added in the LRUMap implementation should protect the server from concurrent access to the lmist. It could be cool to double-check the modification though ! (http://svn.apache.org/viewcvs?rev=345925&view=rev) I think I would have left LRUMap alone and just synchronized when it is in use. There might be other places where LRUMap will be used and has no need for synchronization.
With the synchronization you added inside of LRU map there is still a chance for concurrent modification that might result in some "strangeness". Not all modifications are synchronized, see the put() operation where the LRU is not full, the call to super is not synchronized. Also since none of the inherited methods are synchronized it is possible to have one thread modifying the structure while another is reading it. Something like containsKey() could barf when looking at the underlying HashMap if it changes. Well, the LRUMap has been taken from commons.collections. This is a non-synchronized class, and I don't know what were the reasons for not using the native version, so I felt quite not guilty modifying it.
We have two places in the code where we use this class : - org.apache.ldap.server.partition.impl.btree.jdbm.JdbmIndex and - org.apache.ldap.common.schema.CachingNormalizer I bet that those two class will experience race condition when the server will be overloaded, so this is urgent to fix this issue. I think that we have three possibilities : 1) transform this LRUMap to a synchronized version of the class (subclassing the LRUMap class); 2) synchronize the access to this class in the two previously names classes (JdbmIndex and CachingNormalizer) 3) rename the class to SynchronizedLRUMap, and correctly implement synchronization. I personnaly think that the first solution is the best, for many reasons, but mainly because I don't see why we should have our own version of LRUMap when you already have 2 in commons-collection (yes, two : org.apache.commons.collections.LRUMap and org.apache.commons.collections.map.LRUMap :| ), so surcharging one of the two existing seems simple to me. Second, a cache is something that will often used in a multi-threaded environment, so having to check that it is correctly synchronized once is better than to do the work twice. Of course, this is just MHO. I have modified the LRUMap in order to fix the put() method (I forgot to synchronized one part of the code), and also changed the protected method to private and add a 'final' keyword to the class. Those modifications are just done in order to have a working server, but I think we must agree on the best solution before freezing the code. Any idea is welcome ! Closing all issues created in 2005 and before which are marked resolved
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
AFAIK you'll never get a null normalized value stored in the cache plus the containsKey() is pretty inefficient as it is iterating over all keys to find if it exists so you'll want to avoid that call.
The Normalizer interface should certainly be documented to make it clear that it does not accept null values to be normalized and that it should not return null.