|
Attaching the patch used to partition the hash tables. It is only
attached for reference, not for inclusion. I have not updated or added comments, and I have not tried very hard to make the names of variables and methods meaningful. If someone wants to implement a new lock manager based on the ideas in this patch, please feel free to do so. FWIW, derbyall ran cleanly with the patch. While I don't know exactly what indications you've seen
of the lock contention in the lock manager, we've seen the exact same thing in our investigations in our "SMP scalability in Derby" project. (We == me and Per Ottar Ribe Pahr). Anyway, we used DTrace with a JDK1.6 beta release to measure monitor contention in a simple embedded derby scenario consisting of 8 threads doing simple "SELECT A FROM FOO WHERE A=?; COMMIT;" loops. FOO is a table consisting of about 100 000 rows of A, an integer and the primary key, and a CHAR(96) FOR BIT DATA filler. The dtrace script measured three things: The classes of objects whose monitors were most contented - in terms of number of contentions and the wait time threads spent waiting for their monitors. In addition, it took a snapshot of the call stack at the point of contention to figure out which method the contention occured in. Here's our measurements, we hope they're useful, if only to confirm that you're onto something. =) PS: You want a fixed-point font to read this. Point your web browser at http://folk.ntnu.no/andersmo/monitor_contention.txt to get a plain old text file. =) TOP BLOCKING OBJECTS: Monitor class Contention count org/apache/derby/impl/services/locks/ActiveLock 2 java/lang/ref/ReferenceQueue$Lock 2 org/apache/derby/impl/services/cache/CachedItem 2 org/apache/derby/impl/services/reflect/ReflectLoaderJava2 3 java/io/PrintStream 5 org/apache/derby/impl/store/raw/data/StoredPage 10 sun/misc/Launcher$AppClassLoader 10 org/apache/derby/impl/sql/GenericPreparedStatement 10 org/apache/derby/impl/store/raw/xact/XactFactory 11 org/apache/derby/impl/store/raw/xact/TransactionTable 29 java/util/Hashtable 34 org/apache/derby/impl/store/raw/data/AllocationCache 36 org/apache/derby/impl/services/cache/Clock 257 org/apache/derby/impl/services/locks/SinglePool 577 org/apache/derby/impl/services/locks/LockSet 7851 TOP BLOCKING OBJECTS BY WAIT TIME: Monitor class Wait time (ms) java/lang/ref/ReferenceQueue$Lock 2 org/apache/derby/impl/services/locks/ActiveLock 2 org/apache/derby/impl/services/cache/CachedItem 6 org/apache/derby/impl/store/raw/data/StoredPage 11 org/apache/derby/impl/services/reflect/ReflectLoaderJava2 13 org/apache/derby/impl/store/raw/data/AllocationCache 110 org/apache/derby/impl/sql/GenericPreparedStatement 120 org/apache/derby/impl/store/raw/xact/TransactionTable 129 java/util/Hashtable 198 org/apache/derby/impl/store/raw/xact/XactFactory 350 sun/misc/Launcher$AppClassLoader 660 java/io/PrintStream 709 org/apache/derby/impl/services/cache/Clock 1380 org/apache/derby/impl/services/locks/SinglePool 3137 org/apache/derby/impl/services/locks/LockSet 16194 TOP BLOCKING METHOD SIGNATURES: org/apache/derby/impl/services/locks/LockSet.lockObject(Ljava/lang/Object;Lorg/apache/derby/iapi/services/locks/Lockable;Ljava/l ang/Object;ILorg/apache/derby/iapi/services/locks/Latch;)Lorg/apache/derby/impl/services/locks/Lock;* 55 org/apache/derby/impl/services/cache/Clock.find(Ljava/lang/Object;)Lorg/apache/derby/iapi/services/cache/Cacheable;* 56 org/apache/derby/impl/services/locks/LockSet.lockObject(Ljava/lang/Object;Lorg/apache/derby/iapi/services/locks/Lockable;Ljava/l ang/Object;ILorg/apache/derby/iapi/services/locks/Latch;)Lorg/apache/derby/impl/services/locks/Lock;* 56 org/apache/derby/impl/services/cache/Clock.release(Lorg/apache/derby/iapi/services/cache/Cacheable;)V* 65 org/apache/derby/impl/services/locks/SinglePool.lockAnObject(Ljava/lang/Object;Ljava/lang/Object;Lorg/apache/derby/iapi/services /locks/Lockable;Ljava/lang/Object;ILorg/apache/derby/iapi/services/locks/Latch;)Lorg/apache/derby/impl/services/locks/Lock;* 79 java/util/Hashtable.get(Ljava/lang/Object;)Ljava/lang/Object;* 84 org/apache/derby/impl/services/locks/SinglePool.unlock(Ljava/lang/Object;Ljava/lang/Object;Lorg/apache/derby/iapi/services/locks /Lockable;Ljava/lang/Object;)I 85 org/apache/derby/impl/services/locks/SinglePool.lockObject(Ljava/lang/Object;Ljava/lang/Object;Lorg/apache/derby/iapi/services/l ocks/Lockable;Ljava/lang/Object;I)Z 103 org/apache/derby/impl/services/locks/LockSpace.unlockGroup(Lorg/apache/derby/impl/services/locks/LockSet;Ljava/lang/Object;)V 110 org/apache/derby/impl/services/locks/LockSet.unlock(Lorg/apache/derby/iapi/services/locks/Latch;I)V* 153 java/util/Hashtable.get(Ljava/lang/Object;)Ljava/lang/Object;* 221 org/apache/derby/impl/services/locks/SinglePool.lockAnObject(Ljava/lang/Object;Ljava/lang/Object;Lorg/apache/derby/iapi/services /locks/Lockable;Ljava/lang/Object;ILorg/apache/derby/iapi/services/locks/Latch;)Lorg/apache/derby/impl/services/locks/Lock; 260 org/apache/derby/impl/services/locks/SinglePool.unlatch(Lorg/apache/derby/iapi/services/locks/Latch;)V 264 org/apache/derby/impl/services/locks/SinglePool.unlock(Ljava/lang/Object;Ljava/lang/Object;Lorg/apache/derby/iapi/services/locks /Lockable;Ljava/lang/Object;)I 330 org/apache/derby/impl/services/locks/SinglePool.latchObject(Ljava/lang/Object;Lorg/apache/derby/iapi/services/locks/Lockable;Lja va/lang/Object;I)Z 459 org/apache/derby/impl/services/locks/LockSet.lockObject(Ljava/lang/Object;Lorg/apache/derby/iapi/services/locks/Lockable;Ljava/l ang/Object;ILorg/apache/derby/iapi/services/locks/Latch;)Lorg/apache/derby/impl/services/locks/Lock;* 559 org/apache/derby/impl/services/locks/LockSet.unlock(Lorg/apache/derby/iapi/services/locks/Latch;I)V* 1714 org/apache/derby/impl/services/locks/LockSet.lockObject(Ljava/lang/Object;Lorg/apache/derby/iapi/services/locks/Lockable;Ljava/l ang/Object;ILorg/apache/derby/iapi/services/locks/Latch;)Lorg/apache/derby/impl/services/locks/Lock;* 2840 Oh, forgot to mention that the tests were run on a SMP machine - an old Sun Enterprise 450 with 4x400MHz CPUs (and a bit error on bit 12 on memory module 1701 - yay ECC. ;-). The Derby release used was 10.1.3.1 - (417277), the official release from the web page. If time permits we'll see about testing Knut Anders' patch with DTrace as well.
I am planning to partition the hash table in LockSet along the lines
of the split-hashtables.diff patch. For the other hash table on which there is monitor contention (the one in SinglePool), I think it is better to remove the hash table entirely. That hash table maps a compatibility space to a LockSpace. Since there is a one-to-one mapping between compatibility spaces and LockSpaces, we could change it so that each compatibility space is a LockSpace object and remove the need for the hash table. This would (a) remove the contention problem, and (b) save some CPU cycles because we don't have to go through the hash table. If this sounds OK, I will create two sub-tasks: one for partitioning the table in LockSet, and one for removing the table in SinglePool. I haven't had time to look at the patch but the original intention was for there to be a multi-thread lock manager
using multiple LockSets, e.g. a MultiPool implementation instead of a SinglePool implementation. Just FYI. With the multiple synchronization how is deadlock detection to be handled? Thanks. Do you think it would be better to leave SinglePool as it is and create a new (perhaps optional) MultiPool implementation instead? I haven't looked very closely at the deadlock detection code yet, but I hope it won't be necessary with too many changes. One possibility is to obtain the synchronization locks on all partitions before the waiters graph is built. Of course, some precautions are needed in order to avoid java-level deadlocks when are multiple synchronization locks possibly obtained in different order. -- Knut Anders In the patch I used when running the tests, monitor contention on LockSet was reduced by having multiple hash tables in each LockSet. I think it might be cleaner to have multiple LockSets, where each LockSet contains one hash table. Will look more into it and report back. In the meantime, I plan to submit some cleanup patches that hopefully will make the final patch smaller and easier to understand regardless of which approach is chosen.
Attached cleanup1.diff.
I observed that LockSet extends Hashtable, but all calls to Hashtable's methods are already synchronized on the LockSet, so it could just as well use a HashMap. The attached patch makes LockSet contain a HashMap instead of extending Hashtable. I ran some tests with the Derbyall and the JUnit tests passed on Solaris 10/Java SE 6. Reviews would be appreciated. cleanup1.diff looks good, but I have a couple of questions:
LockSet has was declared "public final", but is now just "final". Is it intentional? In general, I have a hard time understanding the multithreading-related comments used in Derby (not specific to this patch). For example, the comment for LockSet says: "MT - Mutable - Container Object : Thread Safe". I really don't understand what that means. Is it still true when LockSet no longer inherits from a synchronized container? I think what is meant here is that only those methods declared "synchronized" are MT-safe. A client wanting to use any of the other methods must synchronize on the LockSet object. Correct? The method addWaiters(Dictionary) is not synchronized, and the comment says "MT - must be synchronized on this LockSet object". But in Deadock.getWaiters(LockSet) it is called without synchronization. Presumably this is ok since the old code iterated using an enumeration over an unlocked LockSet, (unless the lock is higher up the stack, and cannot be seen in the diff). But a comment would have been nice :) Thank you for looking at the patch! I'll try to answer your questions below.
1) The change of LockSet from "public final" to "final" was intentional. LockSet is an internal implementation class that is not supposed to be accessed from other packages. 2) I'm not sure what the exact meaning of the MT comment is. I would assume that it meant something like "calls to public (or non-private) methods of this class can be regarded as atomic operations". I'm not sure that this statement is completely true before the patch, but it is definitely not true after the patch, so the comment should be updated. 3) getWaiters() is a private helper method for Deadlock.look() which is only invoked from inside a synchronized block in LockSet, so getWaiters() is in fact always synchronized on the LockSet. This would probably have been clearer if Deadlock.look() had one of those infamous MT comments. :) I'll add one. I have attached v2 of cleanup1. I went through LockSet's methods to determine which of them were not thread safe without extra synchronization and updated their javadoc comments. Changes from the first version:
* MT comment for LockSet class changed to "MT - Mutable - Container Object : All non-private methods of this class are thread safe unless otherwise stated by their javadoc comments." * Added javadoc comments to LockSet.oneMoreWaiter() and LockSet.oneLessWaiter() with information about required synchronization. * Removed what seems to be a false comment in LockSet.anyoneBlocked() ("no synchronization needed because reads of ints are atomic"). It is true that reads of ints are atomic, but synchronization is still needed to avoid that an old cached value is read instead of the real value. * Declared LockSet.anyoneBlocked() as synchronized to avoid the problem mentioned above. Currently, I don't think it is a problem since all calls to anyoneBlocked() come (indirectly) from the unit tests T_AccessFactory and T_LockFactory, but it would be good to fix it anyway. * Added javadoc and MT comment to Deadlock.look(). I have started a new run of derbyall. New version looks good. +1
Thanks for the review! Committed cleanup1.v2 with revision 498999.
Committed cleanup2.diff with revision 499316. The patch removes an unused variable, an unused import and an unused method.
LockSpace extends Hashtable, but no other classes use its Hashtable methods (except one call to isEmpty()), so the Hashtable could just as well be a private member. Since all the non-private methods of LockSpace are synchronized, the Hashtable could be replaced with an unsynchronized HashMap. cleanup3.diff makes these changes. Derbyall and JUnit passed.
The results shown in the attached graphs are not valid for trunk after the commit of the
Since there are two global synchronization points in the lock manager, I'll file subtasks so they can be discussed separately. In slightly related work, we've recently done a little testing with a single-record select load on a trunk patched with
With a separate table and index for each thread (to remove latch contention and lock waits from the equation) we got an overall throughput increase of about 16% compared to a "single table for all threads" run (given a cache large enough to maintain the database in-memory), and found that org.apache.derby.impl.services.cache.Clock.find()/release() caused about 5 times more contention than the synchronization in LockSet.lockObject() and LockSet.unlock(). That might be an indicator of where to apply the next push - and validates the "split hashtable" approach for this workload. =) For our part, we're proceeding to hack at latching. =) Committed cleanup3.diff with revision 507428.
Attaching a tiny cleanup patch (cleanup4). LockControl.popFrontWaiter() is identical to removeWaiter(), only that it calls remove() with the constant 0 instead of using a parameter. The patch makes popFrontWaiter() forward calls to removeWaiter(). The tests passed.
Committed cleanup4.diff with 517588.
cleanup5.diff contains these changes:
ActiveLock.java: removed some unused imports Lock.java: made the class package private LockControl.java: - made the class final and package private - addLock() had an if which tested (!grantLock && ...) at a place where grantLock always would be false. Removed the !grantLock part of the condition. - addWaiter(), removeWaiter() and popFrontWaiter() took a List argument called waiting. Since the argument always was identical to a private member with the same name, I removed "List waiting" from the parameter list. Derbyall and the JUnit tests ran successfully. Attaching a new small cleanup patch (cleanup6) for LockSet.java. LockSet has a private Hashtable (lockTraces) into which it puts a Throwable each time a thread must wait for a lock (only if deadlock tracing is enabled). The values in the hash table are however never used, so the hash table and the code that updates it could be removed. Derbyall and the JUnit tests ran cleanly with the patch.
Committed cleanup5 with revision 518052.
Committed cleanup6 with revision 518073. All sub-tasks completed. Closing the issue.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
synchronization points were eliminated, I split the hash tables in
SinglePool and LockSet into 16 partitions (that is, 16 hash tables),
and used the hash key to decide which partition an object should be
placed in. There was no global synchronization, only synchronization
on the partition.
I have attached graphs for some performance tests with single-record
selects (the entire database was in the page cache). The graphs show
the results on a machine with a single CPU (1cpu.png), one with two
CPUs (2cpu.png) and one with eight CPUs (8cpu.png). They indicate that
there is no significant effect (neither positive nor negative) on the
single-CPU machine, but with multiple CPUs (and multiple threads) the
performance increases when the global synchronization points have been
eliminated.
(Don't pay too much attention to the actual TPS numbers, since the
three machines didn't have identical processors.)