The test originally did this:
read all rows from table B
sleep 1/2 sec
insert a row into table A
read all rows from table A
sleep 1/2 sec
insert a row into table B
The patch changed it so that each thread, after first sleeping for half a second, would wait until the lock table contained at least two locks.
Although this reduced the chance of one thread executing the INSERT statement before the other thread had completed the SELECT statement, it didn't completely plug the hole. I think the problem is that it only checks that the lock table contains two locks, and not that the two locks in fact are row locks on table A and table B. The locks could for example be locks on system tables held while one of the SELECT statements is compiled, in which case the other thread would mistakenly conclude that the other thread was done executing the SELECT statement, and it might go on and execute the INSERT statement too early to get a deadlock.
The attached patch barrier.diff changes the wait logic so that it uses the Barrier class to make sure the other thread has executed far enough that it's safe to go on. It no longer checks the contents of the lock table. Instead, each thread will wait after executing the SELECT statement until the other thread has signaled that it too has completed the SELECT statement.
I repeated Kathey's experiment with a long sleep at the beginning of t2.run(), and the test still passes with the new approach.