I'm having a hard time recreating the jagged counts. I tried reverting patches, and tested both before and after the patch nkeywal provided. I think the flush problem was a red herring; I was biased by the customer problem I was recently working on.
When I changed my tests to do 100000 increments, the pattern really jumped out. Looking at the original numbers from this morning, I see the same pattern with the 250000 increments.
80 threads, 250000 increments == 3125 increments / thread.
count = 246875 != 250000 (flush) // one thread failed to start.
count = 243750 != 250000 (kill) // two threads failed to start.
count = 246878 != 250000 (kill -9) // one thread failed to start, and 3 threads sent increments that succeeded but didn't get an ack because of the kill -9, so they retried and added 3 extra.
The last one threw me off because it wasn't regular, but I think the explanation I have makes sense.
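The arithmetic behind the three counts above can be checked with a small sketch. The class and method names here are hypothetical and are not part of the test code; the model simply assumes each missing thread costs one full per-thread share and each retried-after-success increment adds one.

```java
// Hypothetical helper for checking the counts above; not part of the actual test code.
class IncrementCountCheck {

    // Expected final count when some threads never start and some
    // increments are applied twice (succeeded, ack lost, then retried).
    static long expectedCount(long totalIncrements, int threads,
                              int threadsNotStarted, int duplicatedIncrements) {
        long perThread = totalIncrements / threads; // 250000 / 80 = 3125
        return totalIncrements - (long) threadsNotStarted * perThread + duplicatedIncrements;
    }

    public static void main(String[] args) {
        System.out.println(expectedCount(250000, 80, 1, 0)); // flush:   246875
        System.out.println(expectedCount(250000, 80, 2, 0)); // kill:    243750
        System.out.println(expectedCount(250000, 80, 1, 3)); // kill -9: 246878
    }
}
```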
I'm looking into whether my test code is bad (is there TableName documentation I ignored that says the race in the stacktrace is my fault?) or whether we need to add some synchronization to this createTableNameIfNecessary method.
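For illustration only, here is one generic way a create-if-necessary lookup can be made safe against concurrent callers. This is a sketch under the assumption that the race is a check-then-create on a shared cache, which is not confirmed here. `CachedName` is a hypothetical stand-in class and does not reflect the real TableName implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a cached name object; NOT the real TableName code.
class CachedName {
    private static final ConcurrentMap<String, CachedName> CACHE = new ConcurrentHashMap<>();

    private final String name;

    private CachedName(String name) {
        this.name = name;
    }

    // computeIfAbsent performs the lookup and the creation atomically per key,
    // so two threads asking for the same name always get the same instance.
    static CachedName valueOf(String name) {
        return CACHE.computeIfAbsent(name, CachedName::new);
    }

    String getName() {
        return name;
    }
}
```

Whether something like this is needed depends on whether concurrent first-time lookups are supposed to be supported by the method's contract.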