Looking more into this, I think HBASE-4487 is the real issue. I think I can also prove that you can get the issue even with a disabled LogSyncer.
t1 does appendNoSync of k1
t1 does syncer up to getPendingWrites
t2 does appendNoSync of k2
t2 does syncer up to the end
In the log you'd see k2 then k1, so what's really wrong to me is this:
List<Entry> pending = logSyncerThread.getPendingWrites();
Although taking the pending writes is synchronized, each thread can then apply the batch it took in whatever order relative to the other threads.
Furthermore, logSyncerThread.hlogFlush can also append entries to the WAL in any order. For example, if both t1 and t2 have multiple edits they could end up intermingled in the WAL simply by doing hlogFlush at the same time.
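To make the t1/t2 interleaving above concrete, here is a minimal single-threaded replay of it. This is only a sketch: PendingBuffer and its methods are hypothetical stand-ins, not the real HLog/LogSyncer code, and the assumption is that getPendingWrites swaps the buffer out under a lock while the flush itself happens outside that lock.

```java
import java.util.ArrayList;
import java.util.List;

class WalOrderingSketch {
    // Hypothetical stand-in for the syncer's pending-writes buffer.
    static class PendingBuffer {
        private List<String> pending = new ArrayList<>();

        synchronized void append(String key) {
            pending.add(key);
        }

        // Assumption: the buffer is handed out under the lock,
        // but whoever took it writes it to the WAL later, unlocked.
        synchronized List<String> getPendingWrites() {
            List<String> taken = pending;
            pending = new ArrayList<>();
            return taken;
        }
    }

    // Replays the steps from the comment in a fixed order.
    static List<String> replay() {
        PendingBuffer buffer = new PendingBuffer();
        List<String> wal = new ArrayList<>();

        buffer.append("k1");                              // t1: appendNoSync of k1
        List<String> t1Batch = buffer.getPendingWrites(); // t1: syncer up to getPendingWrites
        buffer.append("k2");                              // t2: appendNoSync of k2
        List<String> t2Batch = buffer.getPendingWrites(); // t2: syncer starts
        wal.addAll(t2Batch);                              // t2: syncer runs to the end
        wal.addAll(t1Batch);                              // t1: resumes and writes its batch
        return wal;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [k2, k1]
    }
}
```

Each step on its own is properly locked, yet the WAL ends up with k2 before k1 because nothing orders the two writes after the batches are handed out.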
If LogSyncer were really the issue, then HRegion.put and HRegion.delete would need to be disabled too, since they don't use appendNoSync and just sync everything.
How this used to work is that threads could only append to the WAL under the updateLock, and that was done in the same step as the doWrite which creates the key. The call to sync could then be made by any number of threads at the same time.
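As a rough sketch of that older scheme (again with hypothetical names, not the actual HLog code): if the key is created and the entry is appended to the WAL inside one critical section, the WAL order always matches the key order, no matter how many threads write.

```java
import java.util.ArrayList;
import java.util.List;

class LockedAppendSketch {
    // Named after the lock mentioned above; the real field may differ.
    private final Object updateLock = new Object();
    private final List<Long> wal = new ArrayList<>();
    private long nextSeq = 0;

    void append() {
        synchronized (updateLock) {
            long seq = nextSeq++; // stands in for doWrite creating the key
            wal.add(seq);         // appended before the lock is released
        }
    }

    // Runs several writer threads and returns the resulting WAL contents.
    static List<Long> run(int threads, int editsPerThread) {
        LockedAppendSketch log = new LockedAppendSketch();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < editsPerThread; j++) {
                    log.append();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // join makes the workers' writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return log.wal;
    }

    public static void main(String[] args) {
        List<Long> wal = run(4, 1000);
        System.out.println("entries: " + wal.size());
    }
}
```

The trade-off is that every append serializes on one lock, which is presumably the cost the deferred flush was meant to reduce.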
If this is right, then we should pull back HBASE-4487 or add more locks.
We should also change this Jira's title once we get a better understanding of the problem because it's not a region assignment problem.