Konstantin> No I don't think logSync() method should be modified at least not the way you did in your patch
HADOOP-5189. Synchronization (locking) in the logSync() method prevents different handler threads to write the same information (edits) multiple times into the same edits file. Suppose you have 1 file stream (one edits file) and 2 threads call logSync() at the same time. If you remove locking as you did in your patch then both threads may write the same edits transactions into the same file twice. This will simply corrupt the edits file.
Konstantin> logSync() may be modified to write in parallel into different streams. If you have 2 file streams then it makes sense to write into 2 files in parallel. But the information should be written into each file only once and in the right order. And your patch breaks that, afaiu.
I agree with you that the logSync() in the patch would not work with the file-based logging, and that's why I put in my comments that this patch breaks it, but the purpose of the patch is to work with BookKeeper!
In the prototype included in the patch we do indeed remove synchronisation in logSync(), but at the same time each thread has its own local buffer (a ThreadLocal variable) where it's writing log modifications, and this guarantees that requests are processes in the order they are received by the client, thus they do not violate the semantics of the file system. As a proof I decided to run tests where I check the number of bytes written onto bookkeeper with different logging systems and that number is the same which comforts me on this issue.
Konstantin> My main objection to your approach though, is abstracting the whole EditsLog class rather than just the EditLog*Streams. This leads to replication of a lot of code. Say, both classes EditLog and BKEditLogThreadBuf have the loadFSEdits() methods, which are literally identical, right? We should avoid that at all cost.
For BookKeeper integration I really need to be able to override logSync(), so I have to keep going with the current abstract class unless a different way of overriding that method is provided.
In general, think that the abstract class has exactly the opposite effect of what you state, allowing to place in the root class the common parts, but I agree that I might not have provided you the cleanest code in this sense; I'll do some cleanup and post a new version. In the specific case of loadFSEdits, if they are identical then you could remove the code from BKEditLogThreadBuf.
I also think that the abstract classes for Input/Output streams are cool and they do not conflict with having an abstract class EditLog; in each test prototype in fact we used them
I'll try to have a patch that applies against trunk; and I'd also like to try to address dhruba request of multiple logging systems, maybe I could try with
HADOOP-4539? Is that patch supporting multiple logging through multiple instances of the EditLogInput/OutputStream classes? I'd like to avoid duplicating efforts but, in my case, I really need to be able to Override the current logSync().