Description
I've come across what very much looks like a race condition in log4j1. In reviewing the log4j 2 code I believe the same condition exists.
OutputStreamManager.setOutputStream and OutputStreamManager.write need to have a happens-before edge inserted. You could either make OutputStreamManager.os volatile (best) or make setOutputStream synchronized.
When the RollingFileAppender decides to roll the file it creates a new OutputStream and calls setOutputStream with it. If there is no happens-before edge then that write to OutputStreamManager.os may not be visible to all threads.
Background:
I've been attempting to find a way to have applications write logs to Flume without being halted if Flume is down or its channels fill up. My approach was to use the RollingFileAppender from apache-log4j-extras and configure it to roll every minute. Then I setup a Flume spooling directory source to read those files and forward them on.
I've been having problems with Flume complaining that the rolled log file has changed. The spooling directory source checks this so that people do not attempt to use it on logs that are currently being written to.
I caught an instance of this this afternoon.
File: castellan-reader.20130514T2058.log.COMPLETED
2013-05-14 20:57:05,330 INFO ...
File: castellan-reader.20130514T2058.log
2013-05-14 21:23:05,709 DEBUG ...
Why would an event from 2123 be written into a file from 2058?
My analysis of log4j 1 code is:
My understanding of log4j shows that the RollingFileAppenders end up calling this:
FileAppender:
public synchronized void setFile(String fileName, boolean append, boolean bufferedIO, int bufferSize)
Which shortly calls:
this.qw = new QuietWriter(writer, errorHandler);
However, the code to actually write to the writer is this:
protected
void subAppend(LoggingEvent event) {
this.qw.write(this.layout.format(event));
Unless I'm mistaken there's no happens-before edge between setting the qw and calling subappend. The code path to get to subAppend appears not to go through any method synchronized on FileAppender's monitor. this.qw is not volatile.
Note that I haven't tested log4j 2 for this probable defect - I am raising this work item based on my reading of the code.