Bug 43867 - NOPLoggerRepository error during shutdown
Summary: NOPLoggerRepository error during shutdown
Status: RESOLVED FIXED
Alias: None
Product: Log4j - Now in Jira
Classification: Unclassified
Component: Other (show other bugs)
Version: 1.2
Hardware: Other other
: P2 normal
Target Milestone: ---
Assignee: log4j-dev
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-11-14 15:04 UTC by Marius Scurtescu
Modified: 2014-02-17 13:40 UTC (History)
5 users (show)



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Marius Scurtescu 2007-11-14 15:04:27 UTC
After upgrading from 1.2.14 to 1.2.15 the following error show up during shutdown:
log4j:ERROR LogMananger.repositorySelector was null likely due to error in class
reloading, using NOPLoggerRepository.

No errors at all with 1.2.14

This probably is related to bug 40212
Comment 1 Curt Arnold 2007-11-14 22:03:17 UTC
The error message isn't the bug, it is a symptom of something else.  It is a notification that somehow that 
class invarients are being broken (likely by class reloading) and log4j is going to avoid throwing a NPE by 
substituting a minimal non-functional logger repository.  If this is happening at shutdown, then you have 
an annoying message but at least you didn't crash.  Not much log4j can do if a class loader is starting to 
mess with invariants other than to minimize the damage.  I'd expect that 1.2.14 would raise 
NullPointerException under the same conditions that 1.2.15 would output a warning, but doesn't suprise 
me that things are a little inconsistent.  If the Tomcat community can either fix the classloader or give us 
specific instructions how to avoid their issue, we could have the problem go away, but there isn't much 
log4j can do to avoid the issue. 
Comment 2 Marius Scurtescu 2007-11-14 22:25:19 UTC
Since 1.2.14 does not throw a NPE (or maybe it does, but it is never logged?), I
assumed that 1.2.15 should not log an error.

You are saying that 1.2.15 outputs a warning, it actually outputs an error. A
warning would be much preferred.
Comment 3 Aaron Digulla 2007-11-15 02:27:55 UTC
Apparently, the Tomcat classloader sets all static fields to null when you
remove a webapp. Bugs like this one here indicate that something is still alive
after the webapp has been removed (maybe a background thread).
Comment 4 Aaron Digulla 2007-11-15 02:28:24 UTC
Apparently, the Tomcat classloader sets all static fields to null when you
remove a webapp. Bugs like this one here indicate that something is still alive
after the webapp has been removed (maybe a background thread).
Comment 5 Marius Scurtescu 2007-11-15 11:41:38 UTC
(In reply to comment #3)
> Apparently, the Tomcat classloader sets all static fields to null when you
> remove a webapp. Bugs like this one here indicate that something is still alive
> after the webapp has been removed (maybe a background thread).

Yes, I am aware of that. I mentioned that this is related to bug 40212.

The issue is that with 1.2.14 there are not errors (no NPE) and with 1.2.15
there are errors. Same Tomcat and exact same webapps.

So problem seems to be with Log4j, introduced in 1.2.15.
Comment 6 Aaron Digulla 2007-11-21 01:42:12 UTC
For me, your argument means that your webapp had a bug which wasn't triggered
with 1.2.14 but since we enhanced the code to report more erroneous states in
1.2.15, it now shows up.
Comment 7 Dan Armbrust 2008-01-25 10:15:24 UTC
For me, the argument goes like this:

Log4j has errors when a tomcat container is shutdown which contains log4j.  

These errors used to be silent - so no issue.

Now, in 1.2.15, they are being logged - which is a problem for me as well.

Suddenly, my customers are calling the help desk and asking what is this new error?

Log4J needs to be able to be shutdown without logging (and preferably, without
throwing) errors like this.

If someone can tell me what in my webapp would cause this, I'm happy to fix it.
 But this appears to be a tomcat / log4j bug, from what I have learned so far.
Comment 8 Curt Arnold 2008-01-25 11:49:25 UTC
The underlying problem has to be resolved in Tomcat.  If the Tomcat team would suggest modifications 
that would make log4j immune to their classloader mucking with class invariants and leaving the class 
in limbo, it would likely be too drastic to consider to do this late in log4j 1.2's life.

So that leaves the question of what log4j should do in the circumstance that it is left in a corrupt state.  
Prior to log4j 1.2.15, log4j would throw a NPE when an attempt was made to log after the Tomcat 
class-loader did its damage.  In log4j 1.2.15, if that situation occurs log4j emits an error message 
which is definitely an improvement over a NPE.

You have observed that log4j 1.2.15 appears to be more likely to get in that state than log4j 1.2.14 
since you never saw the NPE with earlier versions, but now see the error message.  That likely is not 
directly related to the presence of the check and error message.  Removing the check for the null 
repository selector may or may not change the frequency of log4j being in a corrupt state, but it would 
cause the NPE's to return when it got in that state.

It is unreasonable to expect log4j to perform perfectly when the class loader has put it in a state that is 
not reachable by ordinary means.

The options are:

1. Fix the tomcat class loader or provide instructions to avoid the scenario.  That would have to be 
done by the Tomcat project.
2.  Identify characteristics that make a library more likely to be corrupted.  If the Tomcat project could 
identify that, we may (or may not) be able to adjust log4j to reduce the likelihood of getting in the 
corrupt state.
3. Remove the check for null repo which would cause NPE's to return.
4. Change the error to a warning message.
5. Change the error to a debug message (which would generally not be displayed). 
6. Change the message so it is not as frightening.

Comment 9 Dan Armbrust 2008-01-25 12:01:33 UTC
The strange part to me in the sequence of events is that I have never had an
issue with the Null Pointers - I never saw one previously.  But I also haven't
followed the other bug very closely which led to this change.

Unless we can get some help from the tomcat folks to figure out how to prevent
this from happening - I would vote for option 5.

It doesn't seem unreasonable to me for someone to turn on full debug error
logging if they are trying to figure out where their log messages went if they
end up getting switched over to the NOPLoggerRepository for some other reason.
Comment 10 Mark Thomas 2008-01-25 15:32:16 UTC
For the record, you can disable the Tomcat reference clearing code by setting
the following system property to false:
org.apache.catalina.loader.WebappClassLoader.ENABLE_CLEAR_REFERENCES

For a properly written web app, this feature should not be required.
Comment 11 Dan Armbrust 2008-01-25 15:49:27 UTC
Far more info here, from the Tomcat side:

http://issues.apache.org/bugzilla/show_bug.cgi?id=41939
Comment 12 Gili 2008-04-22 23:17:28 UTC
The webapps I personally wrote do not do any class reloading as far as I know. It could very well be that one of the other webapps I run on my server (or one of the frameworks they use) does so but that's totally outside of my control/scope.

From an end-user point of view, this bug is a show-stopper for me. Having to manually kill Tomcat every single time is a major problem and for this reason I am downgrading to an older release of log4j. I humbly suggest you guys revisit this issue as soon as possible and resolve it better. Throwing an exception with an message that amounts to "you did something wrong" helps no one. Either provide us with a meaningful exception to track the problem back to the source or ensure log4j is more resilient, able to recover gracefully from such problems.
Comment 13 Aaron Digulla 2008-05-04 11:22:50 UTC
Re #12: As a rule of thumb, here are a few tips what you should look for:

1. Threads which aren't completed when your webapp is stopped. I'm talking about threads you (or libraries which you use) start. Do you use a timer, quartz or something like that?

2. Find out why the code which throws the error thinks it has to run after your webapp has been removed or reloaded.
Comment 14 Thorbjørn Ravn Andersen 2008-07-03 13:40:25 UTC
From what I understand from the comments this is basically a Tomcat issue, and will not be circumvented in log4j 1.2?  Would it be reasonable to resolve with a WONTFIX then?
Comment 15 Gili 2008-07-03 18:02:39 UTC
I can pretty much guarantee you that the Tomcat team will never make any changes to fix this issue. It is pretty much impossible getting them to do anything nowadays. In my experience 90% of valid bug reports filed get closed as WON'T FIX or INVALID by Remy. Other people have reported the same.
Comment 16 Thorbjørn Ravn Andersen 2008-07-04 02:56:19 UTC
(In reply to comment #15)
> I can pretty much guarantee you that the Tomcat team will never make any
> changes to fix this issue. It is pretty much impossible getting them to do
> anything nowadays. In my experience 90% of valid bug reports filed get closed
> as WON'T FIX or INVALID by Remy. Other people have reported the same.

Interesting.  Could you provide the link to this issue in the Tomcat Bugzilla, please?

Comment 17 Gili 2008-07-04 03:05:58 UTC
It seems I might have been wrong in this instance. The only related bug I could find was https://issues.apache.org/bugzilla/show_bug.cgi?id=41939 but it is already marked as FIXED.

Please try opening a bug report with Tomcat if you feel there are any unresolved issues but please do not close this bug report (on the log4j end) until you get an initial reply from the Tomcat team.
Comment 18 Thorbjørn Ravn Andersen 2008-07-04 03:18:18 UTC
(In reply to comment #17)
> It seems I might have been wrong in this instance. The only related bug I could
> find was https://issues.apache.org/bugzilla/show_bug.cgi?id=41939 but it is
> already marked as FIXED.
> 
> Please try opening a bug report with Tomcat if you feel there are any
> unresolved issues but please do not close this bug report (on the log4j end)
> until you get an initial reply from the Tomcat team.

The original poster did not include a testcase, so it is impossible to determine now if the problem is still there.  

Until then I suggest this issue is put to NEEDINFO.

Comment 19 Dan Armbrust 2008-07-07 07:48:18 UTC
I don't know what more info you possibly need.  This entire problem is well documented already.  See Comment #8.  Comment #11, and Comment #10 (from a Tomcat guy)

The quick summary is that log4j started logging more errors than it used to from 1.2.14 to 1.2.15.  

Now, when it is deployed in a webapp in tomcat which is stopped, it logs ugly null pointer exceptions - as errors - which end up on the tomcat console.  

The real problem is caused by how tomcat does its shutdown process, which may or may not be fixed at present.  The referenced Tomcat bug seems to indicate that it now may be fixed, without the workaround presented in comment #10.

But to fix the log4j issue, and keep ugly errors from happening, what I think should happen is that Suggestion 5 from Comment #8 should be implemented in log4j.

Comment 20 Gili 2008-07-07 09:39:52 UTC
If this is logged as DEBUG I am concerned users will see Tomcat hang without knowing what is going on. At least now you see some sort of error message. I haven't had the chance to try out this bug with recent Tomcat builds but I plan on doing so in the coming weeks.
Comment 21 Dan Armbrust 2008-07-07 09:57:51 UTC
When I saw the issue, it didn't result in any hang.  It simply caused ugly errors to end up on the tomcat console while the app and/or tomcat was shutting down.

Comment 22 Gili 2008-07-07 10:02:24 UTC
When running under Windows with Tomcat running as a service I routinely got hangs when I would invoke "net stop tomcat6" and the log4j message was the last error message printed to the logs. Furthermore, these problems only started once I upgraded my log4j version.

Just to clarify, "net stop tomcat6" would fail regardless of how many times you'd try and "net start tomcat6" would complain Tomcat was already running.

If this issue has indeed been resolved then maybe the severity could be reduced from ERROR to WARN but I definitely would not consider it a DEBUG. The message is warning you about something that needs to be fixed, is it not?
Comment 23 Aaron Digulla 2008-07-07 11:11:35 UTC
Guys, for the last time: This is neither a bug in log4j nor in Tomcat!

It's *always* *always* *always* *always* a bug in the deployed webapp.

Yeah, you didn't see it in 1.2.14 but that doesn't the bug wasn't there: You just didn't *see* it.

The root cause of the bug is that code of your webapp is executed *after* undeploy. The usual cause of the problem is a) a shutdown hook or b) a thread which your app starts and which isn't terminated during undeploy.

The usual culprits: Starting your own threads without terminating them on undeploy, using a timer framework like Quartz, ActiveMQ (which you can't stop cleanly after having started it), things like that.

Activate the debug switched for Tomcat, use your IDE to connect and watch the threads. That should give you a very good idea where the bug comes from.

So IMHO, this is bug should be closed as INVALID.
Comment 24 QualityChecker 2008-07-07 13:12:14 UTC
If of any help, look there : http://d.cr.free.fr/wswebconsulterfichiers.php?projet=demojava_log4j

You'll find a list of null references, overflow, dead code bugs for a large set of files from org.apache.log4j

May be the key of this bug is to be found there.

Good luck.
Comment 25 Dan Armbrust 2008-07-07 13:49:53 UTC
I'm sorry Aaron, but to borrow your pattern, I'm quite certain that this bug is *not* *not* *not* *not* a bug in the deployed webapp - or anything to do with users threads.

This is purely a problem between tomcat and log4j.  

By putting one of my apps into a debugger, I have concluded the following:

The error message in question

"log4j:ERROR LogMananger.repositorySelector was null likely due to error in class reloading, using NOPLoggerRepository." 

is printed by the Main thread in the JVM.  

I know this because it happens when I have paused every other thread in the JVM via the debugger.  So, lets just throw out this mystery threading problem conclusion.

Furthermore, this error message is printed within the class:

org.apache.catalina.loader.WebappClassLoader.clearReferences() - between the lines of 1596 and 1640 (in tomcat version 5.5.25) as it is being executed by tomcats main thread during a tomcat shutdown.

Even if this was a mystery threading problem (which is isn't) your answer isn't very helpful - in that case, the issue would be that the message being logged sucks - and it should be improved to at least give a stack trace, or a thread name, or something that could be used to actually track down the problem.
Comment 26 Thorbjørn Ravn Andersen 2008-07-07 14:17:23 UTC
(In reply to comment #24)
> If of any help, look there :
> http://d.cr.free.fr/wswebconsulterfichiers.php?projet=demojava_log4j
> 
> You'll find a list of null references, overflow, dead code bugs for a large set
> of files from org.apache.log4j

Could you please provide an English version of these pages.  I do not understand French.


> May be the key of this bug is to be found there.
> 
> Good luck.

Less luck might be needed if you gave a hint what we are to look for and what the error links mean.

Thanks for helping!

Comment 27 Dan Armbrust 2008-07-07 14:19:53 UTC
Since I'm well into this bug now, I figured I'd go a bit further.

To complicate matters even more, there is a third project involved in this problem - struts.

The class which is being "mucked with" by tomcat at the time the error happens is:

org.apache.struts.validator.ValidatorForm

The field name is "log" - the type type of the field is "org.apache.commons.logging.Log"

Tomcat calls field.set(null, null); on this field.

This leads into java.lang.reflect.Field until the following line of code:

reflectionFactory.newFieldAccessor(this, overrideFinalCheck);

The error then occurs somewhere down inside of the class 

sun.reflect.ReflectionFactory


I have no idea what the ReflectionFactory class is trying to do when it triggers this error - I don't have the source for it right now.

So, if you are trying to reproduce this error, you will need to have some code  in your webapp which loads the struts ValidatorForm class.



Comment 28 Thorbjørn Ravn Andersen 2008-07-07 14:26:55 UTC
(In reply to comment #25)

> This is purely a problem between tomcat and log4j.  
> 
> By putting one of my apps into a debugger, I have concluded the following:

Please create a minimal wep app which demonstrate this behaviour, and attach it to this issue.  This will allow us all to see what you see, plus the person who will write the patch (which might be you) can demonstrate quite clearly to the person who will review the patch that it fixes the issue.

This is with the latest Tomcat 6?
Comment 29 Gili 2008-07-07 14:36:16 UTC
BTW: This issue might be triggered by a bug in a webapp, but there is certainly a severe log4j-specific bug at play too. I have a webapp which hasn't changed since 2005. When I upgraded log4j all of a sudden I noticed that Tomcat started hanging on shutdown. Log4j or Tomcat should not be hanging, even on so-called misbehaving webapps (and I'm not sure my webapp is even misbehaving!)
Comment 30 Dan Armbrust 2008-07-07 14:44:57 UTC
Gili - I'm not seeing the hang myself - maybe it's related, maybe not.  Can you put your webapp in a debugger and see which thread is still running?

Usually a tomcat instance that won't stop is caused by a non-daemon thread.

If your not set up for a debugger, try this: http://www.unixville.com/~moazam/stories/2004/05/18/debuggingHangsInTheJvm.html
Comment 31 Dan Armbrust 2008-07-07 15:45:40 UTC
I'm using Tomcat 5.5.25.

So far, I haven't been able to create a simple webapp which demonstrated the problem.  There is some magic combination of tomcat, log4j, struts? (something else?) that triggers this.  

I can't release my  code which always triggers this.  I'll keep poking at it, when I have free time - trying to figure out what is different between the simpla webapp I just created, and my other apps.  I would really like to know what the magic combination is that causes it to happen.

If someone else can come up with an example app, that would be great. 
Comment 32 Thorbjørn Ravn Andersen 2008-07-07 16:02:53 UTC
(In reply to comment #29)
> BTW: This issue might be triggered by a bug in a webapp, but there is certainly
> a severe log4j-specific bug at play too. I have a webapp which hasn't changed
> since 2005. When I upgraded log4j all of a sudden I noticed that Tomcat started
> hanging on shutdown. Log4j or Tomcat should not be hanging, even on so-called
> misbehaving webapps (and I'm not sure my webapp is even misbehaving!)

Please reduce your web app as much as you can while preserving the faulty behaviour and attach it to this issue for the same reasons as given in comment #28.

Thanks.
Comment 33 Aaron Digulla 2008-07-08 02:42:20 UTC
Re comment 25:

Dan,

Code from deployed webapps is never executed in the main thread. Tomcat creates worker threads for that. So what your app is doing is the second thing I mentioned in comment 23: It's adding code to some hook (VM shutdown hooks, for example).

org.apache.catalina.loader.WebappClassLoader.clearReferences() triggers the bug by overwriting all static variables in your webapp (classes and JARs) with null. It's not the place where you made the mistake; it's just the place where you notice you made it.

All your other questions have been answered before, too, like the reason why log4j can't give you a useful stack trace. To be able to do that, log4j would need to intercept all writes to static variables and remember the stack traces when they happen. I know of no VM which allows that in an efficient manner.
Comment 34 Dan Armbrust 2008-07-08 09:21:44 UTC
(In reply to comment #33)
> Re comment 25:
> 
> Dan,
> 
> Code from deployed webapps is never executed in the main thread. Tomcat creates
> worker threads for that. So what your app is doing is the second thing I
> mentioned in comment 23: It's adding code to some hook (VM shutdown hooks, for
> example).
> 

Aaron, 

I'm sorry, but unless you can show otherwise in a debugger, you are just making incorrect assumptions.  My debugger shows me that the Tomcat shutdown process is not spawned to other threads - it is in fact, done by the main thread.  That is where the clearReferences() call is made. 

Now, I do agree, that the error itself is probably being printed from a finalizer or a shutdown hook which is triggered by the clearReferences call - and since that is a new and short lived thread, I don't think my debugger is catching it.

The reason that this hits and confuses so many web developers is because I don't think they made the mistake - at least not directly.  My suspicion now is strongly  pointing at common 3rd party code - likely struts, or one of its dependencies that many people use.  But so far, I can't find any code my included libraries that are setting up shutdown hooks.

Also, I still don't understand why log4j can't print a stack trace with this error - we don't want the stack trace of the thread that set the variable to null, we want the stack trace of the the thread that tried to use the null variable.  Why wouldn't that be available?  Because it seems that the only way to track down and close this issue is to find the class with the finalizer.  And unless I can figure out how to make my debugger pause newly spawned threads, I don't know how else to find it.  I thought I was on to something with the struts ValidationForm class - since that is the class that triggers the problem in my webapp - but in looking through that code, I can't find anything suspicious.
Comment 35 Aaron Digulla 2008-07-09 00:13:43 UTC
(In reply to comment #34)

Please go through your own argument once more as someone who isn't stuck in it. Try to find answers to these questions:

1. Why is a logger method being called in clearReferences()? Tomcat doesn't use log4j, it uses java.util.Logging. So your webapp must somehow put code somewhere which Tomcat executes during shutdown.

2. When log4j notices that its static variables have been cleared, what should it print? The current stack trace would give no hint who cleared the variable because this probably has happened a long time ago, probably by some other thread. If it would print the stack trace the moment the variable is being cleared, you would see that you're inside the Tomcat cleanup code because your webapp is being undeployed. No surprise here and no information either. IIRC, there is another reason why the stack trace doesn't help but I can't recall it. You'll see it immediately when you try to debug this.

So you must find out: a) why is a logger called and b) who registered this piece of code which is executed after your webapp is supposed to be long gone?

Again: In your case, there is no need for another thread. All this can happen in the main thread and there are more ways to log during app exit than the shutdown hooks. Just put a breakpoint in the place where log4j prints the error message and see who is trying to log.

But it's a bug in your webapp (and I'm counting all the JARs you include as "your" here :) and not in log4j.
Comment 36 Dan Armbrust 2008-07-09 09:37:54 UTC
> 2. When log4j notices that its static variables have been cleared, what should
> it print? The current stack trace would give no hint who cleared the variable
> because this probably has happened a long time ago, probably by some other
> thread. 

Actually, yes, this is exactly the stack trace I want.  I don't care what thread cleared the variable - I already know that - its the main thread.  I want to know whet thread is trying to use the now null variable.  It's not a thread I started (in my known code), its none of the running threads the moment before the error happens - so - its a newly spawned thread - most likely a finalizer thread from some third party library I'm using.  I would like to think that that stack trace would give me a good clue as to what the finalizer is attempting to do - so maybe I could trace it back to where it was actually registered.  At least it would tell me which library to look in.

> So you must find out: a) why is a logger called and b) who registered this
> piece of code which is executed after your webapp is supposed to be long gone?

That's what i've been trying to figure out the whole time.  But log4j is not being helpful at all in tracking it down, since it won't give me any info on the thread which is hitting the null variable and I have no clue which one of the 50 jar files in my system is responsible for registering that code.  If log4j is going to print an error message that is useless in actually tracking down the error, why even bother?  99.9% of the developers won't care anyway, since the system is shutting down.  So I think this message should either be improved - or set back to info, or some such level, where it won't show up.  

>Just put a breakpoint in the place where log4j prints the error
> message and see who is trying to log.

Do you know where that is?  Because I don't, offhand.  

> 
> But it's a bug in your webapp (and I'm counting all the JARs you include as
> "your" here :) and not in log4j.
> 

I agree with that - but its a bug I don't care about.  Yet - log4j is insisting that I should care about it.  But I don't.  Why isn't there a way for me to tell log4j that I don't care about this problem - stop printing unnecessary error messages like this?
Comment 37 Thorbjørn Ravn Andersen 2008-07-09 10:48:43 UTC
(In reply to comment #36)

> >Just put a breakpoint in the place where log4j prints the error
> > message and see who is trying to log.
> 
> Do you know where that is?  Because I don't, offhand.  

The source files are present in the distribution archive.

In Eclipse I add the log4j-1.2.15.jar file to the build path, and in the properties of the jar file I set the source archive as the "Java Source Attachment".

I can then press Ctrl-Alt-T, select NOPLoggerRepository and press Return to go to the NOPLoggerRepository.class file but SEE the NOPLoggerRepository.java file in my editor.  I can then position the cursor over the "NOPLoggerRepository" in "public class NOPLoggerRepository {" and press Ctrl-Alt-G.  I am then presented with the locations where the NOPLoggerRepository class is referenced. 

The method you are looking for is LogManager.getLoggerRepository, and the error message is logged in line 177.

 > that I should care about it.  But I don't.  Why isn't there a way for me to
> tell log4j that I don't care about this problem - stop printing unnecessary
> error messages like this?

Perhaps you would like to raise this as an enhancement issue then? 

Comment 38 Aaron Digulla 2008-07-09 11:20:21 UTC
> Why isn't there a way for me to tell log4j that I don't care about
> this problem - stop printing unnecessary error messages like this?

Because your webapp will cause havoc when you install it in a shared app server. Undeploying your webapp can kill the whole server.

Also, this is a common mistake which most people never realize. They notice that "Tomcat is unstable after a while" and things like that but only error messages like this one can point you at the source of the fault. I understand that it's a pain to track these down; in my case, I search for two solid days until I found a really horrible bug in my webapp that needed fixing.

If you have any ideas how to make them easier to figure out, I'm all for it. But don't disable them. That's like "Oh, I left my loaded gun in the room of my little brother. Well, I guess he won't play with it".
Comment 39 Dan Armbrust 2008-07-09 13:30:27 UTC
Ok.  First - how to make this easier for others to find:

Before closing this bug, I suggest a change along the following lines.

In LogManager, line 177, the following line:

LogLog.error("LogMananger.repositorySelector was null likely due to error in class reloading, using NOPLoggerRepository.");


Could be changed to something like this:

LogLog.error("LogMananger.repositorySelector was null likely due to error in class reloading, using NOPLoggerRepository.", new Exception("Stack of offending thread").fillInStackTrace());

Which would be vastly more helpful if you insist on leaving this message in place.  However, read-on....



On to the real issue - and this is interesting - even more of our assumptions are wrong.  This has nothing to do with finalizer code.  And in fact, has nothing to do with any code that I had deployed in my webapp.  The problem happens purely within the Main thread, being executed by tomcat.

It happens because of a basic design issue between tomcat and log4j - and if you ask me the error is purely in tomcat.

So, previously, I had traced the main thread all the way down to 

reflectionFactory.newFieldAccessor(this, overrideFinalCheck);

as Tomcat was setting things to null - then I couldn't follow any farther, due to a lack of source code.

Working backwards from log4j, here is the offending stack:

LogManager.getLoggerRepository() line: 177	
LogManager.getLogger(String) line: 199	
Logger.getLogger(String) line: 105	
Log4JLogger.getLogger() line: 229	
Log4JLogger.<init>(String) line: 65	
GeneratedConstructorAccessor5.newInstance(Object[]) line: not available	
DelegatingConstructorAccessorImpl.newInstance(Object[]) line: 27	
Constructor<T>.newInstance(Object...) line: 513	
LogFactoryImpl.newInstance(String) line: 529	
LogFactoryImpl.getInstance(String) line: 235	
LogFactoryImpl.getInstance(Class) line: 209	
LogFactory.getLog(Class) line: 351	
ValidatorForm.<clinit>() line: 99	
Unsafe.ensureClassInitialized(Class) line: not available [native method]	
UnsafeFieldAccessorFactory.newFieldAccessor(Field, boolean) line: 25	
ReflectionFactory.newFieldAccessor(Field, boolean) line: 122	


The only code here that isn't from log4j or sun is:

ValidatorForm.<clinit>() line: 99

That particular class is org.apache.struts.validator.ValidatorForm from struts.

What is happening on line 99 is:

private static Log log = LogFactory.getLog(ValidatorForm.class);

Which is certainly not unusual - I don't think struts has anything to do with causing the problem.  It's more of just luck - depending on the order that tomcat runs through the classes as it clears references.

Tomcat, in the desire to clear out the static fields within a class by using reflection, has unwittingly caused the class init code to be rerun due to the way that reflection works.  When it hits this code that tries to use the log factory, it doesn't know that it has already broken the logfactory by setting fields within the logfactory to null.

So, the logfactory prints the error.

Short of making tomcat stop clearing references - I don't think there is any way for me as a web developer to prevent this error.

Therefore, with that argument, I think that we need a way to make log4j not log this as an error.  I think this particular message needs to go back to being a info or a debug level message.  

And I'm going to re-raise this issue on the tomcat bug tracker, but I doubt that they will make any changes, since there is really no way for them to know what a safe order is for clearing references - and they still insist on clearing references by default.
Comment 40 Gili 2008-07-09 13:39:58 UTC
Dan,

Great detective work! Can you please clarify some details further before we discuss what needs to be done? Specifically, I am wondering about this paragraph:

"Tomcat, in the desire to clear out the static fields within a class by using
reflection, has unwittingly caused the class init code to be rerun due to the
way that reflection works."

What does "clearing out static fields" amount to? Setting them or null or something else? Can you please give a short verbal explanation of how the problem occurs (i.e. a scenario involves 1-2 classes)?
Comment 41 Dan Armbrust 2008-07-09 14:25:02 UTC
(In reply to comment #40)

> 
> What does "clearing out static fields" amount to? Setting them or null or
> something else? Can you please give a short verbal explanation of how the
> problem occurs (i.e. a scenario involves 1-2 classes)?
> 

Gili, take a look at this code:

http://svn.apache.org/repos/asf/tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/loader/WebappClassLoader.java

Do a search for "if (ENABLE_CLEAR_REFERENCES) {"


That 'if' block is what tomcat is doing as it tries to clear references, and inadvertently causes this error.

Comment 42 Thorbjørn Ravn Andersen 2008-07-15 04:27:40 UTC
I have now seen this message in something I wrote a while back.  I will see if I can create a test case when I have better time.  This will also allow us to prove who the sinner is.
Comment 43 Thorbjørn Ravn Andersen 2008-07-25 03:25:51 UTC
(In reply to comment #0)
> After upgrading from 1.2.14 to 1.2.15 the following error show up during
> shutdown:
> log4j:ERROR LogMananger.repositorySelector was null likely due to error in
> class
> reloading, using NOPLoggerRepository.

I suggest that the wording of this message (LogManager.java, line 177) is changed to something like.

"log4j repository corrupted.  Happens with some class loaders.  Logging disabled.  See https://issues.apache.org/bugzilla/show_bug.cgi?id=43867 for discussion"

This is to help the person seeing this message as much as possible resolving the problem.  The current message wording does not reflect the current consensus namely that this is essentially not a log4j problem.

I would originally suggest adding an extra argument giving the requested stack trace, but thought better of it as it misleads more than it helps.

Comment 44 Curt Arnold 2008-08-05 11:07:12 UTC
I don't like "repository is corrupted" which to me suggests some subtle distortion of the system than a total catastrophic failure.

I think the best solution is to keep the message text, but add a link to a FAQ (which could link to the bugs for background info).
Comment 45 Curt Arnold 2008-08-05 12:33:15 UTC
I think that we could suppress the message when it occurs within the scope of org.apache.catalina.loader.WebappClassLoader.clearReferences.  Hate to put a client specific hack in our code, but it seems to affect enough of our users to warrant that ugliness.

Does the stack trace extend all the way back to WebappClassLoader?

I may take a shot at detecting that scenario and commit to the SVN, but would depend on somebody who can reproduce the issue to see if it addresses the issue.  
Comment 46 Dan Armbrust 2008-08-05 13:01:24 UTC
Here is a full stack:

Thread [main] (Suspended (breakpoint at line 177 in LogManager))	
	LogManager.getLoggerRepository() line: 177	
	LogManager.getLogger(String) line: 199	
	Logger.getLogger(String) line: 105	
	Log4JLogger.getLogger() line: 229	
	Log4JLogger.<init>(String) line: 65	
	GeneratedConstructorAccessor5.newInstance(Object[]) line: not available	
	DelegatingConstructorAccessorImpl.newInstance(Object[]) line: 27	
	Constructor<T>.newInstance(Object...) line: 513	
	LogFactoryImpl.newInstance(String) line: 529	
	LogFactoryImpl.getInstance(String) line: 235	
	LogFactoryImpl.getInstance(Class) line: 209	
	LogFactory.getLog(Class) line: 351	
	UIAjaxSupport.<clinit>() line: 57	
	Unsafe.ensureClassInitialized(Class) line: not available [native method]	
	UnsafeFieldAccessorFactory.newFieldAccessor(Field, boolean) line: 25	
	ReflectionFactory.newFieldAccessor(Field, boolean) line: 122	
	Field.acquireFieldAccessor(boolean) line: 918	
	Field.getFieldAccessor(Object) line: 899	
	Field.get(Object) line: 358	
	WebappClassLoader.clearReferences() line: 1616	
	WebappClassLoader.stop() line: 1499	
	WebappLoader.stop() line: 734	
	StandardContext.stop() line: 4398	
	StandardHost(ContainerBase).stop() line: 1067	
	StandardEngine(ContainerBase).stop() line: 1067	
	StandardEngine.stop() line: 448	
	StandardService.stop() line: 510	
	StandardServer.stop() line: 734	
	Catalina.stop() line: 602	
	Catalina.start() line: 577	
	NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]	
	NativeMethodAccessorImpl.invoke(Object, Object[]) line: 39	
	DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 25	
	Method.invoke(Object, Object...) line: 597	
	Bootstrap.start() line: 295	
	Bootstrap.main(String[]) line: 433	


I can reproduce the error on demand, and test a fix attempt, but my schedule is a bit hectic right now, so it may not be timely.

It hadn't occurred to me to detect when tomcat causes the problem, and mask just that error.  I think that is a reasonable solution.

Comment 47 Gili 2008-08-05 13:10:11 UTC
What about Dan's comment #39?

"Tomcat, in the desire to clear out the static fields within a class by using
reflection, has unwittingly caused the class init code to be rerun due to the
way that reflection works.  When it hits this code that tries to use the log
factory, it doesn't know that it has already broken the logfactory by setting
fields within the logfactory to null."

What happens when you silently ignore Tomcat doing the wrong thing here? Doesn't this have the potential of breaking webapp code in the future?
Comment 48 Curt Arnold 2008-08-05 13:46:32 UTC
I'm not sure how to interpret message #47.  Maybe best for me to take a shot at a fix and then for you to critique it.

We know one situation that can result in the NOPLoggerRepository warning, the undesirable reloading and reinitialization of classes in WebappClassLoader.clearReferences.  As that scenario is described, it seems like there is little potential for resource leakage or other issue that should concern a site admin, so the message could be dropped to a LogLog.debug() or similar which would likely be suppressed.

However, we can't say that all situations that can result in the NOPLoggerRepository are as innocuous.  So we don't want to lower the severity in situations where we can't identify.  In those cases, we've got an obligation to let the admin know that log4j has a serious problem and can't perform as would be expected.  Maybe sometimes the NOPLoggerRepo message does occur because of misconfiguration or unterminated threads, if those cases occur, we shouldn't hide the info from the admin.  Comment #38 touched on the need to not hide problems from admins that might let them know that things are going south.

There is a compromise here and it won't be perfect, however hopefully we could eliminate a lot of unnecessarily scary messages to Tomcat admins that don't reflect a potential stability issue while still informing other uses when we feel that our stability has been compromised.
Comment 49 Gili 2008-08-05 13:49:44 UTC
Curt,

I interpret comment #47 as saying that Tomcat isn't clearing the references properly (it clears some references then reinitializes a class by mistake). Shouldn't be trying to fix *that* instead of simply suppressing the resulting warning message? Who knows what effects partial reference-clearing will have on future webapps?!
Comment 50 Thorbjørn Ravn Andersen 2008-08-05 14:02:37 UTC
(In reply to comment #49)
> Curt,
> 
> I interpret comment #47 as saying that Tomcat isn't clearing the references
> properly (it clears some references then reinitializes a class by mistake).
> Shouldn't be trying to fix *that* instead of simply suppressing the resulting
> warning message? Who knows what effects partial reference-clearing will have on
> future webapps?!

If this is a matter of just Tomcat doing the wrong thing, and that it can be configured to do the right thing, then why not just say so (and how) in the message shown to the user?
 
Or does this happen elsewhere than just in web apps inside Tomcat
Comment 51 Gili 2008-08-05 14:04:06 UTC
Dan, can you please clarify your comment #39?
Comment 52 Dan Armbrust 2008-08-05 14:19:15 UTC
(In reply to comment #50)
> (In reply to comment #49)
> > Curt,
> > 
> > I interpret comment #47 as saying that Tomcat isn't clearing the references
> > properly (it clears some references then reinitializes a class by mistake).
> > Shouldn't be trying to fix *that* instead of simply suppressing the resulting
> > warning message? Who knows what effects partial reference-clearing will have on
> > future webapps?!
> 
> If this is a matter of just Tomcat doing the wrong thing, and that it can be
> configured to do the right thing, then why not just say so (and how) in the
> message shown to the user?
> 
> Or does this happen elsewhere than just in web apps inside Tomcat
> 

Tomcat is doing the best job that it can at clearing references.  But what it doesn't (and can't) know is what a safe order would be to clear the  references.  It is just walking through the classes that a given classloader has loaded.  I'm not sure if there is any order, other than random, here.

So, what happens for me is that tomcat nulls out the references inside of the logfactory class.  Then, it goes off to some other class - in my first example here it was the org.apache.struts.validator.ValidatorForm class from struts - and it tries to null out the references inside of that class.

Things break because Tomcat is using reflection to find out what all of the private fields are inside of each class.  And, surprisingly (to me anyway) the way that Sun designed the reflection API's - this inadvertently calls the static init code of the ValidatorForm class again.  Which tries to use the (now broken) logfactory.

Thats what happens in this portion of the stacktrace:

ValidatorForm.<clinit>() line: 99       
Unsafe.ensureClassInitialized(Class) line: not available [native method]        
UnsafeFieldAccessorFactory.newFieldAccessor(Field, boolean) line: 25    
ReflectionFactory.newFieldAccessor(Field, boolean) line: 122   
Field.acquireFieldAccessor(boolean) line: 918   
Field.getFieldAccessor(Object) line: 899        
Field.get(Object) line: 358     
WebappClassLoader.clearReferences() line: 1616 


Thats just the way reflection works, I guess.  Maybe there is another way to use the reflection API that would prevent it from rerunning init code of classes, but I doubt it.  Tomcat doesn't mean to reinitialize classes, but it happens when they use the reflection API.

Make sense?

Comment 53 Curt Arnold 2008-08-05 14:39:22 UTC
For comment #50:

The only place that we know where it happens is within clearReferences, but we can't say that is the only place that it happens.

We can tell Tomcat users to turn off clear references since it is an unsafe optimization, but apparently it is a very desirable optimization (otherwise it wouldn't be the default and they wouldn't have gone to the trouble to write it).  

For some class of users, they have observed that they upgrade to log4j 1.2.15 and now they get a message that they didn't get before and think that it is a log4j issue.  They never say the NPE that motivated writing the NOPLoggerRepo and the message.  For those that did see the NPE, log4j 1.2.15 was a good upgrade, those that never say the NPE and now see the message see it as a problematic upgrade.  

The problem is a latent Tomcat issue, but they don't know that and for us to tell them to change their configuration might be enough to discourage them from keeping current, so now we have pressure to try to continue to support 1.2.14 or whatever that for some reason we don't understand doesn't exhibit the issue for them (but NPE's for others).

The belt and suspenders approach (message with a FAQ and lowering level where we think it is innocuous) seems best to me, but I'll commit them as two separate changes (FAQ first) and we can decide whether to keep one or the other or both.
Comment 54 Gili 2008-08-05 20:13:21 UTC
Dan,

Something sounds suspicious about your description. It doesn't make sense that reflection would invoke the class initializer if the class has already been initialized. The Javadoc or JLS would mention it somewhere...

Did you narrow down the problem to a specific line of code? Was it Class.getDeclaredFields()?
Comment 55 Dan Armbrust 2008-08-06 07:17:28 UTC
(In reply to comment #54)
> Dan,
> 
> Something sounds suspicious about your description. It doesn't make sense that
> reflection would invoke the class initializer if the class has already been
> initialized. The Javadoc or JLS would mention it somewhere...
> 
> Did you narrow down the problem to a specific line of code? Was it
> Class.getDeclaredFields()?
> 

Didn't make much sense to me either.  The stacktrace above shows exactly where the init call happens.  And, now I did find it in the javadocs:

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/reflect/Field.html#get(java.lang.Object)


Comment 56 Gili 2008-08-06 09:57:45 UTC
Dan,

Are you referring to "If the underlying field is static, the class that declared the field is initialized if it has not already been initialized."?

That only makes sense if Tomcat un-initializes the class somehow in the first place, but it isn't clear to me that it's doing this from your description... If it *is* then wouldn't the following technique fix the problem?

1) Null-out all references first
2) De-initializes classes second

a opposed to intermixing the two phases.
Comment 57 Dan Armbrust 2008-08-06 11:51:44 UTC
Maybe, I don't know if that is possible or not.  I don't know the reflection API's well enough.

Perhaps you can pursue it with the folks on the Tomcat side:

https://issues.apache.org/bugzilla/show_bug.cgi?id=41939

From the log4j side, I still like the idea of detecting when tomcat caused the problem during a shutdown sequence, and logging the error as a much lower level message.  
Comment 58 Aaron Digulla 2008-08-06 12:24:29 UTC
Re: Field.get(Object)

Some background info on reflection and static code in a class.

If you use reflection to access the static data of a class (which Tomcat has to do to clear static references so it can GC the webapp), then calling Field.get() will initialize a class if a) it hasn't been initialized, yet, b) if it has been initialized by a different classloader instead of the current one or c) when the class itself has been GC'd (not with Sun VM with default GC options but you can activate that).

There are a lot of spurious problems lurking around this "feature"; I once used it to execute code while the Groovy compiler worked on my class file. It wasn't nice.

So what can happen is that you need a class for something (i.e. someone imports it) but you don't actually use it. In this case, the classloader will have loaded the class but it won't be initialized. That's a standard optimization to make Java boot faster.

Then, you shutdown. At that stage, Tomcat will look through all loaded classes (instead of those which are initialized) and call Field.get(). And that triggers the bug.

Tomcat can't solve this, because there is no way to ask the VM if a class is initialized or not.

The only solution for this kind of bug is not to use *any* static fields in code which will be deployed to Tomcat. Especially the Loggers of log4j must not be static ... and here we run into trouble because we have to mutually exclusive design goals.

But maybe we can hack our way out of this. Solution 1 would be to install a listener which deinitializes log4j in such a way that any getLogger() call would return null or a NOPLogger or something.

Solution 2 would be to create a private final static field which Tomcat will eventually clear. If that field is cleared (and only Tomcat can do that since it's private), you know that you're in the Tomcat shutdown cycle (instead of some misconfiguration error) and you could ignore attempts to log anything instead of throwing an error.

Solution 1 would depend a bit on Tomcat but it would be reusable for other app servers. Solution 2 would be independent of the appserver; any appserver who clears static fields would just work.
Comment 59 Gili 2008-08-06 13:00:44 UTC
Aaron,

Thanks for clarifying further how this bug may occur. Okay, so I'm fine with omitting the error message but only during the Tomcat shutdown cycle.

1) How are you going to reactivate the error message once the clearReferences method is complete?

2) Can we safely assume that no other webapp will use log4j while it is being shut down?

I suspect the answer to both the above questions is that Tomcat uses a different class-loader per Webapp so Log4j will never get shared between them. So once log4j is shut down it should, in theory, never come back to life.

BTW: Why does Tomcat need to null out the references if the class-loader is being unloaded (hence all the classes inside it are being unloaded anyway)? Is it reusing the same class-loader if a webapp is reloaded? In such a case I think this fix isn't so clear-cut. We need to ensure that log4j doesn't end up in a corrupt state once the shutdown process is complete. How can we guarantee this?
Comment 60 Aaron Digulla 2008-08-06 13:37:47 UTC
> 1)

No.

> 2)

Only when log4j is in WEB-INF/lib. If you put it into common/lib, one log4j instance is shared by all webapps and Tomcat itself.

Unloading a classloader doesn't destroy the objects it created. As long as the objects live (and if they are static, they live forever ... or until someone nulls the reference), they will keep a reference to their class. So the classes can only be GC'd after all static references have been nulled.
Comment 61 Gili 2008-08-06 14:11:40 UTC
That doesn't sound right to me. As far as I remember, static fields are bound at the ClassLoader level (they do not span them).

http://forums.sun.com/thread.jspa?forumID=7&threadID=621392 says the same but I couldn't find a reference to this in the Java Language Specification (to be fair, I didn't look too deep).
Comment 62 Curt Arnold 2008-08-06 21:57:43 UTC
From comment #58

>But maybe we can hack our way out of this. Solution 1 would be to install a
>listener which deinitializes log4j in such a way that any getLogger() call
>would return null or a NOPLogger or something.

>Solution 2 would be to create a private final static field which Tomcat will
>eventually clear. If that field is cleared (and only Tomcat can do that since
>it's private), you know that you're in the Tomcat shutdown cycle (instead of
>some misconfiguration error) and you could ignore attempts to log anything
>instead of throwing an error.

log4j is already doing both of these things.  It checks a private static field that should never be null through normal language mechanism for null and if it detects that it returns a NOPLogger instance.  The complaint is that it emits a warning when it does so.

If we knew that anytime that it encountered this scenario it was running in Tomcat and in clearReferences, we could just remove the warning or lower it to a debug severity.  However, we can't say for certain that that is the only time it happens, we just know it is one place it happens.  The last suggestion was to detect if it is in this one place that we know it happens and where we think the fallback behavior doesn't mask a bigger error and in that one scenario to reduce the severity of the message.


Comment 63 Aaron Digulla 2008-08-07 00:49:48 UTC
> As far as I remember, static fields are bound
> at the ClassLoader level

Class loading is a two step process. First, the class is loaded, then it is initialized. Static fields are filled in the second step when the static code of the class is executed.

> (they do not span them)

The static fields are bound to the class object but that doesn't prevent you to use the class object created by classloader A in the context of classloader B.

This whole stuff is a huge mess and it can drive you crazy which is why most people want to avoid it but you can't when using something like Tomcat. The most simple solution is probably to use a "one app per app server" policy but that has other drawbacks.
Comment 64 Curt Arnold 2008-08-07 20:20:09 UTC
I've committed a change in rev 683811 that adds a faq on NPE's and NOPLoggerRepository warnings during class unloading.  

Bug 40212 which originally resulted in the NOPLoggerRepository warning that this bug complained about does show several other non Tomcat related scenarios that can result in log4j having its class invariants violated.  This isn't just a Tomcat issue.  Other bugs show that class invariants can be violated outside of Tomcat unload calls.

Per comment #58.  I reviewed the current trunk implementation of WebappClassLoader.clearReferences and I think that Tomcat might be able to address the problem doing two passes on iterating through the loaded classes.  One pass that would trigger class initialization if it hadn't already occurred.  This would succeed since log4j is still in a valid state.  Then a second pass that actually sets the fields to null.

Would appreciate comments on the FAQ, the detection of Tomcat shutdown and the possibility to eliminating the problem in Tomcat.
Comment 65 Gili 2008-08-07 20:33:12 UTC
Curt,

I like your proposed solution (two-pass for Tomcat). Do you think it would be possible to run it by the Tomcat staff to see what they think?
Comment 66 Curt Arnold 2008-08-07 22:12:07 UTC
Bug 41059 is still open for Tomcat and also had the two pass suggestion.  I've just attached a patch to the bug to implement the two pass approach and will track it.  I would appreciate any users who are affected by this issue to try Tomcat with the modification and report whether the problem disappears for them.
Comment 67 Mauro Molinari 2009-01-13 01:43:43 UTC
Shouldn't bug #41059 be added to the "depends on" field of this bug?

Mauro.