Bug 54315 - Redeploying webapp causes NullPointerException in managerBase
Summary: Redeploying webapp causes NullPointerException in managerBase
Status: RESOLVED INVALID
Alias: None
Product: Tomcat 7
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 7.0.29
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-12-17 08:56 UTC by Bo Frydkjær Jørgensen
Modified: 2014-03-10 09:05 UTC (History)
2 users (show)



Attachments
Stacktrace (13.46 KB, text/plain)
2012-12-17 08:56 UTC, Bo Frydkjær Jørgensen
Details
the logs when I get the null pointer (53.42 KB, text/plain)
2014-03-07 14:25 UTC, Lorenzo
Details
the logs when the server is redeployed successfully (100.09 KB, text/plain)
2014-03-07 14:25 UTC, Lorenzo
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Bo Frydkjær Jørgensen 2012-12-17 08:56:06 UTC
Created attachment 29775 [details]
Stacktrace

This only happends sometimes, and is hard to reproduce.

After undeploying and deploying a webapp, sometimes the webapp show an exception at the jps's. 

I've traced it down to ManagerBase.java - generateSessionId() method. It looks like the sessionIdGenerator is null. 

Attached stacktrace
Comment 1 Mark Thomas 2012-12-31 23:41:17 UTC
Hmm. An NPE there means that a session manager is being used that either has not been started or has been stopped. Given it is hard to reproduce we will probably have to resort to code analysis to determine what is goin on. I have a few ideas but before I start looking harder it would be useful to have a few more details.

1. Have you ever seen this issue after a normal start or is it only on a restart?

2. When the NPE appears, does it happen on every request from then on or does it happen to just a few requests during the restart?

3. If the answer to 2) is every request does a context restart fix it?
Comment 2 Bo Frydkjær Jørgensen 2013-01-02 06:39:21 UTC
Hi Mark (and others),

1. I have never seen this after startup of tomcat, only when undeploying/deploying a webapp.

2. Yes, it happends on all requests.

3. No, a context restart does not fix it, only a restart of the tomcat fixes it.
Comment 3 Mark Thomas 2013-01-03 21:46:29 UTC
I'm still not exactly sure what is going on here but it looks like something is going wrong with the undeploy. There have been some fixes related to failed undeploys since 7.0.29 so it would be helpful if you could test with 7.0.34. In the meantime, I'll keep digging.
Comment 4 Mark Thomas 2013-01-03 22:28:18 UTC
I've found and fixed one place where undeploy could go wrong (filters throwing exceptions on destroy) and I have fixed that in trunk and 7.0.x for 7.0.35.

I'm not sure that is what is going on here though although it is probably something very like that. I'd expect to see something in the logs during the undeploy to indicate what has gone wrong.
Comment 5 Mark Thomas 2013-01-03 22:45:39 UTC
We need to track what is going on. Please add the following to you logging.properties and then provide the section of the logs that shows the undeploy/deploy when the error occurs.

org.apache.catalina.util.LifecycleBase.level = FINE

Note: This will slow down Tomcat start and stop but it should have minimal impact on normal running.
Comment 6 Mark Thomas 2013-01-24 08:59:46 UTC
It has been three weeks without any further information provided. If the information has not been provided by the time of the next 7.0.x release then this issue will be closed as WONTFIX.
Comment 7 Lorenzo 2014-02-27 13:21:15 UTC
(In reply to Mark Thomas from comment #6)
> It has been three weeks without any further information provided. If the
> information has not been provided by the time of the next 7.0.x release then
> this issue will be closed as WONTFIX.

Hi,
I have the same error and I get it quite often.
I use Tomcat 7.0.50 on a 64 bit machine.

I have this log4j properties config, if you tell me how to change it to log the error I can provide the logs.

# configure the daily rolling file appender
log4j.rootLogger=FINE, RollingAppender

# Set the priority for each package  
log4j.logger.org.apache.catalina.util.LifecycleBase=FINE, RollingAppender  


log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppender.File=../logs/ApplicationWeb.log
log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n


The staccktrace of the error I get is:

java.lang.NullPointerException
	org.apache.catalina.session.ManagerBase.generateSessionId(ManagerBase.java:807)
	org.apache.catalina.session.ManagerBase.createSession(ManagerBase.java:653)
	org.apache.catalina.connector.Request.doGetSession(Request.java:2956)
	org.apache.catalina.connector.Request.getSession(Request.java:2320)
	org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:899)
	org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:911)
	com.abovo.gc.plata.filter.AuthenticationFilter.doFilter(AuthenticationFilter.java:47)
Comment 8 Lorenzo 2014-02-27 13:24:29 UTC
(In reply to Mark Thomas from comment #6)
> It has been three weeks without any further information provided. If the
> information has not been provided by the time of the next 7.0.x release then
> this issue will be closed as WONTFIX.

Hi,
I have the same error and I get it quite often.
I use Tomcat 7.0.50 on a 64 bit machine.

I have this log4j properties config, if you tell me how to change it to log the error I can provide the logs.

# configure the daily rolling file appender
log4j.rootLogger=FINE, RollingAppender

# Set the priority for each package  
log4j.logger.org.apache.catalina.util.LifecycleBase=FINE, RollingAppender  


log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppender.File=../logs/ApplicationWeb.log
log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n


The staccktrace of the error I get is:

java.lang.NullPointerException
	org.apache.catalina.session.ManagerBase.generateSessionId(ManagerBase.java:807)
	org.apache.catalina.session.ManagerBase.createSession(ManagerBase.java:653)
	org.apache.catalina.connector.Request.doGetSession(Request.java:2956)
	org.apache.catalina.connector.Request.getSession(Request.java:2320)
	org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:899)
	org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:911)
	com.abovo.gc.plata.filter.AuthenticationFilter.doFilter(AuthenticationFilter.java:47)
Comment 9 Mark Thomas 2014-02-27 13:30:12 UTC
It is extremely rude to add someone to a cc list for a bug without their permission. Repeat offenders will be blocked from further use of the ASF Bugzilla system.
Comment 10 Mark Thomas 2014-02-27 13:37:11 UTC
This is either an application bug or a bug in com.abovo.gc.plata.filter.AuthenticationFilter that is retaining and using a reference to a request object from a previous request rather than using the request object for the current request.

This is not a bug in Apache Tomcat.

Further assistance is available from the Apache Tomcat users mailing list.
Comment 11 Lorenzo 2014-02-27 13:47:49 UTC
Sorry,

I tought that you could be interested in the opportunity to have some debug about the presumed error.
I will keep investigate it.
Comment 12 Konstantin Kolinko 2014-03-05 23:18:59 UTC
Lorenzo's thread on users@ for Tomcat 7.0.50:
http://marc.info/?t=139351186000004&r=1&w=2


Just some guesses

1. This should not happen, because StandardWrapperValve.invoke() takes care of it.

If web application is being stopped while some requests are still being processed, Tomcat logs the following message is logged like the following:

[[[
05-Mar-2014 19:59:33.701 INFO [localhost-startStop-1] org.apache.catalina.core.StandardWrapper.unload Waiting for 1 instance(s) to be deallocated for Servlet [jsp]
05-Mar-2014 19:59:34.701 INFO [localhost-startStop-1] org.apache.catalina.core.StandardWrapper.unload Waiting for 1 instance(s) to be deallocated for Servlet [jsp]
]]]

If such errors are logged,
it is explainable that other errors may follow.

Note though that the messages are logged at INFO level. Sometimes logs are configured to skip those.

Note that
1) StandardContext.stopInternal() stops its wrappers (=children) before all other subcomponents
2) The issue is reported as occurring in a filter. Invoking the filter chain is performed in StandardWrapperValve.invoke()  in between wrapper.allocate() and wrapper.deallocate() calls.  As such, it should be protected by the same allocation counter that protects the servlet that is being called here.


2. In StandardWrapper.allocate()
I think the following code:
[[[
        if (unloading)
            throw new ServletException
              (sm.getString("standardWrapper.unloading", getName()));
]]]
would better be copied inside synchronized (this) {} block below.

The 'unloading' field may change while we are waiting to get into synchronized block.  That is if there is the following sequence of events:

TH1: in StandardWrapper.allocate() checks for unloading
TH1: in StandardWrapper.allocate() waits for synchronized block
TH2: executes StandardWrapper.unload()
TH1: in StandardWrapper.allocate() obtains the monitor and allocates a servlet
Expected: TH1 should not be able to allocate one, as unloading has already happened.


3. I think the o.a.c.core.StandardContext.paused field would better be marked volatile.

The redeployment happens in background thread, but the field is read in CoyoteAdapter.postParseRequest() in a request processing thread. It might read a stale value.

(It would not cause this NullPointerException as reported here. It can cause "503 Unavailable" response from StandardWrapperValve.invoke() or ServletException("standardWrapper.unloading") from StandardWrapper.allocate()).


4. In StandardWrapper.invoke() the check for "if (!context.getState().isAvailable())" is done before "wrapper.allocate()" call.

There is small time window between those checks.
(It would not cause this NullPointerException as reported here. It can cause ServletException("standardWrapper.unloading") from StandardWrapper.allocate()).

Anyway, in this case I do not see how it could be done better.
The LifecycleBase.state field is already marked as 'volatile'. Asking for it twice seems like a waste. The checks in allocate() should protect here/

========

Resume:
for 1.: Beware if logging is configured to skip INFO messages.

Of course, if context is stopped before request processing on it completes, some things are expected to fail. Tomcat waits here, the timeout is configurable. It is a tradeoff. An infinite wait is usually undesirable.

See "unloadDelay" setting on Context. The default value is 2 seconds.
http://tomcat.apache.org/tomcat-7.0-doc/config/context.html

for 2. and 3.: These can be improved.

for 4.: OK, nothing to do.
Comment 13 Mark Thomas 2014-03-07 14:10:17 UTC
re 2, I don't see how this could lead to the constant NPE issue the OP reported unless deployment failed.

Also re 2, unloading would benefit from being volatile. Adding a check for unloading in loadServlet() should address the issue of allocation occurring after unloading has started.

1, 3 & 4. Agreed.

I have applied the robustness improvements to 8.0.x for 8.0.4 and 7.0.x for 7.0.53.

I still haven't see any evidence that there is a Tomcat bug here such as an explanation of how any of the reported problems can occur that doesn't involve an application and/or library retaining a reference it shouldn't. Therefore, I am returning this bug to the invalid state.
Comment 14 Lorenzo 2014-03-07 14:25:13 UTC
Created attachment 31375 [details]
the logs when I get the null pointer
Comment 15 Lorenzo 2014-03-07 14:25:58 UTC
Created attachment 31376 [details]
the logs when the server is redeployed successfully
Comment 16 Lorenzo 2014-03-07 14:26:52 UTC
I added some logs with at the INFO level.
I hope you can gather some information from it.
Comment 17 Konstantin Kolinko 2014-03-07 22:34:55 UTC
(In reply to Mark Thomas from comment #13)
> re 2, I don't see how this could lead to the constant NPE issue the OP
> reported unless deployment failed.
> 

I said:
> TH1: in StandardWrapper.allocate() checks for unloading
> TH1: in StandardWrapper.allocate() waits for synchronized block
> TH2: executes StandardWrapper.unload()
> TH1: in StandardWrapper.allocate() obtains the monitor and allocates a servlet
> Expected: TH1 should not be able to allocate one, as unloading has
> already happened.

I mean that it continues this way:

TH2 continues shutting down the context, as it thinks that all wrappers are unloaded and no request processing is happening at this moment.

In StandardContext.stopInternal() TH2 has done "children[i].stop();" and the next steps that it does are:

* filterStop();
* ((Lifecycle) manager).stop();

In ManagerBase.stopInternal() it does "this.sessionIdGenerator = null;".


TH1: It goes on with request processing and encounters that the session manager has been stopped and manager.sessionIdGenerator is null.

This is just a theory, I have not tested it.
Comment 18 Mark Thomas 2014-03-10 09:05:51 UTC
(In reply to Konstantin Kolinko from comment #17)
> (In reply to Mark Thomas from comment #13)
> > re 2, I don't see how this could lead to the constant NPE issue the OP
> > reported unless deployment failed.


> I mean that it continues this way:
> 
> TH2 continues shutting down the context, as it thinks that all wrappers are
> unloaded and no request processing is happening at this moment.
> 
> In StandardContext.stopInternal() TH2 has done "children[i].stop();" and the
> next steps that it does are:
> 
> * filterStop();
> * ((Lifecycle) manager).stop();
> 
> In ManagerBase.stopInternal() it does "this.sessionIdGenerator = null;".
> 
> 
> TH1: It goes on with request processing and encounters that the session
> manager has been stopped and manager.sessionIdGenerator is null.
> 
> This is just a theory, I have not tested it.

That would explain a small number of NPEs. I don't see how it could result in all future requests resulting in NPEs as reported by the OP.