Bug 37458 - Datarace on org.apache.catalina.loader.WebappClassLoader
Summary: Datarace on org.apache.catalina.loader.WebappClassLoader
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 5
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 5.5.9
Hardware: PC Windows XP
: P2 normal (vote)
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-11-10 21:34 UTC by Feng Chen
Modified: 2009-06-03 14:37 UTC (History)
1 user (show)



Attachments
patch for WebappClassLoader class of tc6.0.x (2.35 KB, patch)
2009-01-01 18:44 UTC, Konstantin Kolinko
Details | Diff
Test case file 1: TestWebappClassLoader class. (5.41 KB, text/plain)
2009-01-01 21:59 UTC, Konstantin Kolinko
Details
Test case file 2: sealed.jar resource file. (1.62 KB, application/java-archive)
2009-01-01 22:00 UTC, Konstantin Kolinko
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Feng Chen 2005-11-10 21:34:17 UTC
In findClassInternal, it is possible that entry.manifest is set to null by 
another thread right before calling definePackage(packageName, entry.manifest, 
entry.codeBase).
Comment 1 Yoav Shapira 2005-11-27 23:06:54 UTC
Suggested fix?
Comment 2 Feng Chen 2005-12-15 06:34:46 UTC
(In reply to comment #1)
> Suggested fix?

This is one possible scence to cause an exeption: suppose two threads t1 and t2,
both are trying to load a same new class. t1 goes first and hit the line 1630,
definePackage(packageName, entry.manifest, entry.codeBase), and stops before the
method call. Then t2 goes ahead and it can hit the same method call too since
the entry.manifest is still not null at that time. And then t2 or t1 continues
and loads the class and sets entry.manifest at the end. After this, when the
other thread tries to finish the call to definePackage, an exception will be
thrown since entry.manifest and entry.codebase have become null now.

I haven't got any good solution so far. One could use a huge sync block to
protect the test on entry.manifest and the corresponding later change, but that
would make it inefficient. Another alternative is to catch the nullpointer
exception thrown by definePackage and ignore it if possible.
Comment 3 Mark Thomas 2006-10-30 19:52:04 UTC
Fixed in SVN and will be included in 5.5.21 onwards.
Comment 4 Dave McIntyre 2008-03-26 20:37:00 UTC
Is it possible that the fix has been regressed by the modifications in revision 510801, as the block is now synchronised on "this" rather than "entry"?

We have seen what appears to be the same bug occur when using version 5.5.23. The start of the stacktrace we see is

java.lang.NullPointerException
        at java.lang.Package.isSealed(Package.java:179)
        at org.apache.catalina.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java:1800)
        at org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:869)
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1322)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:299)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:242)
        at sun.rmi.server.LoaderHandler.loadClass(LoaderHandler.java:430)
        at sun.rmi.server.LoaderHandler.loadClass(LoaderHandler.java:165)
        at java.rmi.server.RMIClassLoader$2.loadClass(RMIClassLoader.java:620)
        at org.objectweb.jonas.server.RemoteClassLoaderSpi.loadClass(RemoteClassLoaderSpi.java:75)
        at java.rmi.server.RMIClassLoader.loadClass(RMIClassLoader.java:247)
        at sun.rmi.server.MarshalInputStream.resolveClass(MarshalInputStream.java:197)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1544)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466)
 ...
Comment 5 Mark Thomas 2008-03-27 06:40:02 UTC
Re-opening so I remember to look at this.
Comment 6 Mark Thomas 2008-03-27 16:13:56 UTC
There are a couple of odd things about your stack trace.

1. For 5.5.23, line 1800 of WebappClassLoader doesn't call Package.isSealed(). It is around 30 lines later.

2. I don't understand why the NPE is thrown from within java.lang.Package. If pkg was null, I'd expect to see the NPE in WebappClassLoader

3. I have checked the source for 1.4, 1.5 and 1.6 and don't see how Package.isSealed() could throw a NPE.

I have reviewed the patch you referred to and I do not see a code path by which this bug could re-appear.

You may have found a new bug, in which case a new issue is in order, but the oddities above make me think that the root cause is more likely some other factor in your configuration.

The best place to get help on this is the users list. Before posting please try a clean 5.5.26 build and also post the OS and JDK you are using. If the discussion there concludes there is a bug, please create a new issue to track it.
Comment 7 Dave McIntyre 2008-04-16 22:52:03 UTC
(In reply to comment #6)
> There are a couple of odd things about your stack trace.
> 
> 1. For 5.5.23, line 1800 of WebappClassLoader doesn't call Package.isSealed().
> It is around 30 lines later.
> 
I agree that this seems odd.  I installed a fresh download of 5.5.23 and it started reporting the error as 
java.lang.NullPointerException
        at java.lang.Package.isSealed(Package.java:179)
        at org.apache.catalina.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java:1833)
        at org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:873)
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1326)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:299)
...

I also tried a fresh install of 5.5.26 and the line numbers changed to 
java.lang.NullPointerException
        at java.lang.Package.isSealed(Package.java:179)
        at org.apache.catalina.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java:1839)
        at org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:875)
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1330)


> 2. I don't understand why the NPE is thrown from within java.lang.Package. If
> pkg was null, I'd expect to see the NPE in WebappClassLoader
> 
It's not pkg that is null, it is entry.codeBase which is null. My fault for posting incorrect line numbers.

> 3. I have checked the source for 1.4, 1.5 and 1.6 and don't see how
> Package.isSealed() could throw a NPE.
> 
It is not the call to Package.isSealed() but the call to Package.isSealed(URL url) which causes the problem: it throws an NPE trying to return "url.equals(sealBase)".

> I have reviewed the patch you referred to and I do not see a code path by which
> this bug could re-appear.
> 
In the initial fix for this bug (revision 469360), this section of code was included in a block which was synchronised on "entry" which contained a smaller block synchronised on "this".  In revision 510801 the synchronisation was changed to synchronise the outer block on "this", with no synchronisation on "entry".  The problem seems to me to be that although "entry" is a local variable, the object it references is obtained externally so two different threads are able to access the same object, and thus one thread is able to set entry.codeBase to null (at line 1853 in 5.5.23) after the other thread has entered the block and committed itself to calling pkg.isSealed(entry.codeBase).

> The best place to get help on this is the users list. Before posting please try
> a clean 5.5.26 build and also post the OS and JDK you are using. If the
> discussion there concludes there is a bug, please create a new issue to track
> it.
> 
Sorry for replying directly here but I thought it was appropriate to reply directly to your comments.  As I say, I see this with a clean 5.5.26 build, running in CentOS release 4.6 (Final) under Java HotSpot(TM) Client VM (build 1.5.0_14-b03, mixed mode, sharing).

Comment 8 Mark Thomas 2009-01-01 16:25:06 UTC
I've looked at the WebappClassLoader source and I can't see how one thread could set entry.codeBase to null whilst another thread calls Package.isSealed since both calls are inside the sync block.

The only way I could see entry.codeBase being null was if getCanonicalFile() throws an IOE (line 2230 in the latest 5.5.x code).

I notice you are using RMI. I wonder if this is a factor since there does not appear to be any other reports of this issue. I have a vague recollection of issues with RMI and Tomcat installed in a path with spaces. Might this be affecting you?

How repeatable is this? A simple test case that demonstrates the issue would help significantly.

Finally, where does JOnAS fit in to all of this. Given the lack of duplicate reports, I wonder if this might be an integration issue of some sort?
Comment 9 Konstantin Kolinko 2009-01-01 18:44:06 UTC
Created attachment 23066 [details]
patch for WebappClassLoader class of tc6.0.x

I think the issue is caused by missing (entry.loadedClass == null) inside the synchronized block. (Remember the double-checked locking pattern?)

I also wonder if there is a cause for (entry.loadedClass == null) check later in the synchronized block (line 1845, i.e. before calling defineClass()) other than this very bug.

Thus the patch. It is against tc6.0.x

I do not remember whether pointer assignments are atomic in current Java, and thus whether it is safe to test for (entry.loadedClass) elsewhere in the code.
Comment 10 Konstantin Kolinko 2009-01-01 21:17:04 UTC
Huh, I now have a TestCase that reproduces the issue. I see the same exception as in comment #4.

No wonder, that it is rare. To encounter this the following conditions must be met:
1. A SecurityManager should be present
2. The class that you are loading has to be in a sealed package.
3. It is a race condition. Several iterations in the TestCase are required.
Comment 11 Konstantin Kolinko 2009-01-01 21:59:39 UTC
Created attachment 23067 [details]
Test case file 1: TestWebappClassLoader class.
Comment 12 Konstantin Kolinko 2009-01-01 22:00:16 UTC
Created attachment 23068 [details]
Test case file 2: sealed.jar resource file.
Comment 13 Konstantin Kolinko 2009-01-01 22:07:01 UTC
To reproduce:
1. Place TestWebappClassLoader.java and sealed.jar into package org.apache.catalina.loader of TC 6.0 test sources.

2. Use JUnit to run the test case.

3. The following exception is observed:

...
Caused by: java.lang.NullPointerException
	at java.lang.Package.isSealed(Package.java:181)
	at org.apache.catalina.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java:1832)
	at org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:890)
	at org.apache.catalina.loader.TestWebappClassLoader$FindClassThread.run(TestWebappClassLoader.java:48)
Comment 14 Mark Thomas 2009-01-02 06:32:08 UTC
Thanks for the patch and your analysis. That makes perfect sense.

I have applied your patch to trunk and proposed it for 6.0.x and 5.5.x.

Thanks again.
Comment 15 Mark Thomas 2009-01-14 16:26:29 UTC
This has been fixed in 6.0.x and will be included in 6.0.19 onwards.
Comment 16 Mark Thomas 2009-06-03 14:37:34 UTC
This has been fixed in 5.5.x and will be included in 5.5.28 onwards.