Bug 47834

Summary: TldConfig throws Exception when exploring unpacked jar
Product: Tomcat 6 Reporter: Guillaume Sauthier <guillaume.sauthier>
Component: CatalinaAssignee: Tomcat Developers Mailing List <dev>
Status: NEW ---    
Severity: enhancement    
Priority: P2    
Version: 6.0.20   
Target Milestone: default   
Hardware: All   
OS: All   
Attachments: Patch for TldConfig.java (Toncat-6.0.x trunk)

Description Guillaume Sauthier 2009-09-14 04:55:39 UTC
Created attachment 24255 [details]
Patch for TldConfig.java (Toncat-6.0.x trunk)

Currently TldConfig expects all *.jar files to be real files, so in the case of
a completely unpacked webapp (webapp itself + libraries in WEB-INF/lib) it's
throwing Exception.

The attached patch correct this behavior by adding a new method that knows how
to handle the "jar as directory" case.
Comment 1 Mark Thomas 2009-09-14 05:15:14 UTC
The patch as currently written is broken since it only addresses TldConfig and ignores org.apache.jasper.compiler.TldLocationsCache

I have changed the status of this to enhancement since supporting JAR as a directory layout is outside the servlet spec and would be a Tomcat specific extension.

As it happens, I have support for this already written and currently undergoing testing at work. I'll commit those changes to Tomcat trunk (ie Tomcat 7) so you can also test the code.

Note that to get into Tomcat 6, there are a series of TLD related patches that first need to be approved to make TLD handling correct and consistent for Catalina and Jasper. Only when those changes have been approved can extensions like the one described here be applied.

I'll update this issue once I apply the patches to trunk.
Comment 2 Mark Thomas 2009-09-14 05:41:53 UTC
The patch for trunk was http://svn.apache.org/viewvc?rev=814617&view=rev

Let us know how you get on. If testing is successful, and once the preparatory TLD changes have been made, I'll propose the patch for back port to 6.0.x
Comment 3 Guillaume Sauthier 2009-09-14 09:43:50 UTC
Thanks for the review Mark.

I cannot use the TC7 trunk from JOnAS (bound to TC 6.0.20 for now), so I gave it a try using a pure tomcat.

I took a simple web app with a jar unpacked in the WEB-INF/lib directory.
It doesn't work as expected :'(

In tldScanWebInfLib(), when you get the resource paths from WEB-INF/lib, my jar file is there, but the String is of the form: "<something>.jar/". The trailing '/' is a problem because you test for ".jar" only.

Anyway, even if something was found, I think the current code is wrong in handling directories: in tldScanJar() the first thing we do is opening the URL to get the URLConnection. If the URL is referring to a directory, we will very quickly get an IOException.

I'll work on a refined patch tomorrow.
Comment 4 Mark Thomas 2009-09-14 09:53:44 UTC
(In reply to comment #3)
Thanks for the testing.

> In tldScanWebInfLib(), when you get the resource paths from WEB-INF/lib, my jar
> file is there, but the String is of the form: "<something>.jar/". The trailing
> '/' is a problem because you test for ".jar" only.
That is expected. Exploded jar files should be dealt with tldScanClassloaders(). You'll need to set the org.apache.jasper.compiler.TldLocationsCache.SCAN_CLASSPATH system property as well as ...SCAN_ALL_FILES and ...SCAN_ALL_DIRS

> Anyway, even if something was found, I think the current code is wrong in
> handling directories: in tldScanJar() the first thing we do is opening the URL
> to get the URLConnection. If the URL is referring to a directory, we will very
> quickly get an IOException.
Hmm. I didn't see that in my testing. I doubt I tested all possible code paths though.

> I'll work on a refined patch tomorrow.
Great.
Comment 5 Guillaume Sauthier 2009-09-15 00:23:17 UTC
(In reply to comment #4)
> Thanks for the testing.
> 
> > In tldScanWebInfLib(), when you get the resource paths from WEB-INF/lib, my jar
> > file is there, but the String is of the form: "<something>.jar/". The trailing
> > '/' is a problem because you test for ".jar" only.
> That is expected. Exploded jar files should be dealt with
> tldScanClassloaders(). You'll need to set the
> org.apache.jasper.compiler.TldLocationsCache.SCAN_CLASSPATH system property as
> well as ...SCAN_ALL_FILES and ...SCAN_ALL_DIRS

I'm sorry, I didn't get the reason why there is a method to deal with WEB-INF/lib/*.jar content (excluding unpacked directories) and another one that scan the webapp classloader. Isn't it doing the job twice ? I mean the URLs we get from the ClassLoader are WEB-INF/classes and all the jars from WEB-INF/lib.

I agree there are also the URLs from the parent's URLClassLoader, but in that case, it's not necessary to rescan the webapp's libraries, it should be sufficent to scan only the parent ClassLoaders.
In that way of thinking, it appears that tldScanClassloaders() is not appropriate when dealing with unpacked jar libraries, they should be processed in tldScanWebInfLib().

Maybe you wanted to separate (using the SCAN_CLASSPATH flag) standard processing from tomcat specific "extension" ?

WDYT ?
Comment 6 Guillaume Sauthier 2009-09-15 00:25:31 UTC
Oh, I forgot to say something important: unpacked jar libraries are not part of the scanned ClassLoader. It seems that when Tomcat creates the WebAppClassLoader, it only looks at *.jar as regular files.
Comment 7 Mark Thomas 2009-09-15 01:15:08 UTC
Yes, the handling of the per spec requirements and the handling of extensions should be kept separate.

I tend to prefer clarity over efficiency for code that is only going to be run at webapp start.

It isn't clear if you have found something that is broken or something that you would have implemented differently.
Comment 8 Guillaume Sauthier 2009-09-15 01:45:31 UTC
(In reply to comment #7)
> Yes, the handling of the per spec requirements and the handling of extensions
> should be kept separate.
> 
> I tend to prefer clarity over efficiency for code that is only going to be run
> at webapp start.

I'll keep that in mind for my undergoing patch :)

> 
> It isn't clear if you have found something that is broken or something that you
> would have implemented differently.

I think the unpacked jar handling is broken, and I think I will provide a patch that is implemented differently.
Comment 9 Guillaume Sauthier 2009-09-15 04:38:01 UTC
Just a note to remember how we discovered the problem. That may help to find the right solution, as my first bug explanation was not right.

It's with TC6.0.20.
In JOnAS we're deploying a development EAR, meaning an EAR unpacked (EjbJar unpacked and webapp unpacked, but lids in WEB-INF/lib still packed).
There is a ClassLoader hierarchy:
EjbJarsLoader (created by JOnAS and containing a "file://.../ejbjar.jar/" URL: unpacked form) parent of WebappClassLoader (created by Tomcat, untouched).

In TC6, the TLDs where discovered only using the ClassLoader scan. So, when it was analyzing the unpacked EjbJar URL, it was failing because the code was expecting an regular jar file.

In TC7, things are different:
1. TLDs declared in web.xml
2. TLDs found in WEB-INF/ and subdirectories (excluding WEB-INF/lib/ and WEB-INF/classes/)
3. TLDs from regular jar files discovered in WEB-INF/lib/
4. ClassLoader scanning

BTW, additionnal question, I see that the WEB-INF/classes directory will never be scanned for *.tld if the SCAN_CLASSPATH is not set. Is this a spec requirement (that TLD cannot be in WEB-INF/classes ) ? I say that because WEB-INF/classes can only be searched during the ClassLoader scanning, and then, it's failing because the code is expecting a regular file, not a directory....
Comment 10 Guillaume Sauthier 2009-09-15 08:30:09 UTC
(In reply to comment #8)
> > It isn't clear if you have found something that is broken or something that you
> > would have implemented differently.
> 
> I think the unpacked jar handling is broken, and I think I will provide a patch
> that is implemented differently.

I had some times to re-think and debug the current behavior of TldConfig when dealing with unpacked jar directory during the classloader scan.
I was thinking that url.openConnection() would throw an IOException when the URL refers to a File, but it works and returns a FileUrlConnection. So the execution can continue and it handles directory nicely.

AFAICT, the current code seems to work well with unpacked directory.

Thanks for your time Mark.
Comment 11 Mark Thomas 2009-09-15 08:33:10 UTC
Thanks for the additional testing.