Bug 36082 - [PATCH] Relative URIs are not correctly resolved
Summary: [PATCH] Relative URIs are not correctly resolved
Status: CLOSED FIXED
Alias: None
Product: Fop - Now in Jira
Classification: Unclassified
Component: general (show other bugs)
Version: trunk
Hardware: Other other
: P2 normal
Target Milestone: ---
Assignee: fop-dev
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-08-08 15:35 UTC by Manuel Mall
Modified: 2012-04-01 06:40 UTC (History)
0 users



Attachments
The patch file (7.02 KB, patch)
2005-08-08 15:36 UTC, Manuel Mall
Details | Diff
Simple test case file for different URI references to external-graphic (2.62 KB, text/plain)
2005-08-08 15:42 UTC, Manuel Mall
Details
Revised patch (23.64 KB, patch)
2005-08-13 15:35 UTC, Manuel Mall
Details | Diff
New class providing the FOP default implementation of the URIResolver interface (5.73 KB, text/plain)
2005-08-13 15:36 UTC, Manuel Mall
Details
Test images required by the test cases - to go into test/resources/images (219.95 KB, application/java-archive)
2005-08-13 15:38 UTC, Manuel Mall
Details
The test case files - to go into test/layoutengine/testcases (7.90 KB, application/java-archive)
2005-08-13 15:39 UTC, Manuel Mall
Details
Next revision (9.17 KB, patch)
2005-08-15 18:23 UTC, Manuel Mall
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Manuel Mall 2005-08-08 15:35:35 UTC
Jeremias wrote on fop-dev: Basically, I've stumbled upon a few anomalies with
relative paths and resource access in general. Images were not properly loaded
and I think there were differences in behaviour between FOP and Batik.

The problem can be easily reproduced by running fop from a different directory
as the fo/xml input file and the input file has a relative URI reference to an
external image.

The problem is caused by the FOUserAgent.getBaseURL() method always returning a
URL even if it is not set and the InputHandler.render() method only setting the
base URL in the FOUserAgent if getBaseURL() returns null which it never does.

The attached patch fixes that by making getBaseURL() returning null if no
baseURL is set. Doing this on its own would have lost the functionality of
defaulting the baseURL to the current directory if not set. I therefore added a
function getBaseURLasURL to FOUserAgent which returns the current baseURL as a
java.net.URL and defaults that to a file: URL pointing to the current directory
if baseURL is not set. It also ports some URL normalisation code for relative
URLs from 0.20.5 FopImageFactory to ImageFactory.
Comment 1 Manuel Mall 2005-08-08 15:36:22 UTC
Created attachment 15955 [details]
The patch file
Comment 2 Manuel Mall 2005-08-08 15:42:37 UTC
Created attachment 15956 [details]
Simple test case file for different URI references to external-graphic

This file contains some hardcoded paths (refering to my directory layout) and
is therefore not really suitable for incluision into SVN. Not sure how I can
have "dynamic" absolute paths in URI refs.
Comment 3 Jeremias Maerki 2005-08-10 11:47:10 UTC
I've had a look at this patch and there's a problem. You removed the call to 
FOUserAgent.getStream(). FOUserAgent can be subclassed by users to supply 
special behaviour on certain URIs, i.e. that they can supply their own 
InputStream for a particular URI. I've seen your TODO line in openStream() and 
this tells me that you didn't realize what that the getStream() method was for.

The other problem is indeed the test case. I'm reluctant to add it to the 
other test cases because of the hard-coded absolute paths. With some tweaking 
of the testcase2fo.xsl stylesheet the attribute replacement mechanism could 
probably be enhanced to replace only subsets of a string. That way, we could 
provide a default variable holding the root directory of the test cases as an 
absolute path.

BTW, while I was looking at image loading earlier I had that idea of creating 
a small object which carries the original URI, the resolved URL, if any, and 
the opened stream with it so it is easier to give Batik a base URI for the SVG 
handling. As far as I can remember that's currently not possible since only 
the InputStream is passed to the analyzer. Something like that. Maybe that 
helps further with the relative URI resolution stuff (especially with SVG, of 
course).

So I'm not going to apply this patch right now. I'd rather you look at this 
again. Thanks!
Comment 4 Manuel Mall 2005-08-10 13:48:57 UTC
The FOUserAgent.openStream() method is currently empty that's why I removed the 
call and put the comment in. However, I since worked on the GIF crashing issue. 
To make it work I had to change the interfaces. Instead of an InputStream the 
ImageFactory.openStream function (renamed to openURLConnection()) returns a 
URLConnection. This is then stored in addition to the InputStream in the 
FopImage.ImageInfo structure so GifImage can use it to get the 
properURLImageSource.

However, all this is done without using FOUserAgent. To move this logic into 
FOUserAgent we would need a function in FOUserAgent like:

   URL getURL(String url) 
or may be better named
   URL resolveURL(String url) 

This would take the string argument url and combine it with the baseURL to form 
a proper absolute URL.

This would be the main function someone would overwrite in a FOUserAgent 
subclass to provide custom URL resolution and to implement custom URL schemes.

We could then add helper functions to FOUserAgent like:
    URLConnection openConnection(String url)
which would first resolve the url and then do an openConnection on it and 
similar:
    InputStream openStream(String url)
which would call openConnection(url) first and then return its InputStream.

These helpers may save a few lines of code in a few places in fop but are not 
really core.

Comments please ? Am I on the right track?
Comment 5 Jeremias Maerki 2005-08-10 14:32:18 UTC
(In reply to comment #4)
> The FOUserAgent.openStream() method is currently empty that's why I removed 
the 
> call and put the comment in. 

I see. I've just improved the documentation.
http://svn.apache.org/viewcvs?rev=231213&view=rev

> However, I since worked on the GIF crashing issue. 
> To make it work I had to change the interfaces. Instead of an InputStream 
the 
> ImageFactory.openStream function (renamed to openURLConnection()) returns a 
> URLConnection. This is then stored in addition to the InputStream in the 
> FopImage.ImageInfo structure so GifImage can use it to get the 
> properURLImageSource.
> However, all this is done without using FOUserAgent. To move this logic into 
> FOUserAgent we would need a function in FOUserAgent like:
>    URL getURL(String url) 
> or may be better named
>    URL resolveURL(String url) 
> This would take the string argument url and combine it with the baseURL to 
form 
> a proper absolute URL.

Ok, THAT is normal URI resolution which is also on the task list and should be 
done through a standard interface, i.e. these two:
- http://java.sun.com/j2se/1.4.2/docs/api/javax/xml/transform/URIResolver.html
- http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/EntityResolver.html

Because this allows, among other things, the usage of XML Commons Resolver 
which is a frequently asked feature:
http://xml.apache.org/commons/components/resolver/index.html


> This would be the main function someone would overwrite in a FOUserAgent 
> subclass to provide custom URL resolution and to implement custom URL 
schemes.

Actually, when the standard resolver interfaces are used, it should not be 
necessary to override FOUserAgent but simply to set a URIResolver and/or a 
EntityResolver.

> We could then add helper functions to FOUserAgent like:
>     URLConnection openConnection(String url)

I'm against that. That would mean that people would have to have URLConnection 
implementations for their custom code and cannot simply provide an 
InputStream. Our code should manage to handle that, too. Implementing 
URLConnection subclasses may be too much hassle for some developers.

> which would first resolve the url and then do an openConnection on it and 
> similar:
>     InputStream openStream(String url)
> which would call openConnection(url) first and then return its InputStream.
> These helpers may save a few lines of code in a few places in fop but are 
not 
> really core.

We should also make sure we don't do too much and therefore make the whole 
FOUserAgent too complicated. URI resolution using the standard interfaces is a 
must at some point, getStream() is nice to have and should be sufficient for 
most people. If GIF can't be loaded through a pure InputStream, we have to 
find a different solution.

> Comments please ? Am I on the right track?

Don't bite too much into GIF. If we can't make GIF support work properly with 
the (old) classes from the JDK, it's not a big deal IMO to require Jimi or JAI 
to be present under JDK 1.3. Under JDK 1.4 we have ImageIO 
(javax.imageio.ImageIO) that should provide easier access to a GIF codec if my 
memory doesn't desert me. So it might finally be the right time to introduce 
an AbstractFopImage subclass using ImageIO (placed under src/java-1.4). I've 
started such a class but never committed it since the priorization of image 
providers was not sufficient. I can retest that class and commit it if 
necessary.

So to sum:
- FOUserAgent.getStream() is cool and very easy to use (now that it's properly 
documented). People will want to load images from a CMS without having to 
write an URL handler. URIs like "cms:823746843?lng=fr" for example.
- URI resolution using XML Commons Resolver should be possible, i.e. 
URIResolver and ideally EntityResolver should be supported, i.e. FOP should be 
able to extract the necessary infos out of Source and InputSource instances 
when loading images.
- If you can make the whole thing work for GIF by directly working with an 
URLConnection that's cool, but I wouldn't expose something like URLConnection 
to the outside of FOP.
- Investigate ImageIO as an alternative means to load GIF.

Sorry for the trouble and and I hope I'm making sense.
Comment 6 Manuel Mall 2005-08-10 17:08:30 UTC
> I see. I've just improved the documentation.
> http://svn.apache.org/viewcvs?rev=231213&view=rev

Am I missing something - that revision appears to have nothing to do with the
issue discussed?
Comment 7 Jeremias Maerki 2005-08-10 17:12:43 UTC
Oops. Looks like my copy/paster buffer has deceived me. That should have been:
http://svn.apache.org/viewcvs?rev=231215&view=rev

(In reply to comment #6)
> > I see. I've just improved the documentation.
> > http://svn.apache.org/viewcvs?rev=231213&view=rev
> Am I missing something - that revision appears to have nothing to do with the
> issue discussed?

Comment 8 Manuel Mall 2005-08-10 17:39:15 UTC
> - FOUserAgent.getStream() is cool and very easy to use (now that it's properly
> documented). People will want to load images from a CMS without having to
> write an URL handler. URIs like "cms:823746843?lng=fr" for example.
OK

> - URI resolution using XML Commons Resolver should be possible, i.e.
> URIResolver and ideally EntityResolver should be supported, i.e. FOP should
> be able to extract the necessary infos out of Source and InputSource
> instances when loading images.
Are you thinking of having set/getURIResolver methods on FOUserAgent?

> - If you can make the whole thing work for GIF by directly working with an
> URLConnection that's cool, but I wouldn't expose something like URLConnection
> to the outside of FOP.
OK
> - Investigate ImageIO as an alternative means to load GIF.
Hmm, want to look at some other items in the image handling stuff first.
Comment 9 Jeremias Maerki 2005-08-10 17:46:24 UTC
(In reply to comment #8)
> > - URI resolution using XML Commons Resolver should be possible, i.e.
> > URIResolver and ideally EntityResolver should be supported, i.e. FOP should
> > be able to extract the necessary infos out of Source and InputSource
> > instances when loading images.
> Are you thinking of having set/getURIResolver methods on FOUserAgent?

Exactly.

> > - Investigate ImageIO as an alternative means to load GIF.
> Hmm, want to look at some other items in the image handling stuff first.

That's ok. One thing after the other. :-) You may have seen, I've uploaded the 
initial ImageIOImage implementation.
http://svn.apache.org/viewcvs?rev=231223&view=rev
Comment 10 Manuel Mall 2005-08-13 15:35:57 UTC
Created attachment 16031 [details]
Revised patch

A revised patch for the uri resolving issue. This patch does actually do a few
things:

1. Addresses the URI resolving issue as discussed in this bug by providing a
FOP implementation of the URIResolver interface as well as the capabilities to
set resolver on the FOUserAgent object.

2. Modifies the BMPReader to extract the resolution information.

3. Fixes a possible array bounds exception in BMPImage which can happen for BMP
images with extra bytes at the end.

4. Provides some infrastructure in ImageFactory in preparation of external
configuration of multiple prioritised image providers per mime type.

5. Sets a proper base URL in SVGElement.

6. Provides test cases and test images for the different formats and
resolutions.
Comment 11 Manuel Mall 2005-08-13 15:36:58 UTC
Created attachment 16032 [details]
New class providing the FOP default implementation of the URIResolver interface
Comment 12 Manuel Mall 2005-08-13 15:38:44 UTC
Created attachment 16033 [details]
Test images required by the test cases - to go into test/resources/images
Comment 13 Manuel Mall 2005-08-13 15:39:45 UTC
Created attachment 16034 [details]
The test case files - to go into test/layoutengine/testcases
Comment 14 Jeremias Maerki 2005-08-15 12:43:52 UTC
Applied. Thanks a lot, Manuel. This is a big step in the right direction!
http://svn.apache.org/viewcvs?rev=232786&view=rev

I'll commit a couple of JUnit test cases in a minute that uses a custom 
URIResolver and demonstrates how this stuff is used. It also shows a couple of 
problems the will eventually need to be solved:
- The URI Resolution does not yet affect URIs in Batik.
- The ImageFactory should not rely on the InputStream on the StreamSource to 
be set, as components like XML Commons Resolver don't do that. They only set 
the resolved URL on a (Stream?)Source object. This means that in the 
FOUserAgent, a Source should be returned, not a StreamSource. For SVG graphics 
you might even want to supply a DOMSource, but that's only nice-to-have and 
for later. I'm resolving this issue now since it's purpose is fulfilled now.
Comment 15 Manuel Mall 2005-08-15 18:23:03 UTC
Created attachment 16053 [details]
Next revision

Made changes as per this conversation:

> > - The ImageFactory should not rely on the InputStream on the
> > StreamSource to be set, as components like XML Commons Resolver don't
> > do that. They only set the resolved URL on a (Stream?)Source object.
> > This means that in the FOUserAgent, a Source should be returned, not
> > a StreamSource. For SVG graphics you might even want to supply a
> > DOMSource, but that's only nice-to-have and for later. I'm resolving
> > this issue now since it's purpose is fulfilled now.
> I was considering that as well but I am still hesitant to change 
> internal interfaces around too much. If it is agreeable to either 
> abolish the FOUserAgent.getStream() function or replace it with a 
> getSource() function I'll be happy to do this.

I'd actually simply call it "resolveURI(String)" and return a Source,
similar to the resolve() method in URIResolver. Source and its
implementations provide all the different access variant for getting at
sources. ATM, it suffices to support arbitrary sources which simply
return a SystemID (which you can simply convert to a URL) and
StreamSource whose getInputStream() returns a non-null value which you
already implemented. DOMSource can wait till later and other stuff can
be added as needed. We simply need to support providing InputStreams
(done) and XML Commons URIResolver (open).
Comment 16 Jeremias Maerki 2005-08-16 08:55:09 UTC
(In reply to comment #15)
> Created an attachment (id=16053) [edit]
> Next revision
> Made changes as per this conversation:

Very cool! Applied. Thanks a lot. My XML Commons Resolver example works now, 
too. I'll expand a little on the test case to demostrate different resolution 
schemes.
http://svn.apache.org/viewcvs?rev=232949&view=rev
Comment 17 Glenn Adams 2012-04-01 06:40:04 UTC
batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed