Jeremias wrote on fop-dev: Basically, I've stumbled upon a few anomalies with relative paths and resource access in general. Images were not properly loaded and I think there were differences in behaviour between FOP and Batik. The problem can be easily reproduced by running fop from a different directory as the fo/xml input file and the input file has a relative URI reference to an external image. The problem is caused by the FOUserAgent.getBaseURL() method always returning a URL even if it is not set and the InputHandler.render() method only setting the base URL in the FOUserAgent if getBaseURL() returns null which it never does. The attached patch fixes that by making getBaseURL() returning null if no baseURL is set. Doing this on its own would have lost the functionality of defaulting the baseURL to the current directory if not set. I therefore added a function getBaseURLasURL to FOUserAgent which returns the current baseURL as a java.net.URL and defaults that to a file: URL pointing to the current directory if baseURL is not set. It also ports some URL normalisation code for relative URLs from 0.20.5 FopImageFactory to ImageFactory.
Created attachment 15955 [details] The patch file
Created attachment 15956 [details] Simple test case file for different URI references to external-graphic This file contains some hardcoded paths (refering to my directory layout) and is therefore not really suitable for incluision into SVN. Not sure how I can have "dynamic" absolute paths in URI refs.
I've had a look at this patch and there's a problem. You removed the call to FOUserAgent.getStream(). FOUserAgent can be subclassed by users to supply special behaviour on certain URIs, i.e. that they can supply their own InputStream for a particular URI. I've seen your TODO line in openStream() and this tells me that you didn't realize what that the getStream() method was for. The other problem is indeed the test case. I'm reluctant to add it to the other test cases because of the hard-coded absolute paths. With some tweaking of the testcase2fo.xsl stylesheet the attribute replacement mechanism could probably be enhanced to replace only subsets of a string. That way, we could provide a default variable holding the root directory of the test cases as an absolute path. BTW, while I was looking at image loading earlier I had that idea of creating a small object which carries the original URI, the resolved URL, if any, and the opened stream with it so it is easier to give Batik a base URI for the SVG handling. As far as I can remember that's currently not possible since only the InputStream is passed to the analyzer. Something like that. Maybe that helps further with the relative URI resolution stuff (especially with SVG, of course). So I'm not going to apply this patch right now. I'd rather you look at this again. Thanks!
The FOUserAgent.openStream() method is currently empty that's why I removed the call and put the comment in. However, I since worked on the GIF crashing issue. To make it work I had to change the interfaces. Instead of an InputStream the ImageFactory.openStream function (renamed to openURLConnection()) returns a URLConnection. This is then stored in addition to the InputStream in the FopImage.ImageInfo structure so GifImage can use it to get the properURLImageSource. However, all this is done without using FOUserAgent. To move this logic into FOUserAgent we would need a function in FOUserAgent like: URL getURL(String url) or may be better named URL resolveURL(String url) This would take the string argument url and combine it with the baseURL to form a proper absolute URL. This would be the main function someone would overwrite in a FOUserAgent subclass to provide custom URL resolution and to implement custom URL schemes. We could then add helper functions to FOUserAgent like: URLConnection openConnection(String url) which would first resolve the url and then do an openConnection on it and similar: InputStream openStream(String url) which would call openConnection(url) first and then return its InputStream. These helpers may save a few lines of code in a few places in fop but are not really core. Comments please ? Am I on the right track?
(In reply to comment #4) > The FOUserAgent.openStream() method is currently empty that's why I removed the > call and put the comment in. I see. I've just improved the documentation. http://svn.apache.org/viewcvs?rev=231213&view=rev > However, I since worked on the GIF crashing issue. > To make it work I had to change the interfaces. Instead of an InputStream the > ImageFactory.openStream function (renamed to openURLConnection()) returns a > URLConnection. This is then stored in addition to the InputStream in the > FopImage.ImageInfo structure so GifImage can use it to get the > properURLImageSource. > However, all this is done without using FOUserAgent. To move this logic into > FOUserAgent we would need a function in FOUserAgent like: > URL getURL(String url) > or may be better named > URL resolveURL(String url) > This would take the string argument url and combine it with the baseURL to form > a proper absolute URL. Ok, THAT is normal URI resolution which is also on the task list and should be done through a standard interface, i.e. these two: - http://java.sun.com/j2se/1.4.2/docs/api/javax/xml/transform/URIResolver.html - http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/EntityResolver.html Because this allows, among other things, the usage of XML Commons Resolver which is a frequently asked feature: http://xml.apache.org/commons/components/resolver/index.html > This would be the main function someone would overwrite in a FOUserAgent > subclass to provide custom URL resolution and to implement custom URL schemes. Actually, when the standard resolver interfaces are used, it should not be necessary to override FOUserAgent but simply to set a URIResolver and/or a EntityResolver. > We could then add helper functions to FOUserAgent like: > URLConnection openConnection(String url) I'm against that. That would mean that people would have to have URLConnection implementations for their custom code and cannot simply provide an InputStream. Our code should manage to handle that, too. Implementing URLConnection subclasses may be too much hassle for some developers. > which would first resolve the url and then do an openConnection on it and > similar: > InputStream openStream(String url) > which would call openConnection(url) first and then return its InputStream. > These helpers may save a few lines of code in a few places in fop but are not > really core. We should also make sure we don't do too much and therefore make the whole FOUserAgent too complicated. URI resolution using the standard interfaces is a must at some point, getStream() is nice to have and should be sufficient for most people. If GIF can't be loaded through a pure InputStream, we have to find a different solution. > Comments please ? Am I on the right track? Don't bite too much into GIF. If we can't make GIF support work properly with the (old) classes from the JDK, it's not a big deal IMO to require Jimi or JAI to be present under JDK 1.3. Under JDK 1.4 we have ImageIO (javax.imageio.ImageIO) that should provide easier access to a GIF codec if my memory doesn't desert me. So it might finally be the right time to introduce an AbstractFopImage subclass using ImageIO (placed under src/java-1.4). I've started such a class but never committed it since the priorization of image providers was not sufficient. I can retest that class and commit it if necessary. So to sum: - FOUserAgent.getStream() is cool and very easy to use (now that it's properly documented). People will want to load images from a CMS without having to write an URL handler. URIs like "cms:823746843?lng=fr" for example. - URI resolution using XML Commons Resolver should be possible, i.e. URIResolver and ideally EntityResolver should be supported, i.e. FOP should be able to extract the necessary infos out of Source and InputSource instances when loading images. - If you can make the whole thing work for GIF by directly working with an URLConnection that's cool, but I wouldn't expose something like URLConnection to the outside of FOP. - Investigate ImageIO as an alternative means to load GIF. Sorry for the trouble and and I hope I'm making sense.
> I see. I've just improved the documentation. > http://svn.apache.org/viewcvs?rev=231213&view=rev Am I missing something - that revision appears to have nothing to do with the issue discussed?
Oops. Looks like my copy/paster buffer has deceived me. That should have been: http://svn.apache.org/viewcvs?rev=231215&view=rev (In reply to comment #6) > > I see. I've just improved the documentation. > > http://svn.apache.org/viewcvs?rev=231213&view=rev > Am I missing something - that revision appears to have nothing to do with the > issue discussed?
> - FOUserAgent.getStream() is cool and very easy to use (now that it's properly > documented). People will want to load images from a CMS without having to > write an URL handler. URIs like "cms:823746843?lng=fr" for example. OK > - URI resolution using XML Commons Resolver should be possible, i.e. > URIResolver and ideally EntityResolver should be supported, i.e. FOP should > be able to extract the necessary infos out of Source and InputSource > instances when loading images. Are you thinking of having set/getURIResolver methods on FOUserAgent? > - If you can make the whole thing work for GIF by directly working with an > URLConnection that's cool, but I wouldn't expose something like URLConnection > to the outside of FOP. OK > - Investigate ImageIO as an alternative means to load GIF. Hmm, want to look at some other items in the image handling stuff first.
(In reply to comment #8) > > - URI resolution using XML Commons Resolver should be possible, i.e. > > URIResolver and ideally EntityResolver should be supported, i.e. FOP should > > be able to extract the necessary infos out of Source and InputSource > > instances when loading images. > Are you thinking of having set/getURIResolver methods on FOUserAgent? Exactly. > > - Investigate ImageIO as an alternative means to load GIF. > Hmm, want to look at some other items in the image handling stuff first. That's ok. One thing after the other. :-) You may have seen, I've uploaded the initial ImageIOImage implementation. http://svn.apache.org/viewcvs?rev=231223&view=rev
Created attachment 16031 [details] Revised patch A revised patch for the uri resolving issue. This patch does actually do a few things: 1. Addresses the URI resolving issue as discussed in this bug by providing a FOP implementation of the URIResolver interface as well as the capabilities to set resolver on the FOUserAgent object. 2. Modifies the BMPReader to extract the resolution information. 3. Fixes a possible array bounds exception in BMPImage which can happen for BMP images with extra bytes at the end. 4. Provides some infrastructure in ImageFactory in preparation of external configuration of multiple prioritised image providers per mime type. 5. Sets a proper base URL in SVGElement. 6. Provides test cases and test images for the different formats and resolutions.
Created attachment 16032 [details] New class providing the FOP default implementation of the URIResolver interface
Created attachment 16033 [details] Test images required by the test cases - to go into test/resources/images
Created attachment 16034 [details] The test case files - to go into test/layoutengine/testcases
Applied. Thanks a lot, Manuel. This is a big step in the right direction! http://svn.apache.org/viewcvs?rev=232786&view=rev I'll commit a couple of JUnit test cases in a minute that uses a custom URIResolver and demonstrates how this stuff is used. It also shows a couple of problems the will eventually need to be solved: - The URI Resolution does not yet affect URIs in Batik. - The ImageFactory should not rely on the InputStream on the StreamSource to be set, as components like XML Commons Resolver don't do that. They only set the resolved URL on a (Stream?)Source object. This means that in the FOUserAgent, a Source should be returned, not a StreamSource. For SVG graphics you might even want to supply a DOMSource, but that's only nice-to-have and for later. I'm resolving this issue now since it's purpose is fulfilled now.
Created attachment 16053 [details] Next revision Made changes as per this conversation: > > - The ImageFactory should not rely on the InputStream on the > > StreamSource to be set, as components like XML Commons Resolver don't > > do that. They only set the resolved URL on a (Stream?)Source object. > > This means that in the FOUserAgent, a Source should be returned, not > > a StreamSource. For SVG graphics you might even want to supply a > > DOMSource, but that's only nice-to-have and for later. I'm resolving > > this issue now since it's purpose is fulfilled now. > I was considering that as well but I am still hesitant to change > internal interfaces around too much. If it is agreeable to either > abolish the FOUserAgent.getStream() function or replace it with a > getSource() function I'll be happy to do this. I'd actually simply call it "resolveURI(String)" and return a Source, similar to the resolve() method in URIResolver. Source and its implementations provide all the different access variant for getting at sources. ATM, it suffices to support arbitrary sources which simply return a SystemID (which you can simply convert to a URL) and StreamSource whose getInputStream() returns a non-null value which you already implemented. DOMSource can wait till later and other stuff can be added as needed. We simply need to support providing InputStreams (done) and XML Commons URIResolver (open).
(In reply to comment #15) > Created an attachment (id=16053) [edit] > Next revision > Made changes as per this conversation: Very cool! Applied. Thanks a lot. My XML Commons Resolver example works now, too. I'll expand a little on the test case to demostrate different resolution schemes. http://svn.apache.org/viewcvs?rev=232949&view=rev
batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed