Bug 8031 - [PATCH] Suggest ProjectHelperImpl use File.toURI().toURL().toExternalForm()
Summary: [PATCH] Suggest ProjectHelperImpl use File.toURI().toURL().toExternalForm()
Status: RESOLVED FIXED
Alias: None
Product: Ant
Classification: Unclassified
Component: Core (show other bugs)
Version: 1.5
Hardware: All All
: P3 enhancement (vote)
Target Milestone: 1.8.0
Assignee: Ant Notifications List
URL: http://developer.java.sun.com/develop...
Keywords:
: 32718 (view as bug list)
Depends on:
Blocks:
 
Reported: 2002-04-12 18:25 UTC by Jesse Glick
Modified: 2008-11-14 05:56 UTC (History)
3 users (show)



Attachments
Suggested patch (needs review) (22.86 KB, patch)
2003-10-23 19:44 UTC, Jesse Glick
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jesse Glick 2002-04-12 18:25:54 UTC
ProjectHelper has for some time used various tricks to ensure that the URL it
passes to the XML parser escapes some nasty filename chars like '#':

revision 1.29
date: 2000/09/14 14:04:38;  author: bodewig;  state: Exp;  lines: +3 -3
Provide the SAXParser with the File instead of an InputSource to give
it a chance to parse relative URIs.

PR: 54
Submitted by:	Achim Dannecker <A.Dannecker@gmx.de>

revision 1.45
date: 2001/01/16 13:36:38;  author: conor;  state: Exp;  lines: +15 -1
Handle directories with # in them by passing the parser
an inputstream, rather than a name.

Submitted by:	Yossie Teitz <yossie@reachcast.co.il>

This is currently handled (now in ProjectHelperImpl.java) by the following code:

String uri = "file:" + buildFile.getAbsolutePath().replace('\\', '/');
for (int index = uri.indexOf('#'); index != -1; index = uri.indexOf('#')) {
    uri = uri.substring(0, index) + "%23" + uri.substring(index + 1);
}

which works for this one case, but not others, and furthermore uses undocumented
aspects of the JRE's treatment of file: URLs which may break on some JRE
implementations.

JDK 1.2 had the File.toURL() method, but it did not solve the escaping problem
correctly; see Java bug #4472849 (see link for details).

Fortunately JDK 1.4 lets you do this:

file.toURI().toURL().toExternalForm()

which ought to provide a URL string with all suitable escapes, ready to be sent
to the parser, with semantics guaranteed by the Java platform.

I suggest that ProjectHelperImpl run code similar to this via reflection when
running on JDK 1.4, rather than the current ad-hoc code.
Comment 1 Steve Loughran 2002-06-26 04:55:53 UTC
Do you have  patch for this, Jesse? We could put into ant1.6
Comment 2 dIon Gillard 2002-06-26 05:14:51 UTC
File.toURI() is since 1.4.....

Does Ant still have a JDK 1.1 requirement?
Comment 3 Stefan Bodewig 2002-09-26 06:24:01 UTC
No trace of FIXED in CVS, sorry.
Comment 4 Stefan Bodewig 2002-11-18 14:38:11 UTC
The escaping code has now been factored to FileUtils, all ASCII chars that need
escaping are now getting treated correctly.

If you ever wanted to provide a patch, this would be the place to put it into.
Comment 5 Jesse Glick 2003-10-23 19:44:00 UTC
Well I have a patch now, which uses URI on JDK 1.4 and otherwise uses the old
logic. All unit tests seem to pass for me on Linux, JDK 1.4.2, offline=true;
otherwise YMMV. Not sure how safe this is for 1.6, might be more appropriate for
1.7 - up to judgement of the committers.

Of especial interest:

1. A couple of unit tests were IMHO incorrect and I changed them enough to pass
(hopefully still asserting what should be true). Please inspect carefully. Note
that Java does *not* specify what precise form file: URIs should take on various
platforms, only some aspects of their behavior, so the unit tests have to live
on the edge a bit.

2. I don't know where the convention of using URLs such as "file:./foo.xml" or
"file:foo.xml" (rather than simply "foo.xml") for entity includes came from, but
it certainly looks like it violates common sense as well as the URI
specification. (Relative URIs have no protocol and are resolved from a base URI
which does.) JDK 1.4's new File(URI) rejects such URIs (correctly IMHO, though I
could imagine them being resolved relative to the JVM's current working
directory, however useless that might be). Ant's faq.xml used to recommend this
"file:./etc" style for whatever reason. The patch corrects the FAQ. However, for
compatibility with existing (IMHO erroneous) build scripts, file: URIs which do
not have an absolute path beginning with '/' (and are thus illegal) are just
passed through to the old parsing mechanism even on JDK 1.4. IncludeTest thus
continues to pass using both syntaxes.

2a. While fromURI thus accepts relative paths for compatibility, toURI attempts
to always produce a URI encoding an absolute path for safety.

3. Fixed up some places in the codebase that were failing to use the FileUtils
methods to create or interpret file: URLs and were thus in danger of behaving
incorrectly for some pathnames.

4. Removed obsolete FILE_PROTOCOL_PREFIX from the XSLTLiaison interface. It was
not being used anywhere, and was not really correct anyway. (F_P_P +
file.getAbsolutePath() is not a safe way to make URIs, and the extra "//" at the
end of the prefix is gratuitous since there will never be an authority for a
file URI.)
Comment 6 Jesse Glick 2003-10-23 19:44:50 UTC
Created attachment 8706 [details]
Suggested patch (needs review)
Comment 8 Jesse Glick 2003-12-06 02:15:49 UTC
BTW I've noticed that in Ant 1.6b3, if I try to run a build script contained in
a directory whose name contains UTF-8-encoded non-ASCII characters (on Linux:
locale en_US.UTF-8, glibc 2.3.2, kernel 2.4.20), it fails right away with an
ArrayIndexOutOfBoundsException (e.g. index 269) from FileUtils.toURI:

    if (isSpecial[c]) { <-- here

Ant 1.5.3 runs the same script just fine, and everything else in Java and in
fact the rest of my operating system deals with, and displays, the directory
correctly...

I presume the previously attached patch would solve this bug (on JDK 1.4),
though I haven't tried it.

A quick fix for Ant 1.6 might be to change this line to read:

    if (c < 256 && isSpecial[c]) {

which I think would at least pass through Unicode characters untouched, which
Java should be able to handle. The only special characters you really care about
- e.g. '#' or '?' - are in ASCII anyway.
Comment 9 Jesse Glick 2004-02-03 18:16:50 UTC
See bug #26642 re. my last comment.
Comment 10 Matt Benson 2004-12-15 23:14:08 UTC
Note that Peter added the recommended c < 256 check on 2/3/2004.
Comment 11 Matt Benson 2004-12-15 23:15:43 UTC
...documented in bug 26642 which is mentioned above.
Comment 12 Matt Benson 2004-12-16 22:41:42 UTC
*** Bug 32718 has been marked as a duplicate of this bug. ***
Comment 13 Antoine Levy-Lambert 2005-12-16 07:43:34 UTC
Patches submitted, thanks. 
Note that toURI is implemented with java.net.URI.toASCIIString(), otherwise the
non ascii characters are not properly % escaped.
I have also had to do some small changes for the FileUtilsTest.

fromURI("file:///foo") ==> C:\foo
before it was \foo

Actually, this creates an inconsistency in the behavior of Ant between
ant running on JDK 1.2 or 1.3 and Ant running on JDK 1.4 or 1.5.

This should be fixed in the implemetation of fromURI.
Comment 14 Kev Jackson 2007-10-08 02:08:15 UTC
Is this still open?
Comment 15 Jesse Glick 2007-10-09 18:20:47 UTC
Seems to have already been applied on 16 Dec 2005, SVN rev 357131.