Details
Bug
Status: Resolved
Minor
Resolution: Invalid
4.5.8
None
None
Description
The following test case illustrates a problem with URIUtils that I have encountered:
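import java.net.URI;

import org.apache.http.client.utils.URIUtils;
import org.springframework.web.util.UriComponentsBuilder;

public class Main {
    public static void main(String[] args) throws Exception {
        // Build a URI whose path segment contains non-ASCII characters.
        URI uri = UriComponentsBuilder.fromUriString("https://host/path")
                .pathSegment("üñîçøðé")
                .build()
                .toUri();
        System.out.printf("rawPath = %s\n", uri.getRawPath());
        System.out.printf("path = %s\n", uri.getPath());

        // Rewrite with normalisation enabled and print the path again.
        uri = URIUtils.rewriteURI(uri, null, URIUtils.DROP_FRAGMENT_AND_NORMALIZE);
        System.out.printf("rawPath = %s\n", uri.getRawPath());
        System.out.printf("path = %s\n", uri.getPath());
    }
}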
I encountered the issue because previous versions of httpclient did not perform path normalisation (the main caller is ProtocolExec in the HTTP client) and effectively applied only URIUtils.DROP_FRAGMENT, so users who upgrade get the new normalisation behaviour unexpectedly.
The bug exhibited by URIUtils.rewriteURI is actually caused by URLEncodedUtils.urlDecode (inside URIBuilder's ctor, which calls URIBuilder.parsePath), which does something truly nasty. It takes a String (a logical sequence of Unicode code points), wraps it in a CharBuffer, then iterates over it, narrowing each char to a single byte! Strange, but true.
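To make the char-to-byte problem concrete, here is a minimal sketch that uses only the JDK. It is not the httpclient source: `NarrowingDemo` and `decodeByNarrowing` are hypothetical names, and the method only imitates the pattern described above (each char is cast to a byte, then the bytes are decoded as UTF-8).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class NarrowingDemo {

    // Hypothetical helper: imitates a decoder that casts every char to a byte
    // and then interprets the resulting bytes as UTF-8.
    static String decodeByNarrowing(String content) {
        ByteBuffer bb = ByteBuffer.allocate(content.length());
        CharBuffer cb = CharBuffer.wrap(content);
        while (cb.hasRemaining()) {
            // The cast keeps only the low 8 bits of the UTF-16 code unit.
            bb.put((byte) cb.get());
        }
        bb.flip();
        return StandardCharsets.UTF_8.decode(bb).toString();
    }

    // For comparison: encoding the whole string as UTF-8 first round-trips cleanly.
    static String roundTripUtf8(String content) {
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String in = "/\u00fc/x"; // "/ü/x"
        String narrowed = decodeByNarrowing(in);
        // 'ü' (U+00FC) becomes the single byte 0xFC, which is not valid UTF-8,
        // so the decoder substitutes the replacement character U+FFFD.
        System.out.println(narrowed.equals(in));              // false
        System.out.println(narrowed.indexOf('\uFFFD') >= 0);  // true
        System.out.println(roundTripUtf8(in).equals(in));     // true
    }
}
```

ASCII-only input survives the narrowing unchanged, which would explain why the problem only shows up for paths with non-ASCII characters.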
Unicode characters in a java.net.URI are legal, as far as I can tell. They should simply be escaped as percent-encoded UTF-8 bytes, as returned by URI.getRawPath, but they are not escaped when returned by URI.getPath, which is what URIUtils.rewriteURI uses.
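The difference between the raw and the decoded path can be checked with java.net.URI alone. The sketch below is illustrative only: `RawPathDemo` and its helper methods are hypothetical names, and it does not involve URIUtils or Spring.

```java
import java.net.URI;

public class RawPathDemo {

    // Path exactly as it appears in the URI string, escapes left in place.
    static String rawPath(String uri) {
        return URI.create(uri).getRawPath();
    }

    // Path with percent-escaped octets decoded as UTF-8.
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    // US-ASCII form of the URI: non-ASCII characters are percent-encoded as UTF-8.
    static String asciiForm(String uri) {
        return URI.create(uri).toASCIIString();
    }

    public static void main(String[] args) {
        String escaped = "https://host/%C3%BC";
        System.out.println(rawPath(escaped));     // /%C3%BC
        System.out.println(decodedPath(escaped)); // /ü

        // java.net.URI also accepts the unescaped character directly.
        String unescaped = "https://host/\u00fc";
        System.out.println(asciiForm(unescaped)); // https://host/%C3%BC
    }
}
```

Any code that takes the result of getPath and treats it as a sequence of bytes, rather than re-encoding it as UTF-8, is exposed to the narrowing problem described above.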