Description
import org.apache.commons.validator.routines.UrlValidator;
...
private static final String[] schemes = {"http", "https"};
private static final UrlValidator urlValidator = new UrlValidator(schemes, UrlValidator.ALLOW_LOCAL_URLS + UrlValidator.ALLOW_2_SLASHES);
...
urlValidator.isValid("https://example.com//some_path/path/")
This returns false. However, such a URL is valid if the authority is not null.
The reason it returns false is this code in the validator:
try {
    URI uri = new URI(null,null,path,null);
    String norm = uri.normalize().getPath();
    if (norm.startsWith("/../") // Trying to go via the parent dir
     || norm.equals("/..")) {   // Trying to go to the parent dir
        return false;
    }
} catch (URISyntaxException e) {
    return false;
}
As far as I understand, URI uri = new URI(null,null,path,null); throws URISyntaxException if the authority is null and the path starts with //.
I tried running new URI(null, "example.com", path, null); and it worked.
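The comparison above can be reproduced with plain java.net.URI, without the validator. This is only a minimal sketch: the class name UriHostArgumentDemo and the helper accepts are made up for illustration, and it exercises just the constructor call, not the rest of the validator's path check.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; not part of commons-validator.
public class UriHostArgumentDemo {

    /** Returns true if java.net.URI accepts the given host/path pair. */
    static boolean accepts(String host, String path) {
        try {
            // Same four-argument constructor the validator uses:
            // (scheme, host, path, fragment).
            new URI(null, host, path, null);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String path = "//some_path/path/";
        // Host omitted, as in the validator code quoted above.
        System.out.println("null host:        " + accepts(null, path));
        // Host supplied, as in the experiment described above.
        System.out.println("example.com host: " + accepts("example.com", path));
    }
}
```

Only the host argument differs between the two calls, so any difference in the result comes from how the constructor treats a missing host.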
I haven't read the RFC, but from some googling around I gathered the following:
//some_path is invalid if the authority is null
//some_path is valid if the authority is not null
Update:
Another thing I noticed while testing is that the following actually passes validation: "https://example.com//test"
Meanwhile, "https://example.com//test_test" fails validation, but the URISyntaxException is thrown because of an illegal character in the hostname, not because of the // at the start.
So my original theory about the failure now looks incorrect; however, I still consider this a valid bug.
I guess a better description would be "URL validator incorrectly uses the URI uri = new URI(null,null,path,null); check. Because of the null arguments, the path is validated as a hostname".
The simplest URL to test with is https://example.com//test_double_slash_and_underscore
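To see how the path ends up being treated as a hostname, it may help to print what java.net.URI makes of a bare path when the host argument is null. The sketch below is illustrative only: PathAsAuthorityDemo and describe are hypothetical names, and the exact wording of the rejection reason is whatever the JDK in use reports.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; not part of commons-validator.
public class PathAsAuthorityDemo {

    /** Describes how java.net.URI interprets a bare path when the host is null. */
    static String describe(String path) {
        try {
            URI uri = new URI(null, null, path, null);
            return "host=" + uri.getHost() + ", path=" + uri.getPath();
        } catch (URISyntaxException e) {
            // The reason text comes from the JDK and is printed as-is.
            return "rejected: " + e.getReason();
        }
    }

    public static void main(String[] args) {
        // A single leading slash stays a path.
        System.out.println(describe("/test"));
        // A double leading slash makes the first segment the authority.
        System.out.println(describe("//test"));
        // An underscore is not allowed in a hostname, so this one is rejected.
        System.out.println(describe("//test_double_slash_and_underscore"));
    }
}
```

With a null host, the first segment after // is parsed as the authority, which is why "//test" goes through while a segment containing an underscore does not.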
Issue Links
- is duplicated by VALIDATOR-429 UrlValidator - path is invalid due to using java.net.URI for validation (regression) (Closed)