  Commons Validator / VALIDATOR-429

UrlValidator - path is invalid due to using java.net.URI for validation (regression)



    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.6
    • Fix Version/s: 1.7
    • Component/s: Routines



      We've been hit by a bug in a real-world application after upgrading from 1.4.1 to 1.6. URLs that used to be valid are now rejected. This appears to be caused by the use of java.net.URI to validate the path component of a URL.

      Steps to Reproduce

      Our application validates URLs similar to the following:

      A URL like this is no longer considered valid in 1.6, although the following cases still are:


      It seems that paths in UrlValidator are being parsed and validated as host names under java.net.URI's rules.


      This may have been introduced by the following change:

      Specifically, the path is now validated with java.net.URI. The relevant code in org.apache.commons.validator.routines.UrlValidator is:

      URI uri = new URI(null, null, path, null);

      It looks like URI is trying to parse the path as a host name when the scheme and host name are not specified.
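      The authority/path split can be observed directly. The minimal sketch below builds a URI the same way as the quoted UrlValidator line and prints the resulting host and path. The class `LeadingSlashesDemo` is made up for illustration and is not part of Commons Validator. On a standard JDK this should print `host=test, path=''`: the leading "//" is consumed as an authority, and nothing is left for the path.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative class only; not part of Commons Validator.
final class LeadingSlashesDemo {

    // Same constructor call as the quoted UrlValidator line: scheme, host
    // and fragment are null, only the path is supplied.
    static URI build(String path) {
        try {
            return new URI(null, null, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("rejected: " + path, e);
        }
    }

    public static void main(String[] args) {
        URI uri = build("//test");
        // The leading "//" starts an authority, so "test" becomes the host
        // and nothing is left over for the path.
        System.out.println("host=" + uri.getHost() + ", path='" + uri.getPath() + "'");
    }
}
```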

      Example to reproduce:

      new URI(null, null, "//_test", null);   // throws URISyntaxException

      The same path with a host supplied does not throw an exception:

      new URI(null, "test", "//_test", null);
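      The two calls above can be run side by side with a small harness like this one. It is a sketch, and `HostVsNoHost` and `throwsFor` are hypothetical names. It only reports whether the four-argument constructor throws, so it does not depend on the exact exception message, which may vary between JDK versions.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical harness; not part of Commons Validator.
final class HostVsNoHost {

    // Returns true if new URI(null, host, path, null) throws URISyntaxException.
    static boolean throwsFor(String host, String path) {
        try {
            new URI(null, host, path, null);
            return false;
        } catch (URISyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsFor(null, "//_test"));   // true
        System.out.println(throwsFor("test", "//_test")); // false
    }
}
```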

      Although the java.net.URI documentation states that string components may be null, the string it builds internally (and then validates) depends on which components are supplied. When a host name is specified, it internally constructs:

      • //test//_test

      With no host name, as in UrlValidator, it constructs and validates the following instead:

      • //_test
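      For the call that succeeds, the assembled string can be read back with URI.toString(), and getHost()/getPath() show how it was split. The null-host call fails while parsing, so there is no string to show for it. This is a minimal sketch, and `AssembledString` is an illustrative name only. The first line it prints should be `//test//_test -> host=test, path=//_test`.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative class only; shows how java.net.URI splits the string it assembles.
final class AssembledString {

    // Builds new URI(null, host, path, null) and describes the result.
    static String describe(String host, String path) {
        try {
            URI uri = new URI(null, host, path, null);
            return uri + " -> host=" + uri.getHost() + ", path=" + uri.getPath();
        } catch (URISyntaxException e) {
            // The null-host case fails while parsing, so there is no string to show.
            return "URISyntaxException";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("test", "//_test"));
        System.out.println(describe(null, "//_test"));
    }
}
```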

      Therefore it looks like there's either a bug in java.net.URI, or its usage is not correctly documented.


      A potential fix is to change org.apache.commons.validator.routines.UrlValidator to pass an empty string as the host name. Internally, java.net.URI then produces:

      • ////_test

      The authority is then empty, which java.net.URI accepts, and the intended path "//_test" is what gets validated.
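      Here is a sketch of the proposed workaround. It assumes the only change is passing "" instead of null as the host. The class and method names are hypothetical, not UrlValidator's actual code. The sketch returns the path as java.net.URI parsed it, so one can check that "//_test" survives intact.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch of the proposed fix; not Commons Validator code.
final class EmptyHostWorkaround {

    // Passes "" as the host so java.net.URI emits an empty authority ("//")
    // before the path; returns the parsed path, or null if URI rejects it.
    static String parsedPath(String path) {
        try {
            return new URI(null, "", path, null).getPath();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(new URI(null, "", "//_test", null)); // ////_test
        System.out.println(parsedPath("//_test"));              // //_test
        System.out.println(parsedPath(""));                     // null: "//" alone is rejected
    }
}
```

      One caveat, if we read java.net.URI correctly: with an empty host and an empty path, the assembled string is just "//". URI rejects that because it expects an authority, so an empty path would need to be handled before this check.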

      Would this fix be appropriate, or considered too fragile?

      Alternatively, the fix could replicate java.net.URI's path validation logic, which appears simply to check that every character falls within the permitted set. This logic can be seen in java.net.URI.parseHierarchical, which calls checkChars.
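      Here is a rough sketch of that alternative. It assumes the RFC 2396 path grammar that java.net.URI follows: alphanumerics, the marks "-_.!~*'()", percent escapes, and ":@&=+$,;/". The names are hypothetical. Unlike java.net.URI, this version rejects non-ASCII characters outright, so it approximates checkChars rather than copying it. Because it never interprets "//" as an authority, "//_test" is accepted.

```java
// Hypothetical path character check; an approximation of java.net.URI's
// checkChars for paths, not a copy of it and not Commons Validator code.
final class PathChars {

    // RFC 2396 punctuation allowed in a path: marks, pchar extras,
    // ';' (segment parameters) and '/' (segment separator).
    private static final String ALLOWED_PUNCT = "-_.!~*'()" + ":@&=+$," + ";/";

    static boolean isValidPathChars(String path) {
        int i = 0;
        while (i < path.length()) {
            char c = path.charAt(i);
            if (isAsciiAlphanumeric(c) || ALLOWED_PUNCT.indexOf(c) >= 0) {
                i++;
            } else if (c == '%' && i + 2 < path.length()
                    && isHex(path.charAt(i + 1)) && isHex(path.charAt(i + 2))) {
                i += 3; // a complete percent escape such as %2F
            } else {
                return false; // space, non-ASCII, or a truncated escape
            }
        }
        return true;
    }

    private static boolean isAsciiAlphanumeric(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    public static void main(String[] args) {
        System.out.println(isValidPathChars("//_test")); // true
        System.out.println(isValidPathChars("/a b"));    // false
    }
}
```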


    • Assignee: limpygnome