  1. Commons Validator
  2. VALIDATOR-467

URL validator fails if path starts with double slash and has underscores


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6
    • Fix Version/s: 1.7
    • Component/s: Routines
    • Labels: None

    Description

      import org.apache.commons.validator.routines.UrlValidator;
      ...
      private static final String[] schemes = {"http", "https"};
      private static final UrlValidator urlValidator = new UrlValidator(schemes,
              UrlValidator.ALLOW_LOCAL_URLS + UrlValidator.ALLOW_2_SLASHES);
      ...
      urlValidator.isValid("https://example.com//some_path/path/"); // returns false
      

      This returns false. However, such a URL is valid when the authority is not null.

      The reason it returns false is this code in the validator:

      https://github.com/apache/commons-validator/blob/a3771313c9f1833abf32c7c294ad1de4810e532d/src/main/java/org/apache/commons/validator/routines/UrlValidator.java#L452-L461

              try {
                  URI uri = new URI(null,null,path,null);
                  String norm = uri.normalize().getPath();
                  if (norm.startsWith("/../") // Trying to go via the parent dir 
                   || norm.equals("/..")) {   // Trying to go to the parent dir
                      return false;
                  }
              } catch (URISyntaxException e) {
                  return false;
              }
      

      As far as I understand, URI uri = new URI(null,null,path,null); throws a URISyntaxException if the authority is null and the path starts with //.

      I tried running new URI(null, "example.com", path, null); and it worked.

      I haven't read the RFC, but from some googling around I found the following:

      //some_path is invalid if authority is null
      //some_path is valid if authority is not null

      Update:

      Another thing I noticed while testing is that "https://example.com//test" actually passes validation.
      "https://example.com//test_test" fails validation, but the URISyntaxException is thrown because of an "Illegal character in hostname", not because of the // at the start.
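The two observations above can be reproduced with java.net.URI alone, without the validator. The following is a minimal sketch; the class name UriPathProbe and the helper probe are made up for illustration and are not part of Commons Validator. It only reports whether the four-argument URI constructor accepts a given host and path.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriPathProbe {

    /**
     * Hypothetical helper: returns null if new URI(null, host, path, null)
     * accepts the input, otherwise the reason reported by the exception.
     */
    static String probe(String host, String path) {
        try {
            new URI(null, host, path, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        // No host: the segment after "//" is parsed as the authority.
        System.out.println(probe(null, "//test"));
        System.out.println(probe(null, "//test_test"));
        // With a host supplied, the same string is treated as a path.
        System.out.println(probe("example.com", "//test_test"));
    }
}
```

With a null host, the text after the leading // ends up in the authority position, so an underscore there is rejected as a hostname character even though it is legal in a path.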

      So my original theory about the failure now looks incorrect; however, I still consider this a valid bug.

      I guess a better description would be: "URL validator incorrectly uses the URI uri = new URI(null,null,path,null); check. Because of the null arguments, a path that starts with // is parsed as a hostname".
      The simplest URL to test with is https://example.com//test_double_slash_and_underscore
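One possible way to keep the parent-directory check while avoiding the hostname parsing is to pass a placeholder host to the URI constructor, so that the path is always parsed as a path. This is only a sketch under that assumption, not necessarily how the project resolved the issue; PathCheckSketch, isPathAllowed and PLACEHOLDER_HOST are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathCheckSketch {

    // Hypothetical placeholder: any syntactically valid host would do.
    // It exists only so the parser does not read the path as an authority.
    private static final String PLACEHOLDER_HOST = "localhost";

    /** Same parent-directory check as the quoted validator code, but with a host supplied. */
    static boolean isPathAllowed(String path) {
        try {
            URI uri = new URI(null, PLACEHOLDER_HOST, path, null);
            String norm = uri.normalize().getPath();
            // Reject paths that try to go via, or to, the parent directory.
            return !(norm.startsWith("/../") || norm.equals("/.."));
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isPathAllowed("//test_double_slash_and_underscore"));
        System.out.println(isPathAllowed("/../etc"));
    }
}
```

The placeholder host never appears in the result because only getPath() is inspected.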

            People

              Assignee: Unassigned
              Reporter: Ivan Larionov (ilarionov)