Commons Validator / VALIDATOR-467

URL validator fails if path starts with double slash and has underscores

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6
    • Fix Version/s: 1.7
    • Component/s: Routines
    • Labels: None

      Description

      import org.apache.commons.validator.routines.UrlValidator;
      ...
      private static final String[] schemes = {"http", "https"};
      // Option flags are bit masks, so combine them with bitwise OR.
      private static final UrlValidator urlValidator = new UrlValidator(schemes,
              UrlValidator.ALLOW_LOCAL_URLS | UrlValidator.ALLOW_2_SLASHES);
      ...
      urlValidator.isValid("https://example.com//some_path/path/"); // returns false on 1.6
      

      This returns false. However, such a URL is valid when the authority is not null.

      It returns false because of this code in the validator:

      https://github.com/apache/commons-validator/blob/a3771313c9f1833abf32c7c294ad1de4810e532d/src/main/java/org/apache/commons/validator/routines/UrlValidator.java#L452-L461

              try {
                  URI uri = new URI(null,null,path,null);
                  String norm = uri.normalize().getPath();
                  if (norm.startsWith("/../") // Trying to go via the parent dir 
                   || norm.equals("/..")) {   // Trying to go to the parent dir
                      return false;
                  }
              } catch (URISyntaxException e) {
                  return false;
              }
      

      As far as I understand, URI uri = new URI(null,null,path,null); throws URISyntaxException if the authority is null and the path starts with //.

      I tried running new URI(null, "example.com", path, null); and it worked.

      I didn't read the RFC, but from some googling around I got the following:
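
That experiment can be turned into a small standalone check. The following is only a sketch: the helper name `isPathSafe` and the placeholder host `localhost` are assumptions made for illustration, not the validator's code. It mirrors the quoted snippet but passes a non-null host, so `java.net.URI` parses the argument as a path and the parent-directory test still runs on the normalized result.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathCheckSketch {
    // Hypothetical helper (not part of Commons Validator): same check as the
    // quoted snippet, but with a placeholder host so that a path starting
    // with "//" is not parsed as an authority.
    static boolean isPathSafe(String path) {
        try {
            URI uri = new URI(null, "localhost", path, null);
            String norm = uri.normalize().getPath();
            // Reject paths that try to reach the parent directory.
            return !(norm.startsWith("/../") || norm.equals("/.."));
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isPathSafe("//some_path/path/"));
        System.out.println(isPathSafe("/../"));
    }
}
```

The host value never reaches the result; it only occupies the authority slot during parsing.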

      //some_path is invalid if authority is null
      //some_path is valid if authority is not null

      Update:

      Another thing I noticed while testing is that the following actually passes validation: "https://example.com//test"
      In contrast, "https://example.com//test_test" fails validation, but the URISyntaxException is thrown because of an illegal character in the hostname, not because of the // at the start.

      So my original theory about the failure now looks incorrect, but I still consider this a valid bug.

      I guess a better description would be "URL validator incorrectly uses the URI uri = new URI(null,null,path,null); check. Because the scheme and host arguments are null, a path that starts with // is validated as a hostname".
      The simplest URL to test with is https://example.com//test_double_slash_and_underscore
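
The behaviour described in this update can be reproduced with `java.net.URI` alone. The sketch below uses a hypothetical helper, `tryParse`, introduced only for this illustration. It calls the same four-argument constructor with and without a host and reports either the parsed host and path or the reason carried by the exception.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathAsAuthorityDemo {
    // Hypothetical helper for illustration: builds a URI the same way the
    // validator does and describes the outcome as a string.
    static String tryParse(String host, String path) {
        try {
            URI uri = new URI(null, host, path, null);
            return "ok: host=" + uri.getHost() + ", path=" + uri.getPath();
        } catch (URISyntaxException e) {
            return "URISyntaxException: " + e.getReason();
        }
    }

    public static void main(String[] args) {
        // No host: the segment after "//" is parsed as the authority.
        System.out.println(tryParse(null, "//test"));
        // No host plus an underscore: "test_test" is rejected as a hostname.
        System.out.println(tryParse(null, "//test_test"));
        // With a host, the same string is treated as a path.
        System.out.println(tryParse("example.com", "//test_test"));
    }
}
```

With a null host, the first path segment ends up in the host position, which is why only segments that are not valid hostnames fail.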

              People

              • Assignee: Unassigned
              • Reporter: Ivan Larionov (ilarionov)