Commons Validator
  1. Commons Validator
  2. VALIDATOR-27

[validator] UrlValidator fails http://www.google.com

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0 Release
    • Component/s: None
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

      Description

      Per discussion on commons-dev list the following isValid method should not
      return false for http://www.google.com;

      public void testSanity() {
      UrlValidator v = new UrlValidator();
      assertTrue(v.isValid("http://www.google.com"));
      }

      Bug/enhancement I think is to make port and path optional tokens of the url.
      Full thread available here:
      http://www.mail-archive.com/commons-dev%40jakarta.apache.org/msg46584.html

        Activity

        Hide
        Tim Reilly added a comment -

        Seems the issue is isValidPath("") returns false.

        -From rfc2396
        The authority component is preceded by a double slash "//" and is
        terminated by the next slash "/", question-mark "?",or by the end of
        the URI.

        To accomodate "" as valid the forthcoming patches change the regular expression
        to allow zero or more:

        private static final String PATH_PATTERN =

        • "/^(/[-a-zA-Z0-9_:@&?=+,.!/~*'%$]*)$/";
          + "/^(/[-a-zA-Z0-9_:@&?=+,.!/~*'%$])$/";

        Also changes to unit test data expects true when path = "" in both default
        fragement and no fragement options

        Please verify (read scrutinize)

        Thanks
        -TR

        PS - I didn't see the js implementation for UrlValidator. If I missed it sorry -
        if changes accepted though I assume it would need the change as well?

        Show
        Tim Reilly added a comment - Seems the issue is isValidPath("") returns false. -From rfc2396 The authority component is preceded by a double slash "//" and is terminated by the next slash "/", question-mark "?",or by the end of the URI. To accomodate "" as valid the forthcoming patches change the regular expression to allow zero or more: private static final String PATH_PATTERN = "/^(/ [-a-zA-Z0-9_:@&?=+,.!/~*'%$] *)$/"; + "/^(/ [-a-zA-Z0-9_:@&?=+,.!/~*'%$] ) $/"; Also changes to unit test data expects true when path = "" in both default fragement and no fragement options Please verify (read scrutinize) Thanks -TR PS - I didn't see the js implementation for UrlValidator. If I missed it sorry - if changes accepted though I assume it would need the change as well?
        Hide
        Tim Reilly added a comment -

        Created an attachment (id=12480)
        Adds * to path expression to allow zero

        Show
        Tim Reilly added a comment - Created an attachment (id=12480) Adds * to path expression to allow zero
        Hide
        Tim Reilly added a comment -

        Created an attachment (id=12481)
        Changes expect from false to true for "" path data

        Show
        Tim Reilly added a comment - Created an attachment (id=12481) Changes expect from false to true for "" path data
        Hide
        Martin Cooper added a comment -
            • *** Bug 30757 has been marked as a duplicate of this bug. *** has been marked as a duplicate of this bug. ***
        Show
        Martin Cooper added a comment - *** Bug 30757 has been marked as a duplicate of this bug. *** has been marked as a duplicate of this bug. ***
        Hide
        David Graham added a comment -

        I think we could also simplify that expression by using \w :

        private static final String PATH_PATTERN =

        • "/^(/[-a-zA-Z0-9_:@&?=+,.!/~*'%$])$/";
          + "/^(/[-\w:@&?=+,.!/~*'%$])$/";
        Show
        David Graham added a comment - I think we could also simplify that expression by using \w : private static final String PATH_PATTERN = "/^(/ [-a-zA-Z0-9_:@&?=+,.!/~*'%$] ) $/"; + "/^(/ [-\w:@&?=+,.!/~*'%$] ) $/";
        Hide
        David Graham added a comment -

        When I added the * to the end of the path regex the testSanity() method passes.
        However, when I added another line to testSanity() like this:

        assertTrue(v.isValid("http://www.google.com/"));

        it fails.

        Show
        David Graham added a comment - When I added the * to the end of the path regex the testSanity() method passes. However, when I added another line to testSanity() like this: assertTrue(v.isValid("http://www.google.com/")); it fails.
        Hide
        Rory Winston added a comment -

        Is it because it appends a port and querystring to the result?

        I added ("www.google.com/", true) as a new TestPair, and got the exception:

        testIsValid(org.apache.commons.validator.UrlTest)junit.framework.AssertionFailedError:
        http://www.google.com/:80?action=view expected:<false> but was:<true>

        i.e. it appends a :80?action=view , which obviously will fail due to the
        trailing '/'.

        Show
        Rory Winston added a comment - Is it because it appends a port and querystring to the result? I added ("www.google.com/", true) as a new TestPair, and got the exception: testIsValid(org.apache.commons.validator.UrlTest)junit.framework.AssertionFailedError: http://www.google.com/:80?action=view expected:<false> but was:<true> i.e. it appends a :80?action=view , which obviously will fail due to the trailing '/'.
        Hide
        David Chan added a comment -

        Created an attachment (id=14369)
        PATH_PATTERN allows empty paths. isValidPath() allows a trailing slash.
        TestPairs updated to reflect changes.

        I've been following this bug after trying to use the URL validator and noticing
        that it is unusably strict. I finally decided to submit a patch since allowing
        URLs of the form "http://www.google.com", "http://www.google.com/", and
        "http://www.google.com/apis/" is necessary. They are almost universally
        accepted as proper URLs.

        As far as <a href="http://www.ietf.org/rfc/rfc2396.txt">RFC2396</a> is
        concerned, Part 1.3 shows a trailing slash as being acceptable.

        I have obsoleted Tim Reilly's patches since this patch accomplishes the same
        and more. In addition, there have since been changes made in the code that
        conflict with his older patch.

        Show
        David Chan added a comment - Created an attachment (id=14369) PATH_PATTERN allows empty paths. isValidPath() allows a trailing slash. TestPairs updated to reflect changes. I've been following this bug after trying to use the URL validator and noticing that it is unusably strict. I finally decided to submit a patch since allowing URLs of the form "http://www.google.com", "http://www.google.com/", and "http://www.google.com/apis/" is necessary. They are almost universally accepted as proper URLs. As far as <a href="http://www.ietf.org/rfc/rfc2396.txt">RFC2396</a> is concerned, Part 1.3 shows a trailing slash as being acceptable. I have obsoleted Tim Reilly's patches since this patch accomplishes the same and more. In addition, there have since been changes made in the code that conflict with his older patch.
        Hide
        Martin Cooper added a comment -
            • *** Bug 34146 has been marked as a duplicate of this bug. *** has been marked as a duplicate of this bug. ***
        Show
        Martin Cooper added a comment - *** Bug 34146 has been marked as a duplicate of this bug. *** has been marked as a duplicate of this bug. ***
        Hide
        Don Brown added a comment -

        Fixed in changeset [232628]. Thanks for the patch!

        Show
        Don Brown added a comment - Fixed in changeset [232628] . Thanks for the patch!
        Hide
        Niall Pemberton added a comment -

        Re-openned and the set to "Resolved Fixed" again to correct "resolution" which was lost in Bugzilla --> JIRA conversion

        Show
        Niall Pemberton added a comment - Re-openned and the set to "Resolved Fixed" again to correct "resolution" which was lost in Bugzilla --> JIRA conversion

          People

          • Assignee:
            Unassigned
            Reporter:
            Tim Reilly
          • Votes:
            3 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development