Commons Validator
  1. Commons Validator
  2. VALIDATOR-216

UrlValidator rejects top-level domains (TLDs) with more than 4 characters

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3.1 Release
    • Fix Version/s: 1.4.0 Release
    • Component/s: None
    • Labels:
      None

      Description

      org.apache.commons.validator.UrlValidator#isValidAuthority rejects top-level domains (TLDs) with more than four characters. There are now at least two TLDs with more characters, .museum and .travel.

      1. TLDConstants.java
        4 kB
        Gabriel Belingueres

        Activity

        Hide
        Ben Speakmon added a comment -

        This has been addressed in the new UrlValidator. I chose not to use the interface constant pattern as suggested, as that's frowned upon; rather, the domains are stored in private static final List fields inside UrlValidator for maximum flexibility.

        We will need to update UrlValidator as new domains comes out in any case, but since that happens only every couple of years, a configurable TLD mechanism seemed like overkill.

        Show
        Ben Speakmon added a comment - This has been addressed in the new UrlValidator. I chose not to use the interface constant pattern as suggested, as that's frowned upon; rather, the domains are stored in private static final List fields inside UrlValidator for maximum flexibility. We will need to update UrlValidator as new domains comes out in any case, but since that happens only every couple of years, a configurable TLD mechanism seemed like overkill.
        Hide
        Niall Pemberton added a comment -

        My first thought was this had already been fixed - but turned out that was for the email validator - VALIDATOR-66. I'm in the process of separating out the "framework" aspect of validator from the actual validation "routines":

        http://jakarta.apache.org/commons/validator/apidocs/org/apache/commons/validator/routines/package-summary.html#package_description

        So the plan is to move UrlValidator into the new routines package and at the same time take the opportunity to re-factor it. Seems to me that both this validator and the EmailValidator could benefit from breaking this down into smaller re-usable routines - so for example having a common domain validator that Url and Email both (re-) used would be a good idea.

        Thanks for reporting this Kenji and for the patch Gabriel - I'll take this issue into account when I get round to the refactoring.

        Show
        Niall Pemberton added a comment - My first thought was this had already been fixed - but turned out that was for the email validator - VALIDATOR-66 . I'm in the process of separating out the "framework" aspect of validator from the actual validation "routines": http://jakarta.apache.org/commons/validator/apidocs/org/apache/commons/validator/routines/package-summary.html#package_description So the plan is to move UrlValidator into the new routines package and at the same time take the opportunity to re-factor it. Seems to me that both this validator and the EmailValidator could benefit from breaking this down into smaller re-usable routines - so for example having a common domain validator that Url and Email both (re-) used would be a good idea. Thanks for reporting this Kenji and for the patch Gabriel - I'll take this issue into account when I get round to the refactoring.
        Hide
        Gabriel Belingueres added a comment -

        We could modify it this way (surely there are better ways, but this will quicly patch it for the medium term:

        1) Make UrlValidator implement the TLDConstants interface.
        2) Replace:
        if (topLevel.length() < 2 || topLevel.length() > 4) {
        for this:
        if (Arrays.binarySearch(TLDs, topLevel.toUpperCase()) < 0) {

        See attached file for the TLDConstants.java file, which can be easily regenerated each time a new release is about to be generated (though I don't really know how often TLDs varies, but I could guess that not much).

        Gabriel

        Show
        Gabriel Belingueres added a comment - We could modify it this way (surely there are better ways, but this will quicly patch it for the medium term: 1) Make UrlValidator implement the TLDConstants interface. 2) Replace: if (topLevel.length() < 2 || topLevel.length() > 4) { for this: if (Arrays.binarySearch(TLDs, topLevel.toUpperCase()) < 0) { See attached file for the TLDConstants.java file, which can be easily regenerated each time a new release is about to be generated (though I don't really know how often TLDs varies, but I could guess that not much). Gabriel

          People

          • Assignee:
            Niall Pemberton
            Reporter:
            Kenji Matsuoka
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development