Tapestry
  1. Tapestry
  2. TAPESTRY-1857

Email validator's regexp pattern causes possibly infinite loop

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 4.1.3
    • Fix Version/s: 4.1.5
    • Component/s: Framework
    • Labels:
      None

      Description

      The regex pattern used by the email validator causes a possibly infinite loop in java.util.regex.Pattern with some input.

      1. patch.txt
        2 kB
        Ulrich Stärk
      2. TAPESTRY-1857.txt
        1 kB
        Ulrich Stärk
      3. TAPESTRY-1857-2.txt
        0.9 kB
        Ulrich Stärk
      4. Test.java
        0.8 kB
        Ulrich Stärk

        Activity

        Hide
        Andreas Andreou added a comment -

        Thx again Ulrich!

        Show
        Andreas Andreou added a comment - Thx again Ulrich!
        Hide
        Ulrich Stärk added a comment -

        I accidentally uploaded a patch I created before running the tests where I forgot to escape the '#' character. In my local version it is escaped so the tests pass...

        Show
        Ulrich Stärk added a comment - I accidentally uploaded a patch I created before running the tests where I forgot to escape the '#' character. In my local version it is escaped so the tests pass...
        Hide
        Ulrich Stärk added a comment -

        correct patch...

        Show
        Ulrich Stärk added a comment - correct patch...
        Hide
        Ulrich Stärk added a comment -

        weird... they run fine for me (see below)...

        [Parser] Running:
        C:\Dokumente und Einstellungen\Uli\Eigene Dateien\Eclipse Workspaces\Eclipse 3.3\tapestry\temp-testng-customsuite.xml

        PASSED: test_Render_Contribution
        PASSED: test_Fail("fred")
        PASSED: test_Fail("foooooooooooooooooooooo")
        PASSED: test_Fail("foooooooooooooooooooooooooooo")
        PASSED: test_Fail("LASKFODSKFO@$#)DJMZCV)TQKALAD")
        PASSED: test_Fail("")
        PASSED: test_OK("hlship@apache.org")
        PASSED: test_OK("j@apache.org")
        PASSED: test_OK("jkuhnert@a.org")
        PASSED: test_OK("J@A.oRg")
        PASSED: test_OK("foo@example-bar.domain.com")
        PASSED: test_OK("FOO@EXample-bAr.domain.com")
        PASSED: test_OK("_foo@example.com")
        PASSED: test_OK("$user+mailbox_@example-domain.com")
        PASSED: test_Render_Contribution_Custom_Message
        PASSED: test_Fail_Custom_Message

        ===============================================
        org.apache.tapestry.form.validator.TestEmail
        Tests run: 16, Failures: 0, Skips: 0
        ===============================================

        ===============================================
        tapestry
        Total tests run: 16, Failures: 0, Skips: 0
        ===============================================

        Show
        Ulrich Stärk added a comment - weird... they run fine for me (see below)... [Parser] Running: C:\Dokumente und Einstellungen\Uli\Eigene Dateien\Eclipse Workspaces\Eclipse 3.3\tapestry\temp-testng-customsuite.xml PASSED: test_Render_Contribution PASSED: test_Fail("fred") PASSED: test_Fail("foooooooooooooooooooooo") PASSED: test_Fail("foooooooooooooooooooooooooooo") PASSED: test_Fail("LASKFODSKFO@$#)DJMZCV)TQKALAD") PASSED: test_Fail("") PASSED: test_OK("hlship@apache.org") PASSED: test_OK("j@apache.org") PASSED: test_OK("jkuhnert@a.org") PASSED: test_OK("J@A.oRg") PASSED: test_OK("foo@example-bar.domain.com") PASSED: test_OK("FOO@EXample-bAr.domain.com") PASSED: test_OK("_foo@example.com") PASSED: test_OK("$user+mailbox_@example-domain.com") PASSED: test_Render_Contribution_Custom_Message PASSED: test_Fail_Custom_Message =============================================== org.apache.tapestry.form.validator.TestEmail Tests run: 16, Failures: 0, Skips: 0 =============================================== =============================================== tapestry Total tests run: 16, Failures: 0, Skips: 0 ===============================================
        Hide
        Andreas Andreou added a comment -

        but now, all tests fail! there's something wrong with the last patch

        Show
        Andreas Andreou added a comment - but now, all tests fail! there's something wrong with the last patch
        Hide
        Ulrich Stärk added a comment -

        TAPESTRY-1857.txt is a patch org.apache.tapestry.form.validator.Email,
        TAPESTRY-1857-2.txt patches the test case

        Show
        Ulrich Stärk added a comment - TAPESTRY-1857 .txt is a patch org.apache.tapestry.form.validator.Email, TAPESTRY-1857 -2.txt patches the test case
        Hide
        Ulrich Stärk added a comment -

        Error in the USER_PATTERN: should be
        d not \\\\d
        Still some common RFC-compliant mail addresses like foo@bar.com or foo@bar.com and addresses containing other allowed chars will get rejected.

        Show
        Ulrich Stärk added a comment - Error in the USER_PATTERN: should be d not \\\\d Still some common RFC-compliant mail addresses like foo@bar.com or foo @bar.com and addresses containing other allowed chars will get rejected.
        Hide
        Andreas Andreou added a comment -

        Should be ok now - there's still room for improvements + customizations on it (like a flag for allowing
        localhost emails) but it should be much better now... Reopen if you feel there's still a problem somewhere.

        Show
        Andreas Andreou added a comment - Should be ok now - there's still room for improvements + customizations on it (like a flag for allowing localhost emails) but it should be much better now... Reopen if you feel there's still a problem somewhere.
        Hide
        Andreas Andreou added a comment -

        Actually wait! i've copied the logic from dojo's validator and it looks nice...
        even includes all valid TLD names

        Show
        Andreas Andreou added a comment - Actually wait! i've copied the logic from dojo's validator and it looks nice... even includes all valid TLD names
        Hide
        Ulrich Stärk added a comment -

        That pattern has several flaws:

        1.) domain names with several successive '.' will pass (e.g. foo@bar.....l.com)
        2.) characters commonly found in local parts like '+' are not covered by '\w'
        3.) RFC 3696 (http://tools.ietf.org/html/rfc3696) says: "The LDH rule, as updated, provides that the labels (words or strings separated by periods) that make up a domain name must consist of only the ASCII [ASCII] alphabetic and numeric characters, plus the hyphen." '\w' also contains the underscore char ('_') thus allowing for invalid domains.

        As to your wonderings
        a) they are
        b) at least in the local part we should support single chars
        c) there will always be tradeoffs you have to make. if you want to be fully rfc compliant you'd have to use someting like http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html as you mentioned earlier. the pattern i provided addresses the 3 points mentioned above. it has one mistake though, i mixed up the quantifier at some points, will fix that right away.

        Show
        Ulrich Stärk added a comment - That pattern has several flaws: 1.) domain names with several successive '.' will pass (e.g. foo@bar.....l.com) 2.) characters commonly found in local parts like '+' are not covered by '\w' 3.) RFC 3696 ( http://tools.ietf.org/html/rfc3696 ) says: "The LDH rule, as updated, provides that the labels (words or strings separated by periods) that make up a domain name must consist of only the ASCII [ASCII] alphabetic and numeric characters, plus the hyphen." '\w' also contains the underscore char ('_') thus allowing for invalid domains. As to your wonderings a) they are b) at least in the local part we should support single chars c) there will always be tradeoffs you have to make. if you want to be fully rfc compliant you'd have to use someting like http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html as you mentioned earlier. the pattern i provided addresses the 3 points mentioned above. it has one mistake though, i mixed up the quantifier at some points, will fix that right away.
        Hide
        Andreas Andreou added a comment -

        Before revision 483838, we were using
        public static final String PATTERN = "^\\w[-._\\w]\\w@\\w[-._\\w]\\w\\.
        w

        {2,6}

        $";
        which (just like the suggested pattern) works very fast
        but fails for single chars in local or domain part - our test case includes
        j@apache.org, jkuhnert@a.org

        I'm wondering if:
        a) such emails are valid (dojo's js validator says yes)
        b) it's worth supporting, considering the DOS opportunities.
        c) there's anything better

        Show
        Andreas Andreou added a comment - Before revision 483838, we were using public static final String PATTERN = "^\\w [-._\\w] \\w@\\w [-._\\w] \\w\\. w {2,6} $"; which (just like the suggested pattern) works very fast but fails for single chars in local or domain part - our test case includes j@apache.org, jkuhnert@a.org I'm wondering if: a) such emails are valid (dojo's js validator says yes) b) it's worth supporting, considering the DOS opportunities. c) there's anything better
        Hide
        Ulrich Stärk added a comment -

        I wrote a patch for the regex pattern that
        a) uses non-greedy quantifiers (backtracking should be avoided and is most likely the source of this bug)
        b) takes into account most of the restrictions placed upon email addresses as outlined in RFC 3696 (distilled from RFCs 2821 and 2822)

        Limitations:

        • doesn't allow for backslash-quoted chars in the local part
        • doesn't allow for a double-quote quoted local part
        • domain names consisting exclusively of digits are allowed (like 12345678.com which actually is not allowed but in practice this is disregarded)
        Show
        Ulrich Stärk added a comment - I wrote a patch for the regex pattern that a) uses non-greedy quantifiers (backtracking should be avoided and is most likely the source of this bug) b) takes into account most of the restrictions placed upon email addresses as outlined in RFC 3696 (distilled from RFCs 2821 and 2822) Limitations: doesn't allow for backslash-quoted chars in the local part doesn't allow for a double-quote quoted local part domain names consisting exclusively of digits are allowed (like 12345678.com which actually is not allowed but in practice this is disregarded)
        Hide
        Ulrich Stärk added a comment -

        I attached a test case demonstrating the issue

        Show
        Ulrich Stärk added a comment - I attached a test case demonstrating the issue

          People

          • Assignee:
            Andreas Andreou
            Reporter:
            Ulrich Stärk
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development