Lucene - Core / LUCENE-34

e-mail token in StandardTokenizer.jj does not match valid e-mail addresses

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels:
      None
    • Environment:

      Operating System: Linux
      Platform: PC

    • Bugzilla Id:
      9015

      Description

      E-mail token in StandardTokenizer.jj does not match many valid e-mail
      addresses. See line 106:

      <EMAIL: <ALPHANUM> "@" <ALPHANUM> ("." <ALPHANUM>)+ >

      For example, neither danson@germane-software.com (because of the dash) nor
      dale.anson@germane-software.com (because of the first dot and the dash) matches.
      The following is slightly better, but does not come close to meeting the
      specifications of RFC 822:

      <EMAIL: <ALPHANUM> ("."|"-" <ALPHANUM>)+ "@" <ALPHANUM> ("."|"-" <ALPHANUM>)+ >

      This is being reported against the May 11 nightly build (compiled from
      source using the supplied Ant build file on RedHat Linux 7.2 with jikes,
      javacc 2.0, and Sun Linux JDK 1.4). However, I originally ran across this
      problem in Lucene 1.2 rc4.
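
The difference between the two token definitions in the description can be checked outside JavaCC with a rough java.util.regex translation. This is only a sketch under stated assumptions, not the tokenizer's actual behavior: <ALPHANUM> is approximated as one or more ASCII letters or digits, the second separator in the proposal is taken to be the dash the report mentions, and the proposal is read with the separators grouped and the local-part repetition optional (as literally written, JavaCC would bind "-" <ALPHANUM> more tightly than the alternation). The class name EmailTokenSketch and its methods are hypothetical.

```java
import java.util.regex.Pattern;

public class EmailTokenSketch {
    // Assumption: <ALPHANUM> approximated as one or more ASCII letters or digits.
    static final String ALPHANUM = "[A-Za-z0-9]+";

    // Rough translation of: <EMAIL: <ALPHANUM> "@" <ALPHANUM> ("." <ALPHANUM>)+ >
    static final Pattern ORIGINAL = Pattern.compile(
            ALPHANUM + "@" + ALPHANUM + "(\\." + ALPHANUM + ")+");

    // Assumed intent of the proposal: dot or dash separators allowed on both
    // sides of "@", with the local-part repetition made optional.
    static final Pattern PROPOSED = Pattern.compile(
            ALPHANUM + "([.-]" + ALPHANUM + ")*@"
            + ALPHANUM + "([.-]" + ALPHANUM + ")+");

    static boolean matchesOriginal(String address) {
        return ORIGINAL.matcher(address).matches();
    }

    static boolean matchesProposed(String address) {
        return PROPOSED.matcher(address).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "danson@germane-software.com",
            "dale.anson@germane-software.com",
            "user@example.com"
        };
        // Print whether each sample is accepted by each pattern.
        for (String s : samples) {
            System.out.println(s + " original=" + matchesOriginal(s)
                    + " proposed=" + matchesProposed(s));
        }
    }
}
```

Under these assumptions the first pattern rejects both addresses from the report and the second accepts them; neither attempts the full address syntax of RFC 822.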

        Activity

        otis@apache.org Otis Gospodnetic added a comment -

        Perhaps something like this would then be in order:

        <EMAIL: <ALPHANUM> ("."|"-"|"_" <ALPHANUM>)+ "@" <ALPHANUM> ("."|"-" <ALPHANUM>)+ >

        otis@apache.org Otis Gospodnetic added a comment -

        Changed it to:
        <EMAIL: <ALPHANUM> ("."|"-" <ALPHANUM>)+ "@" <ALPHANUM> ("."|"-" <ALPHANUM>)+ >


          People

          • Assignee:
            java-dev@lucene.apache.org Lucene Developers
            Reporter:
            danson@germane-software.com Dale Anson