  1. Lucene - Core
  2. LUCENE-34

e-mail token in StandardTokenizer.jj does not match valid e-mail addresses

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • None
    • modules/analysis
    • None
    • Operating System: Linux
      Platform: PC

    • 9015

    Description

      E-mail token in StandardTokenizer.jj does not match many valid e-mail
      addresses. See line 106:

      <EMAIL: <ALPHANUM> "@" <ALPHANUM> ("." <ALPHANUM>)+ >

      For example, neither danson@germane-software.com (because of the dash) nor
      dale.anson@germane-software.com (because of the first dot and the dash) match.
      The following is slightly better, but does not come close to meeting the
      specifications of RFC 822:

      <EMAIL: <ALPHANUM> ("."|"-" <ALPHANUM>)+ "@" <ALPHANUM> ("."|"-" <ALPHANUM>)+ >
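
      To make the difference between the two token definitions concrete, here
      is a minimal Java sketch that approximates both with java.util.regex.
      It is not the tokenizer's actual code. The class and method names are
      hypothetical, <ALPHANUM> is assumed to be a run of ASCII letters and
      digits, and the relaxed rule is read as "alphanumeric runs joined by a
      dot or a dash", with the local part allowed to have no separator at all.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: regex approximations of the two EMAIL token rules.
// This is an illustration only, not code from StandardTokenizer.jj.
final class EmailTokenSketch {
    // Assumption: <ALPHANUM> is approximated as one or more ASCII letters or digits.
    private static final String ALPHANUM = "[A-Za-z0-9]+";

    // Original rule: <ALPHANUM> "@" <ALPHANUM> ("." <ALPHANUM>)+
    private static final Pattern ORIGINAL = Pattern.compile(
        ALPHANUM + "@" + ALPHANUM + "(\\." + ALPHANUM + ")+");

    // Relaxed rule (assumed reading): runs joined by "." or "-".
    // The local part may have zero separators; the domain needs at least one.
    private static final Pattern RELAXED = Pattern.compile(
        ALPHANUM + "([.-]" + ALPHANUM + ")*@"
            + ALPHANUM + "([.-]" + ALPHANUM + ")+");

    static boolean matchesOriginal(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    static boolean matchesRelaxed(String s) {
        return RELAXED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "danson@germane-software.com",
            "dale.anson@germane-software.com",
            "user@example.com"
        };
        // Print how each rule treats each sample address.
        for (String s : samples) {
            System.out.println(s
                + " original=" + matchesOriginal(s)
                + " relaxed=" + matchesRelaxed(s));
        }
    }
}
```

      Under these assumptions the original rule rejects both addresses from
      the report, because it allows no dash in the domain and no dot in the
      local part, while the relaxed rule accepts them. Neither pattern covers
      the full address syntax of RFC 822.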

      This is being reported against the May 11 nightly build, which I compiled
      from source using the supplied Ant build file on RedHat Linux 7.2 with
      jikes, javacc 2.0, and Sun Linux JDK 1.4. However, I originally ran across
      this problem in Lucene 1.2 rc4.

          People

            java-dev@lucene.apache.org Lucene Developers
            danson@germane-software.com Dale Anson