Lucene - Core / LUCENE-1403

StandardTokenizer - Improper Hostname Recognition

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.3.1, 2.3.2
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels: None
    • Environment: Java 5
    • Lucene Fields: New

    Description

      As of 2.3.1 the documentation for the StandardTokenizer states that it "Recognizes email addresses and internet hostnames as one token."

      However, hostnames such as "my-host.com" are recognized as two tokens, "my" and "host.com".

      Any hostname with a dash in it is split at the dash instead of being recognized as one token.
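
      The split described above can be reproduced without Lucene using a small regex sketch. This is not the StandardTokenizer grammar; it only assumes, consistent with the reported output, that alphanumeric runs are joined into one token across dots but not across dashes. The class name, the two patterns, and the dash-tolerant variant are hypothetical and shown purely for contrast.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; this is NOT Lucene's StandardTokenizer.
final class HostTokenSketch {

    // Assumption: alphanumeric runs are joined into one token only across dots.
    static final Pattern DOT_JOINED =
            Pattern.compile("[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)*");

    // Assumption: a variant that also joins runs across a single dash.
    static final Pattern DOT_OR_DASH_JOINED =
            Pattern.compile("[A-Za-z0-9]+(?:[.-][A-Za-z0-9]+)*");

    // Collects every non-overlapping match of the rule, left to right.
    static List<String> tokenize(Pattern rule, String text) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = rule.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Mirrors the behavior in the report: the dash ends the first token.
        System.out.println(tokenize(DOT_JOINED, "my-host.com"));         // [my, host.com]
        // With the dash allowed as a joiner, the hostname stays whole.
        System.out.println(tokenize(DOT_OR_DASH_JOINED, "my-host.com")); // [my-host.com]
    }
}
```

      Under these assumptions the dash simply ends the first match, which is why "my" comes out separately while "host.com" stays whole.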


            People

              Assignee: Unassigned
              Reporter: Cullin Wible (cwible)
              Votes: 0
              Watchers: 0
