  1. Lucene - Core
  2. LUCENE-571

StandardTokenizer parses decimal number as <HOST>

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 1.9
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels: None

    Description

      The standard tokenizer in 1.9.1 returns a decimal number such as "3.14" as a <HOST>, though a number like "3,141.59" is returned as a <NUM>. I believe, though I haven't tried it yet, that moving the rule for <HOST> after <NUM>, instead of before it, would fix this. An alternative is to update <HOST> to require a TLD as its last component, which would mean splitting the handling of IP addresses from that of name-based addresses.
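      The rule-ordering effect described above can be illustrated with a minimal, self-contained sketch. It is not Lucene code: the class `RuleOrderSketch`, the method `classify`, and the two regular expressions are hypothetical, simplified stand-ins for the <HOST> and <NUM> rules, not the definitions in the tokenizer grammar. The sketch assumes the usual lexer convention that the longest match wins and that a tie goes to the rule declared first.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleOrderSketch {
    // Hypothetical, simplified stand-ins for the grammar rules (NOT Lucene's definitions).
    static final Pattern HOST = Pattern.compile("[A-Za-z0-9]+(\\.[A-Za-z0-9]+)+");
    static final Pattern NUM = Pattern.compile("[0-9]+([.,][0-9]+)+");

    // Two rule sets that differ only in declaration order.
    static final Map<String, Pattern> HOST_FIRST = ordered("<HOST>", HOST, "<NUM>", NUM);
    static final Map<String, Pattern> NUM_FIRST = ordered("<NUM>", NUM, "<HOST>", HOST);

    static Map<String, Pattern> ordered(String n1, Pattern p1, String n2, Pattern p2) {
        Map<String, Pattern> rules = new LinkedHashMap<>(); // preserves insertion order
        rules.put(n1, p1);
        rules.put(n2, p2);
        return rules;
    }

    // Longest match at the start of the text wins; on a tie, the earlier rule is kept.
    static String classify(String text, Map<String, Pattern> rules) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(text);
            if (m.lookingAt() && m.end() > bestLen) { // strict '>' keeps the first rule on ties
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"3.14", "3,141.59", "www.example.com"}) {
            System.out.println(s + "  host-first=" + classify(s, HOST_FIRST)
                    + "  num-first=" + classify(s, NUM_FIRST));
        }
    }
}
```

      Under these assumptions, "3.14" matches both patterns at the same length, so its type follows declaration order, while "3,141.59" can only match the number pattern because of the comma. A name such as "www.example.com" is unaffected by the reordering in this sketch, since the number pattern does not match it at all.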

            People

              Assignee: Unassigned
              Reporter: Tom Emerson
              Votes: 0
              Watchers: 0

              Dates

                Created:
                Updated:
                Resolved: