Details
- Bug
- Status: Resolved
- Minor
- Resolution: Duplicate
- 1.9
- None
- None
Description
The standard tokenizer in 1.9.1 returns a decimal number such as "3.14" as a <HOST>, whereas a number such as "3,141.59" is returned as a <NUM>. I believe, though I haven't tried it yet, that moving the <HOST> rule after the <NUM> rule, instead of before it, would fix this. An alternative is to change <HOST> to require a TLD as its last component, which would mean handling IP addresses separately from name-based addresses.
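The effect of rule order can be illustrated with a small sketch. It is not the StandardTokenizer grammar: the regular expressions, the `classify` helper, and the `<IP>` type are all hypothetical simplifications, and the sketch assumes the usual lexer-generator convention that the longest match wins and ties go to the rule listed first.

```python
import re

# Hypothetical, simplified patterns; NOT the real StandardTokenizer rules.
HOST = ("<HOST>", re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)+"))
NUM = ("<NUM>", re.compile(r"[0-9]+(?:[.,][0-9]+)+"))

# Variant for the second suggestion: the last component must look like a TLD,
# so dotted numeric addresses need their own (hypothetical) rule.
HOST_TLD = ("<HOST>", re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*\.[A-Za-z]{2,}"))
IP = ("<IP>", re.compile(r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}"))


def classify(text, rules):
    """Return the type of the longest match at the start of text.

    Ties are resolved in favor of the rule that appears first in `rules`,
    which is the assumed behavior being illustrated.
    """
    best_type, best_len = None, 0
    for token_type, pattern in rules:
        match = pattern.match(text)
        # Strict '>' keeps the earlier rule when two matches have equal length.
        if match and len(match.group()) > best_len:
            best_type, best_len = token_type, len(match.group())
    return best_type


host_first = [HOST, NUM]           # order described in the report
num_first = [NUM, HOST]            # first proposed fix
tld_rules = [IP, HOST_TLD, NUM]    # second proposed fix

print(classify("3.14", host_first))        # <HOST>
print(classify("3,141.59", host_first))    # <NUM>
print(classify("3.14", num_first))         # <NUM>
print(classify("example.com", num_first))  # <HOST>
print(classify("3.14", tld_rules))         # <NUM>
print(classify("192.168.0.1", tld_rules))  # <IP>
```

In this sketch both patterns match all four characters of "3.14", so the result depends only on which rule comes first. With the TLD variant, "192.168.0.1" no longer matches the host pattern at all, which is why IP addresses would need a rule of their own.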
Issue Links
- is related to LUCENE-1100 StandardTokenizer incorrectly types certain values (Closed)