Details
Bug
Status: Resolved
Minor
Resolution: Incomplete
1.4
None
None
Operating System: other
Platform: Other
35971
Description
The StandardTokenizer assumes that any phrase containing a comma and at least
one digit must be a number. We are trying to index comma-separated lists of
SAP R/3 transaction codes along with standard text. Many of these codes contain
digits, e.g. "VA01" or "SE80". When tokenizing text that contains these codes,
Lucene treats a comma-separated list of them, e.g. "VA01,VA02,VA03", as a single
number token instead of three separate codes. The grammar should be modified to
recognize numbers correctly (e.g. as tokens containing only digits).
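
To make the report concrete, here is a rough sketch of the two behaviours using plain java.util.regex. It does not use Lucene and is not the actual StandardTokenizer grammar: the class name NumberRuleSketch, the two patterns, and the method names are hypothetical simplifications. LOOSE_NUM approximates the behaviour described above (alphanumeric segments joined by punctuation, with at least one digit anywhere), and STRICT_NUM approximates the requested behaviour (digits only, with optional "," or "." separators).

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not Lucene code.
public class NumberRuleSketch {

    // Assumed approximation of the reported behaviour: two or more
    // alphanumeric segments joined by _ - / . or , where at least one
    // digit appears somewhere in the phrase.
    static final Pattern LOOSE_NUM =
        Pattern.compile("(?=.*\\d)[A-Za-z0-9]+(?:[_\\-/.,][A-Za-z0-9]+)+");

    // Assumed approximation of the requested behaviour: segments made of
    // digits only, optionally joined by , or .
    static final Pattern STRICT_NUM =
        Pattern.compile("\\d+(?:[.,]\\d+)*");

    static boolean looseIsNumber(String phrase) {
        return LOOSE_NUM.matcher(phrase).matches();
    }

    static boolean strictIsNumber(String phrase) {
        return STRICT_NUM.matcher(phrase).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"VA01,VA02,VA03", "1,234.56", "SE80"};
        for (String s : samples) {
            System.out.println(s + " loose=" + looseIsNumber(s)
                + " strict=" + strictIsNumber(s));
        }
    }
}
```

Under these assumptions, the loose rule classifies "VA01,VA02,VA03" as one number, while the strict rule rejects it and still accepts "1,234.56". A real fix would have to be made in the tokenizer grammar itself, and would need to decide what to do with mixed tokens such as host names or part numbers that the loose rule currently keeps together.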