Not necessarily related, but can you think of a way that we can keep WikipediaTokenizer and StandardTokenizer in sync for these kinds of things? I guess I need to go look in JFlex to see if there is a way to do inheritance. Essentially, I want the WikiTokenizer to be StandardTokenizer plus appropriate handling of the Wiki syntax.
Very good question ... I don't know. It would be awesome (and amazing) if JFlex enabled some kind of inheritance.
Since WikipediaTokenizer doesn't have the backwards compat requirement of StandardTokenizer, you can presumably just fix ACRONYM in WikipediaTokenizer to not match host names (i.e. hardwire to "true")?
Here's the thread on JFlex for completeness, not that it affects this patch: http://sourceforge.net/mailarchive/forum.php?thread_name=272037D7-6EA1-4D19-902F-B425A5309C2A%40apache.org&forum_name=jflex-users
Hi Grant,
have you looked at the JFlex %implements and %extends directives? I have used %implements successfully in building my parsers for inheritance, where the tokens are all constants in an interface generated not by JFlex but by a parser generator. For example %%
I am quite sure %extends could also be used to build a tokenizer family.
Michael,
Great work. I am glad we are moving to have the bug fixed by default, rather than the other way around. Please indulge me a couple of small nitpicks before I get to my main point in another comment.
Given that the code is "temporary" until v3.0, feel free to ignore me.
I love the solution you have come up with, but would suggest that it be moved to StandardTokenizer instead of StandardAnalyzer.
StandardTokenizer is the class with the actual problem. Fixing it there would mean that everyone who uses StandardTokenizer gets the fix by default, not just users of StandardAnalyzer. For instance, see
I would provide suggested patches, but I am just about to go on holidays for 3 weeks. Is there a planned release date for v2.3.3 or v2.4?
Added a patch to
Thanks for catching these, Mark – I'll commit a fix shortly.
LUCENE-1068) by default, but offers a system property and a static method to keep the backwards-compatible yet buggy behavior. I'll commit in a day or two.
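For readers who have not opened the patch, a "fixed by default, with a system property and a static method as escape hatches" switch could look roughly like the sketch below. This is not the committed code: the class name `AcronymDefaults`, the property key `example.replaceInvalidAcronym`, and the method names are all made up for illustration.

```java
// Hypothetical sketch of a process-wide default with two ways to override it.
// None of these names come from the actual patch.
final class AcronymDefaults {
    // Hypothetical property key, chosen only for this example.
    static final String PROPERTY = "example.replaceInvalidAcronym";

    // The fixed behavior is the default; only an explicit "false"
    // (e.g. -Dexample.replaceInvalidAcronym=false) restores the old behavior.
    private static volatile boolean replaceInvalidAcronym =
        !"false".equalsIgnoreCase(System.getProperty(PROPERTY));

    private AcronymDefaults() {
        // Static holder, never instantiated.
    }

    // Returns the default that newly created analyzers would pick up.
    static boolean getDefault() {
        return replaceInvalidAcronym;
    }

    // Programmatic override for applications that cannot set JVM flags.
    static void setDefault(boolean value) {
        replaceInvalidAcronym = value;
    }
}
```

Reading the property once at class-load time keeps the check cheap, and the static setter lets an application that depends on the old token types opt out at startup without touching JVM arguments.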