Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: 0.11
Fix Version/s: None
Component/s: None
Description
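I'll do this in time for 0.12. As it stands, TOKENIZE is practically useless for real text. It only splits on spaces, double quotes, commas, parentheses, and asterisks, and it does no normalization such as lowercasing or stopword removal. See: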
http://thedatachef.blogspot.com/2011/04/lucene-text-tokenization-udf-for-apache.html
https://github.com/Ganglion/varaha/blob/master/src/main/java/varaha/text/TokenizeText.java
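To make the complaint concrete, here is a minimal Java sketch that contrasts TOKENIZE-style splitting with a normalizing tokenizer. It rests on a few assumptions. The delimiter set for `naiveTokenize` is taken from the Pig documentation for TOKENIZE. `analyzedTokenize` and its stopword list are hypothetical stand-ins for a Lucene analyzer chain. Neither method is the Lucene API or the code behind either link above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.StringTokenizer;

public class TokenizeComparison {

    // Word separators documented for Pig's built-in TOKENIZE:
    // space, double quote, comma, parentheses and asterisk.
    private static final String TOKENIZE_DELIMS = " \",()*";

    // Hypothetical, tiny stopword list for illustration only.
    private static final Set<String> STOPWORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    // Approximates TOKENIZE: a plain StringTokenizer split with no normalization,
    // so punctuation such as "." or "!" stays attached and case is preserved.
    static List<String> naiveTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, TOKENIZE_DELIMS);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Rough stand-in for an analyzer chain (hypothetical, not Lucene):
    // split on any run of non-letter/non-digit characters, lowercase, drop stopwords.
    static List<String> analyzedTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading delimiter yields an empty first element
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!STOPWORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The quick, brown fox. Jumped (over) the DOG!";
        System.out.println(naiveTokenize(text));
        System.out.println(analyzedTokenize(text));
    }
}
```

For the sample sentence, `naiveTokenize` yields [The, quick, brown, fox., Jumped, over, the, DOG!], while `analyzedTokenize` yields [quick, brown, fox, jumped, over, dog]. Under the first, a word count would treat "DOG!" and "dog" as different terms. That is the kind of problem a Lucene-backed UDF avoids.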
Issue Links
- duplicates DATAFU-14 Add NGram Tokenizer to datafu.pig.text.lucene (Closed)