Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Not A Problem
- Affects Version/s: 1.0
- Fix Version/s: None
- Environment: Windows XP, Windows Vista and Ubuntu Linux 11.10, using Sun Java 6 and Oracle Java 7
Description
I ran Tika 1.0 language detection (java -jar tika.jar -l .\Japanese.txt) on several Japanese files, both PDF and plain text, and it consistently returns lt (Lithuanian) instead of ja. A Chinese file was likewise incorrectly detected as lt. Detection of English and French files worked correctly.
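Since this issue depends on CJK support that is still open (TIKA-856), one possible stopgap is to check the script of the text before trusting the detector's answer. The sketch below is not part of Tika; the class name CjkScriptCheck, the method guessCjkLanguage, and the 50% threshold are all hypothetical choices made for illustration. It relies only on Character.UnicodeScript, which requires Java 7, so it would not run on the Sun Java 6 setup listed above.

```java
import java.lang.Character.UnicodeScript;

// Hypothetical helper, not part of Tika: a rough script-based guess for CJK text.
public class CjkScriptCheck {

    /**
     * Returns "ja", "ko" or "zh" when at least half of the letters in the
     * text belong to a CJK script, or null otherwise (meaning: fall back to
     * the regular language detector).
     */
    public static String guessCjkLanguage(String text) {
        int han = 0, kana = 0, hangul = 0, letters = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue; // skip digits, punctuation and whitespace
            }
            letters++;
            UnicodeScript script = UnicodeScript.of(cp);
            if (script == UnicodeScript.HAN) {
                han++;
            } else if (script == UnicodeScript.HIRAGANA || script == UnicodeScript.KATAKANA) {
                kana++;
            } else if (script == UnicodeScript.HANGUL) {
                hangul++;
            }
        }
        int cjk = han + kana + hangul;
        // Assumed threshold: at least half of all letters must be CJK.
        if (letters == 0 || cjk * 2 < letters) {
            return null;
        }
        if (kana > 0) {
            return "ja"; // kana is used only for Japanese
        }
        if (hangul > 0) {
            return "ko";
        }
        return "zh"; // Han characters only: assume Chinese
    }
}
```

A caller could extract the text first, call guessCjkLanguage on it, and run the normal detection only when the result is null. The heuristic is deliberately crude: Japanese text written entirely in kanji would be reported as zh, and mixed-language documents are judged by letter count alone.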
Issue Links
- depends upon: TIKA-856 Support CJK (Chinese, Japanese and Korean) language detection (Status: Open)