Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
None
None
None
Description
After looking at the documentation and the supported languages of both engines, I think that we should switch from the LangId Engine (based on Apache Tika language detection) to the Langdetect Engine (based on http://code.google.com/p/language-detection/).
Normal users should not notice any difference, as both engines create the same Annotations. However, the latter supports considerably more languages.
This switch will require a lot of changes in the integration tests, because they check for the LangId Engine in many places. Those checks need to be changed to the Langdetect Engine.