Description
In the newly released UIMA Ruta Runtime 2.7.0 and Workbench 2.3.0 for Eclipse, the MARKTABLE action no longer works properly: it no longer annotates every entry listed in a CSV file. As far as I can tell, the problem occurs only for entries written in Cyrillic that contain spaces; with Latin entries it works fine. Please use the sample below to reproduce the problem.
- script/main.ruta
WORDTABLE Dict = 'dict.csv';
DECLARE Annotation Test (STRING meaning);
Document {-> MARKTABLE(Test,1,Dict, "meaning" = 2)};
- resources/dict.csv
від;from
с какой стати;why
с которой;fromWhich
сюда;here
по какому;which
сюди;here
как нибудь;somehow
сколько;howMuch
- input/test.txt
від с какой стати с которой сюда по какому сюди как нибудь сколько
After the main.ruta script is executed, not everything in test.txt is annotated. It is worth mentioning that a Cyrillic letter such as 'с' at the beginning of an entry seems to affect the processing. Moreover, removing the lines that contain spaces from dict.csv makes the problem go away.
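
For reference, the annotations this sample is expected to produce can be sketched with a naive dictionary lookup. This is a Python illustration only, not Ruta's MARKTABLE implementation. The names `load_table` and `expected_matches` are hypothetical, and the sketch assumes that entries are matched on whitespace-separated words and that the longest entry wins.

```python
# Naive reference matcher: NOT Ruta's implementation, only an illustration
# of the annotations the report expects MARKTABLE to produce.
# Assumption: matching works on whitespace-separated words, longest entry first.

DICT_CSV = """від;from
с какой стати;why
с которой;fromWhich
сюда;here
по какому;which
сюди;here
как нибудь;somehow
сколько;howMuch"""

TEXT = "від с какой стати с которой сюда по какому сюди как нибудь сколько"


def load_table(csv_text):
    """Map each column-1 entry (as a tuple of words) to its column-2 value."""
    table = {}
    for line in csv_text.splitlines():
        entry, meaning = line.split(";")
        table[tuple(entry.split())] = meaning
    return table


def expected_matches(text, table):
    """Greedy longest-match over the words of the input text."""
    words = text.split()
    longest = max(len(key) for key in table)
    matches = []
    i = 0
    while i < len(words):
        # Try the longest possible entry first, then shorter ones.
        for n in range(min(longest, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in table:
                matches.append((" ".join(key), table[key]))
                i += n
                break
        else:
            i += 1  # no entry starts at this word
    return matches


MATCHES = expected_matches(TEXT, load_table(DICT_CSV))
for covered, meaning in MATCHES:
    print(f"{covered} -> {meaning}")
```

Under these assumptions every word of test.txt is covered by one of the eight entries, so a correct run of the script should yield eight Test annotations, including the multi-word ones such as "с какой стати" and "как нибудь".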