Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Description
Currently, Joshua expects input data to be lowercased, normalized, and tokenized before it is passed in, consistent with the way the training data was prepared. This requires running Perl scripts on the input data. It would be nice if these Perl scripts (located under $JOSHUA/scripts/preparation) were rewritten in Java (under org.apache.joshua.util) so that Joshua could do this normalization itself. This would be particularly useful for the language packs.
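
To make the request concrete, here is a minimal sketch of what an in-process preparation step could look like in Java. It is an illustration only: the class name InputPreprocessor and its methods are hypothetical, it does not reproduce the rules of the existing Perl scripts, and the tokenizer is deliberately naive (it splits off every punctuation mark and symbol, including apostrophes). A real port would need to match the training-time preparation exactly.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical helper, not part of Joshua; a sketch of the requested behavior. */
final class InputPreprocessor {
    private InputPreprocessor() {}

    /** Applies Unicode NFKC normalization and collapses runs of whitespace. */
    static String normalize(String line) {
        String s = Normalizer.normalize(line, Normalizer.Form.NFKC);
        return s.replaceAll("\\s+", " ").trim();
    }

    /** Naive tokenizer: surrounds each punctuation mark or symbol with spaces. */
    static String tokenize(String line) {
        return line.replaceAll("([\\p{P}\\p{S}])", " $1 ")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    /** Locale-independent lowercasing. */
    static String lowercase(String line) {
        return line.toLowerCase(Locale.ROOT);
    }

    /** Runs the three steps in an assumed order: normalize, tokenize, lowercase. */
    static String prepare(String line) {
        return lowercase(tokenize(normalize(line)));
    }

    public static void main(String[] args) {
        System.out.println(prepare("Hello,   World!")); // prints "hello , world !"
    }
}
```

With something along these lines, the decoder could call prepare() on each input line itself, so a language pack would not need to ship or invoke the Perl scripts.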