Affects Version/s: None
Fix Version/s: 6.2
This is a sub-ticket of JOSHUA-273.
Joshua output formatting is a mess. The StructuredTranslation piece is a good step in the right direction, but many problems remain. Here is a list of problems and corrections.
- Four variables currently determine which code path formats the output: the run mode (one of two server modes, or regular mode), whether use_structured_translations is set, whether topN == 0 (i.e., whether we output a k-best list or just the quick Viterbi best), and whether we project case or denormalize the output.
- In TCP mode, ServerThread.java.run() iterates over the Translation objects returned by Translations and calls Translation.toString() on each. %S formatting and recasing are applied.
- In HTTP mode, ServerThread.java.handle() builds a JSONMessage, which in turn calls translation.getStructuredTranslations().get(0).getTranslationString(). Neither recasing nor %S formatting is applied.
- In regular mode, we call Translation.toString(). The output is formatted in a complicated way in the constructor, using different methods depending on whether (a) use_structured_translations is set and (b) topN == 0. The result is a mess of nested, redundant output formatting. Some of these methods in turn rely on separate formatting applied in KBestExtractor's constructor.
- Get rid of topN == 0. Viterbi extraction should be quicker than k-best extraction and should be used automatically where possible. The same output formatting should apply in either case.
- We should always use structured outputs, even collapsing StructuredTranslation into Translation.
- Move all output formatting out of KBestExtractor. KBestExtractor should just return k-best items.
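
To make the proposed direction concrete, here is a rough sketch of what a single formatting path could look like. It is not taken from the Joshua code base: Hypothesis, OutputFormatter, and capitalize are hypothetical names, the %i and %c placeholders are chosen only for illustration (only %S is mentioned above), and the first-letter capitalization is a stand-in for real recasing. The point is only that extraction returns plain items, and that one formatter is applied identically whether the list holds one Viterbi item or k items, and whichever mode requested it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class OutputFormatSketch {

    /** Hypothetical k-best item: plain data, no formatting logic. */
    static final class Hypothesis {
        final int sentenceId;
        final String text;
        final double score;

        Hypothesis(int sentenceId, String text, double score) {
            this.sentenceId = sentenceId;
            this.text = text;
            this.score = score;
        }
    }

    /** Hypothetical single formatter shared by TCP, HTTP, and regular mode. */
    static final class OutputFormatter {
        private final String template; // e.g. "%i ||| %S ||| %c" (illustrative placeholders)
        private final boolean recase;

        OutputFormatter(String template, boolean recase) {
            this.template = template;
            this.recase = recase;
        }

        /** Formats one item; the translation text is substituted last. */
        String format(Hypothesis h) {
            String text = recase ? capitalize(h.text) : h.text;
            return template
                    .replace("%i", Integer.toString(h.sentenceId))
                    .replace("%c", String.format(Locale.ROOT, "%.3f", h.score))
                    .replace("%S", text);
        }

        /** Same code path for a 1-best (Viterbi) list and a k-best list. */
        List<String> formatAll(List<Hypothesis> items) {
            List<String> lines = new ArrayList<>();
            for (Hypothesis h : items) {
                lines.add(format(h));
            }
            return lines;
        }

        /** Stand-in for recasing: upper-cases the first character only. */
        private static String capitalize(String s) {
            if (s.isEmpty()) {
                return s;
            }
            return Character.toUpperCase(s.charAt(0)) + s.substring(1);
        }
    }

    public static void main(String[] args) {
        OutputFormatter formatter = new OutputFormatter("%i ||| %S ||| %c", true);
        List<Hypothesis> kBest = new ArrayList<>();
        kBest.add(new Hypothesis(0, "hello world", -4.25));
        kBest.add(new Hypothesis(0, "hello the world", -5.5));
        for (String line : formatter.formatAll(kBest)) {
            System.out.println(line);
        }
    }
}
```

With this shape, the server handlers and the regular-mode writer would differ only in how they transport the formatted lines (raw text versus a JSON wrapper), not in how the lines are built.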