Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
1.6.0
Any
Patch, Important
Description
opennlp.tools.ml.model.IndexHashTable is a custom-made hash table used to store index mappings. It is used heavily throughout opennlp.tools.ml.* (i.e., by every model) and results in very poor performance.
This hash table is probably legacy code and is highly inefficient. Replacing it with a simple drop-in wrapper around java.util.HashMap solves the issue, does not break compatibility, and adds no dependency.
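A minimal sketch of what such a wrapper could look like. It assumes the existing class's public surface is a constructor taking the key array and a load factor, plus get (returning -1 for an unknown key), size and toArray; that surface is recalled from the OpenNLP sources and should be checked against the real class. HashMapIndexTable is a hypothetical name, not the class from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical HashMap-backed stand-in for IndexHashTable (assumed API):
 * maps each key to its position in the array the table was built from.
 */
final class HashMapIndexTable<T> {

  private final Map<T, Integer> keyToIndex;

  HashMapIndexTable(T[] mapping, double loadFactor) {
    if (loadFactor <= 0 || loadFactor > 1) {
      throw new IllegalArgumentException("loadFactor must be in (0, 1], was: " + loadFactor);
    }
    // Size the map up front so it is not rehashed while being filled.
    int initialCapacity = (int) Math.ceil(mapping.length / loadFactor) + 1;
    keyToIndex = new HashMap<>(initialCapacity, (float) loadFactor);
    for (int i = 0; i < mapping.length; i++) {
      // put returns the previous index if the key was already present.
      if (keyToIndex.put(mapping[i], i) != null) {
        throw new IllegalArgumentException("Array must contain only unique keys: " + mapping[i]);
      }
    }
  }

  /** Returns the index of key, or -1 if the key is unknown. */
  int get(T key) {
    Integer index = keyToIndex.get(key);
    return index == null ? -1 : index;
  }

  /** Number of keys in the table. */
  int size() {
    return keyToIndex.size();
  }

  /** Writes every key into array at its index and returns the same array. */
  T[] toArray(T[] array) {
    for (Map.Entry<T, Integer> entry : keyToIndex.entrySet()) {
      array[entry.getValue()] = entry.getKey();
    }
    return array;
  }
}
```

Each lookup becomes a single HashMap.get, so call sites that only use get, size and toArray should not need to change.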
When training a POS tagger on a large dataset with custom tags, I see a fivefold speedup. The change also seems to speed up the training pipeline of all ML models.
For a quick fix.