Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
0.12.0
None
Description
The EntityLinkingEngine should cache results of lookups on the EntitySearchers.
Entities often reoccur in analyzed Documents. Caching the results for already looked-up tokens should therefore provide considerable performance improvements, as statistics show that ~90% of the processing time of the EntityLinking engine is spent on the entity look-up.
So if 20% of all Entity mentions refer to reoccurring Entities, the processing time should be reduced by about 18% (20% of the ~90% spent on look-ups).
The cache will use the list of search strings as the key and the list of returned Entities as the value. The cache will only collect look-up results for the currently analyzed document.
EntityLinking statistics will be updated to include the cache hit percentage.
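The caching scheme described above could look roughly like the following sketch. It is only an illustration under stated assumptions, not the code committed for this issue: the class name DocumentLookupCache, its methods, and the use of a plain Function in place of the real EntitySearcher interface are all hypothetical. One instance would be created per analyzed document and discarded afterwards, so results never leak between documents.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical per-document look-up cache (illustration only).
 * E stands in for whatever type represents a returned Entity.
 */
class DocumentLookupCache<E> {

    // key: list of search strings, value: list of Entities returned for them
    private final Map<List<String>, List<E>> cache = new HashMap<>();
    // stand-in for the real searcher; assumed to never return null
    private final Function<List<String>, List<E>> searcher;
    private int lookups;
    private int hits;

    DocumentLookupCache(Function<List<String>, List<E>> searcher) {
        this.searcher = searcher;
    }

    /** Returns the cached result if present, otherwise queries the searcher. */
    List<E> lookup(List<String> searchStrings) {
        lookups++;
        // defensive copy so later changes to the caller's list cannot corrupt the key
        List<String> key = Collections.unmodifiableList(new ArrayList<>(searchStrings));
        List<E> cached = cache.get(key);
        if (cached != null) {
            hits++;
            return cached;
        }
        List<E> result = searcher.apply(key);
        cache.put(key, result); // empty results are cached as well
        return result;
    }

    /** Cache hit percentage for this document, 0 if nothing was looked up. */
    double hitPercentage() {
        return lookups == 0 ? 0.0 : 100.0 * hits / lookups;
    }
}
```

Because List equality in Java is element-wise and order-sensitive, two look-ups share a cache entry only if they use the same search strings in the same order. Whether differently ordered lists should be treated as the same key is a design decision this sketch leaves open.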
This issue affects both the trunk (1.0.0-SNAPSHOT) and the stable 0.12 release branch.