Progress update (see )
- The POM files are now updated to use the versions of the trunk (0.10.0-incubating-SNAPSHOT)
- The DBpedia Spotlight Spot engine now behaves as expected for an EnhancementEngine
- It supports asynchronous enhancements (as highly recommended for Engines calling remote services)
- It respects OfflineMode - it does not open connections to external services while offline
- It does not catch any Exceptions - the EnhancementJobManager MUST deal with those as only it knows if an engine is OPTIONAL or REQUIRED.
- In addition I changed the communication with the Spotlight RESTful service so that request/response data are no longer held in memory twice (e.g. the Response as both a String and an XML document)
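To illustrate the last point, here is a minimal sketch of parsing a response directly from its stream instead of first reading it into a String. It is not the engine's actual code: the class name, the method names and the sample XML are assumptions made for illustration, and only standard JDK classes are used.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Stanbol or DBpedia Spotlight.
public class StreamingResponseSketch {

    // Builds the DOM straight from the response stream, so the payload is
    // never materialised as an intermediate String.
    static Document parse(InputStream body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(body);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped only to keep this sketch free of checked exceptions.
            throw new IllegalStateException("Unable to parse response", e);
        }
    }

    // Counts elements with the given tag name in the parsed response.
    static int countElements(InputStream body, String tagName) {
        return parse(body).getElementsByTagName(tagName).getLength();
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample payload; a real call would pass the stream obtained
        // from the HTTP connection instead of an in-memory array.
        String sample = "<annotation><surfaceForm name=\"Berlin\"/>"
                + "<surfaceForm name=\"Paris\"/></annotation>";
        try (InputStream in = new ByteArrayInputStream(
                sample.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(countElements(in, "surfaceForm"));
        }
    }
}
```

The same idea applies to the request side: writing directly to the connection's output stream avoids building the full request body in memory first.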
I also added the Spot Engine to the Enhancer Bundlelist. Users who "mvn clean install" the branch and then "mvn clean install" the Full/Stable Launcher in the trunk ("
/launchers/full") will see the DBpedia Spotlight Spot engine.
- Similar changes as for the Spot engine need to be done for the other Spotlight engines
- DBpedia Spotlight Modules/Bundles
I have noticed that some functionality (most noticeably the XMLParser class) is duplicated in some/all of the Spotlight engines. I see the following possibilities to deal with that
1. ignore the duplicated code
2. create an extra module (bundle) that contains the shared functionality
3. move all engines into a single module
(1) and (2) would be preferable if typical users only want to install a subset of the DBpedia Spotlight engines. (3) works best if it is OK to install all of them (but maybe use only a few - e.g. by configuring the enhancement engines accordingly or by deactivating the unused ones).
- Effects on the Stanbol default Configuration
With the addition of the DBpedia Spotlight engines we might need to think about changing the default configuration of Apache Stanbol.
Currently the default EnhancementChain of the Stanbol Launchers includes all active EnhancementEngines. When we add the DBpedia Spotlight Engines this might no longer make sense, as the results of the DBpedia Spotlight Engines will be very similar to those of the NER+EntityTagging engine with the default DBpedia dataset. More concretely, an EnhancementChain containing all active EnhancementEngines will produce a lot of duplicate results that might confuse users new to Stanbol.
To avoid this I see three possibilities
1. Do not include the DBpedia Spotlight Engines in the default Launcher
2. Deactivate the DBpedia Spotlight Engines by default.
3. Switch from the "All active Engines Chain" to an explicitly configured Chain for the default configuration and add a DBpedia Spotlight Chain.
I strongly favor (3) and only included (1) and (2) to give people who want to keep the "All active Engines Chain" the chance to leave a comment. Note that even with (3) we can keep the "All active Engines Chain", but it would no longer be the default chain.
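The difference between the two kinds of chain can be sketched as follows. This is a simplified model, not the Stanbol Chain API: the class, the method names and the engine names are hypothetical and only serve to show why an explicitly configured default chain avoids running overlapping engines together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of chain selection; not the Stanbol API.
public class ChainSketch {

    // "All active engines" behaviour: every active engine ends up in the chain.
    static List<String> allActiveChain(Map<String, Boolean> engines) {
        List<String> chain = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : engines.entrySet()) {
            if (entry.getValue()) {
                chain.add(entry.getKey());
            }
        }
        return chain;
    }

    // Explicit chain: only the configured engines, in configured order,
    // skipping any that are missing or inactive.
    static List<String> explicitChain(Map<String, Boolean> engines,
                                      List<String> configured) {
        List<String> chain = new ArrayList<>();
        for (String name : configured) {
            if (Boolean.TRUE.equals(engines.get(name))) {
                chain.add(name);
            }
        }
        return chain;
    }

    public static void main(String[] args) {
        // Engine names are made up for this example.
        Map<String, Boolean> engines = new LinkedHashMap<>();
        engines.put("langid", true);
        engines.put("ner", true);
        engines.put("entityTagging", true);
        engines.put("dbpediaSpotlightSpot", true);

        // Contains both the NER+EntityTagging engines and the Spotlight engine.
        System.out.println(allActiveChain(engines));
        // A default chain without Spotlight and a separate Spotlight chain.
        System.out.println(explicitChain(engines,
                Arrays.asList("langid", "ner", "entityTagging")));
        System.out.println(explicitChain(engines,
                Arrays.asList("langid", "dbpediaSpotlightSpot")));
    }
}
```

With option (3), the "all active" variant can still be offered as an additional chain while one of the explicit chains becomes the default.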