Details
- Type: Improvement
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- 1.1
- None
Description
As discussed on ANY23-247, the ContentExtractor is simply not fit for purpose. The problem was discovered there, and its cause has plagued our builds ever since. Any extractor that extends BaseRDFExtractor is based on Extractor.ContentExtractor and hence works off an 'unfixed' raw data stream, as opposed to a more flexible model such as the TagSoupDOMExtractor.
This issue should refactor RDF extractors to enable more flexibility and to avoid issues we encounter with the strict SAX parsing logic.
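To illustrate the kind of failure that strict SAX parsing of a raw stream produces, here is a minimal sketch using only the JDK's JAXP classes. It is not Any23 code: the class name StrictSaxSketch and the method parsesStrictly are hypothetical, and the inputs are made-up examples. It only shows that a strict parser rejects the whole stream on the first well-formedness error, which is the behaviour a more lenient DOM-based model is meant to avoid.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Any23.
public class StrictSaxSketch {

    /** Returns true if a strict SAX parser accepts the raw, unrepaired content. */
    public static boolean parsesStrictly(String content) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            InputStream raw =
                new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
            // DefaultHandler rethrows fatal (well-formedness) errors.
            parser.parse(raw, new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            // The first well-formedness error aborts the whole parse.
            return false;
        } catch (Exception e) {
            // Configuration or I/O problems are unrelated to the input's markup.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesStrictly("<root><item>ok</item></root>")); // true
        System.out.println(parsesStrictly("<root><item>ok</root>"));        // false
    }
}
```

A single unclosed element is enough to make the second call fail, even though most of the content is usable.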
Issue Links
- is superseded by
  - ANY23-318 ExtractionException handling in BaseRDFExtractor.java kills entire extraction (Resolved)