Details
Type: Task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
I've been focusing mostly on the /rmeta endpoint. However, many users aren't enthusiasts of the wild and crazy things that can happen with embedded files (i.e., the rest of the world), and for them it would be useful to have some of the advantages of the /rmeta endpoint without the complexity.
This would allow text + metadata in the response (for those who don't want to parse the XHTML). It would include "late metadata", that is, metadata that is only added after content extraction has begun and which does not appear in our usual XHTML output. It would also allow the stack trace to be stored in a metadata field, as is done in /rmeta, when the -s/--stackTrace command-line option is selected, so that users would get what they could from a failed parse and be able to align parse exceptions with the detected mime type.
Unlike /rmeta, this proposal would not include stacktraces from embedded files.
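To make the proposal concrete, here is a minimal client-side sketch of how a consumer might split such a response into text, metadata, and the container-level stack trace. It assumes the response is a single JSON object (rather than the list that /rmeta returns) and that the content and exception fields use key names modeled on /rmeta's conventions. The key names, the `split_response` helper, and the sample payload are all assumptions made for illustration, not a confirmed response format.

```python
import json

# Assumed key names, modeled on /rmeta conventions; not confirmed for this proposal.
CONTENT_KEY = "X-TIKA:content"
EXCEPTION_KEY = "X-TIKA:EXCEPTION:runtime"
MIME_KEY = "Content-Type"


def split_response(body: str) -> dict:
    """Split a hypothetical text + metadata JSON response into its parts."""
    doc = json.loads(body)
    if not isinstance(doc, dict):
        raise ValueError("expected a single JSON object, not a list as in /rmeta")
    # Everything that is not the extracted text or the stack trace is plain metadata.
    metadata = {
        key: value
        for key, value in doc.items()
        if key not in (CONTENT_KEY, EXCEPTION_KEY)
    }
    return {
        "text": doc.get(CONTENT_KEY, ""),
        "mime": doc.get(MIME_KEY),
        "stacktrace": doc.get(EXCEPTION_KEY),  # None when the parse succeeded
        "metadata": metadata,
    }


# Hypothetical payload for a parse that failed partway through.
sample = json.dumps({
    "Content-Type": "application/pdf",
    "X-TIKA:content": "partial text",
    "X-TIKA:EXCEPTION:runtime": "java.io.IOException: truncated stream",
})
parts = split_response(sample)
print(parts["mime"], "->", parts["stacktrace"])
```

With a shape like this, a caller could log the detected mime type next to the exception for a failed parse while still keeping whatever text was extracted before the failure.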