Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Version: 1.17
Description
It would be good to have a way to get a file's extracted plain text together with its detected metadata in a single request. Currently the metadata is only returned when the request's Accept header is text/xml or text/html, but then the body is not plain text, because it contains HTML elements as well.
I propose that a request to /tika/plain with an Accept header of text/xml return an XHTML document, with the metadata in meta elements inside head and the plain text in body.
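
To make the proposed response shape concrete, here is a minimal client-side sketch in Python. It assumes the response is namespaced XHTML with one meta element per metadata field. The sample document, the metadata names in it, and the helper split_metadata_and_text are all hypothetical and only illustrate the proposal; none of this is existing server behaviour.

```python
import xml.etree.ElementTree as ET

# Namespace prefix for XHTML elements, in ElementTree's "{uri}tag" notation.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical body of a response to /tika/plain with "Accept: text/xml".
# The layout follows the proposal; the metadata names and values are made up.
SAMPLE_RESPONSE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="Content-Type" content="application/pdf"/>
    <meta name="dc:title" content="Quarterly report"/>
  </head>
  <body>First line of extracted text.
Second line of extracted text.</body>
</html>
"""


def split_metadata_and_text(xhtml):
    """Return (metadata, text) from a response of the proposed shape.

    metadata maps each meta element's name to its content; a repeated
    name keeps only the last value in this simple sketch.
    """
    root = ET.fromstring(xhtml)
    metadata = {
        meta.get("name"): meta.get("content")
        for meta in root.iter(XHTML_NS + "meta")
    }
    body = root.find(XHTML_NS + "body")
    # itertext() concatenates all text under body, so the result stays
    # plain even if the body were to contain nested elements.
    text = "".join(body.itertext()) if body is not None else ""
    return metadata, text


if __name__ == "__main__":
    metadata, text = split_metadata_and_text(SAMPLE_RESPONSE)
    print(metadata)
    print(text)
```

With a response like this, a client would need one request and one parse to obtain both the metadata and the unmarked-up text.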