[TIKA-2542] Support in tika-server for getting plain text and metadata at the same time - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 1.17
Fix Version/s: 2.0.0-BETA
Component/s: core, server
Labels:
- pull-request-available

Description

It would be good to have a way to get a file's extracted plain text together with the detected metadata. Currently you can only get the metadata if the request has an Accept header of text/xml or text/html, but then the text in the body is not plain text, as it contains HTML elements as well.

I propose that when requesting /tika/plain with an Accept header of text/xml, an XHTML document is returned with the metadata in the head's meta elements and the plain text in the body.
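
To illustrate how a client might consume the proposed response, here is a minimal sketch that splits such an XHTML document into metadata and plain text. The sample document, the metadata names in it, and the helper split_metadata_and_text are all hypothetical; the actual output of /tika/plain under this proposal may differ.

```python
import xml.etree.ElementTree as ET

# XHTML namespace, assumed to be declared on the root element of the response.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical response body: metadata in <meta> elements, plain text in <body>.
SAMPLE_RESPONSE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Content-Type" content="application/pdf"/>
<meta name="dc:title" content="Example"/>
<title>Example</title>
</head>
<body>Hello world.
Second line.</body>
</html>"""


def split_metadata_and_text(xhtml):
    """Return (metadata, text) from an XHTML document shaped like the proposal."""
    root = ET.fromstring(xhtml)
    head = root.find(f"{XHTML_NS}head")
    body = root.find(f"{XHTML_NS}body")

    # A metadata name may repeat, so each name maps to a list of values.
    metadata = {}
    if head is not None:
        for meta in head.findall(f"{XHTML_NS}meta"):
            name = meta.get("name")
            if name is not None:
                metadata.setdefault(name, []).append(meta.get("content", ""))

    # The body is expected to hold only text; itertext() also tolerates
    # stray child elements by concatenating their text content.
    text = "".join(body.itertext()) if body is not None else ""
    return metadata, text


metadata, text = split_metadata_and_text(SAMPLE_RESPONSE)
```

With this shape, a client needs only one request and one XML parse to obtain both pieces, instead of stripping HTML markup from the body itself.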

Issue Links

links to GitHub Pull Request #216

People

Assignee: Unassigned

Reporter: Manolo Caracuel

Votes: 0

Watchers: 4

Dates

Created: 05/Jan/18 22:11

Updated: 21/Jul/21 22:13

Time Tracking

Estimated: 48h

Remaining: 48h

Logged: Not Specified