Tika / TIKA-2542

Support in tika-server for getting plain text and metadata at the same time

    Details

      Description

      It would be good to have a way to get a file's extracted plain text together with its detected metadata in a single request. Currently the metadata is only returned when the request's Accept header is text/xml or text/html, but then the body is not plain text, because it contains HTML elements as well.

      I propose that a request to /tika/plain with an Accept header of text/xml return an XHTML document with the metadata in the head's meta elements and the plain text in the body.
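      To illustrate how a client might consume such a response, here is a minimal sketch in Python. It assumes the proposed XHTML shape (meta elements with name/content attributes in the head, unmarked text in the body). The sample document, the metadata names in it, and the helper name split_metadata_and_text are made up for illustration and are not actual tika-server output or API.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical response body in the proposed shape; the metadata names
# and values are invented for illustration only.
SAMPLE_RESPONSE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Content-Type" content="application/pdf"/>
<meta name="dc:title" content="Example"/>
<title>Example</title>
</head>
<body>First line of plain text.
Second line of plain text.</body>
</html>"""


def split_metadata_and_text(xhtml):
    """Return (metadata, text) from an XHTML document of the proposed shape.

    metadata maps each meta name to a list of values, since a name
    may occur more than once.
    """
    root = ET.fromstring(xhtml)
    metadata = {}
    for meta in root.iter(XHTML_NS + "meta"):
        name = meta.get("name")
        if name is not None:
            metadata.setdefault(name, []).append(meta.get("content", ""))
    body = root.find(XHTML_NS + "body")
    # The body is expected to hold plain text only; itertext() also
    # copes with stray child elements by concatenating their text.
    text = "".join(body.itertext()) if body is not None else ""
    return metadata, text


metadata, text = split_metadata_and_text(SAMPLE_RESPONSE)
```

      With a response of this shape, one request yields both the metadata dictionary and the untagged text, which is the point of the proposal.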

    People

      Assignee: Unassigned
      Reporter: Manolo Caracuel (mcaracuel)

    Time Tracking

      Original Estimate: 48h
      Remaining Estimate: 48h
      Time Spent: Not Specified