Tika / TIKA-2542

Support in tika-server for getting plain text and metadata at the same time

Details

    Description

      It would be good to have a way to get a file's extracted plain text and its detected metadata in a single request. Currently, metadata is returned only when the request's Accept header is text/xml or text/html, but then the body is not plain text, because it also contains HTML markup.
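Until something like this exists, a client can work around the current behaviour by requesting the HTML output and splitting it locally. The following is a minimal sketch of that workaround, not part of tika-server. It assumes a tika-server running on its default port 9998, sends the file with PUT to the /tika endpoint using Accept: text/html, and then uses Python's tolerant html.parser to collect the meta name/content pairs from the head and flatten the body to text. The helper names (split_tika_xhtml, fetch_text_and_metadata) and the set of tags treated as line breaks are hypothetical choices.

```python
import urllib.request
from html.parser import HTMLParser

# Hypothetical choice: end tags that are turned into line breaks when flattening.
BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TikaXhtmlSplitter(HTMLParser):
    """Collects <meta name=... content=...> pairs from <head> and text from <body>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text data
        self.metadata = {}  # name -> list of values; a name may appear more than once
        self._chunks = []
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta .../> and <br/>.
        if tag == "body":
            self._in_body = True
        elif tag == "meta" and not self._in_body:
            a = dict(attrs)
            name, content = a.get("name"), a.get("content")
            if name is not None and content is not None:
                self.metadata.setdefault(name, []).append(content)
        elif tag == "br" and self._in_body:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in BLOCK_TAGS and self._in_body:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Text outside <body> (e.g. <title>) is ignored.
        if self._in_body:
            self._chunks.append(data)

    def text(self):
        # Strip each line and drop the blank ones.
        lines = (line.strip() for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def split_tika_xhtml(xhtml):
    """Return (metadata, plain_text) parsed from Tika's HTML/XHTML output."""
    parser = TikaXhtmlSplitter()
    parser.feed(xhtml)
    parser.close()
    return parser.metadata, parser.text()


def fetch_text_and_metadata(path, url="http://localhost:9998/tika"):
    """PUT a file to tika-server and split the HTML response (needs a running server)."""
    with open(path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(url, data=data, method="PUT",
                                 headers={"Accept": "text/html"})
    with urllib.request.urlopen(req) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return split_tika_xhtml(resp.read().decode(charset))
```

For example, `meta, text = fetch_text_and_metadata("report.pdf")` (a hypothetical file) returns both parts from one request. The flattened text is only an approximation and may differ in whitespace from what tika-server returns for Accept: text/plain, which is part of why server-side support would be preferable.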

      I propose that when /tika/plain is requested with an Accept header of text/xml, the server return an XHTML document with the metadata as meta elements in the head and the plain text in the body.
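The proposed response shape can be sketched as follows. This only illustrates the format suggested above and is not code from tika-server: build_plain_xhtml is a hypothetical helper that uses Python's xml.etree.ElementTree to emit an XHTML document whose head carries one meta element per metadata value and whose body carries the plain text unchanged, with XML escaping handled by the serializer.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_plain_xhtml(metadata, text):
    """Hypothetical sketch: metadata (name -> list of values) in <head>, plain text in <body>."""
    ET.register_namespace("", XHTML_NS)  # serialize XHTML as the default namespace
    html = ET.Element(f"{{{XHTML_NS}}}html")
    head = ET.SubElement(html, f"{{{XHTML_NS}}}head")
    for name, values in metadata.items():
        # One <meta> per value, so multi-valued metadata is not lost.
        for value in values:
            ET.SubElement(head, f"{{{XHTML_NS}}}meta", {"name": name, "content": value})
    body = ET.SubElement(html, f"{{{XHTML_NS}}}body")
    body.text = text  # plain text only; no markup is added inside the body
    return ET.tostring(html, encoding="unicode")
```

Because the body holds nothing but the extracted text, a client can read the metadata with any XML parser and take the body's text content directly, without stripping markup. Whether the text should instead be wrapped in a pre element to signal preserved whitespace is left open here.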

People

    Assignee: Unassigned
    Reporter: Manolo Caracuel (mcaracuel)

Time Tracking

    Estimated: 48h (original estimate)
    Remaining: 48h
    Logged: Not specified