Tika / TIKA-2542

Support in tika-server for getting plain text and metadata at the same time

Details

    Description

      It would be good to have a way to get a file's extracted plain text together with its detected metadata in a single request. Currently, metadata is only returned when the request's Accept header is text/xml or text/html, but in that case the body is not plain text, because it contains HTML elements as well.

      I propose that a request to /tika/plain with an Accept header of text/xml return an XHTML document with the metadata in the head's meta elements and the plain text in the body.
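
      To illustrate what a client could do with such a response, here is a minimal sketch in Python. It assumes the response would use the standard XHTML namespace and name/content attributes on the meta elements. The sample document, the metadata values, and the function name split_metadata_and_text are all hypothetical; nothing here reflects an existing tika-server endpoint.

```python
import xml.etree.ElementTree as ET

# Assumption: the proposed response uses the standard XHTML namespace.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical response body, shaped the way the proposal describes:
# metadata in <head> as <meta> elements, plain text in <body>.
SAMPLE_RESPONSE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Content-Type" content="application/pdf"/>
<meta name="Author" content="Jane Doe"/>
<title>Example</title>
</head>
<body>First line of plain text.
Second line of plain text.</body>
</html>"""


def split_metadata_and_text(xhtml):
    """Return (metadata, text) from an XHTML document of the proposed shape.

    metadata maps each meta name to a list of values, since a metadata
    key may appear more than once.
    """
    root = ET.fromstring(xhtml)

    metadata = {}
    for meta in root.iterfind(f"{XHTML_NS}head/{XHTML_NS}meta"):
        name = meta.get("name")
        if name is not None:
            metadata.setdefault(name, []).append(meta.get("content") or "")

    # itertext() concatenates all text under <body>, so the result is
    # plain text even if the body happened to contain child elements.
    body = root.find(f"{XHTML_NS}body")
    text = "".join(body.itertext()) if body is not None else ""
    return metadata, text


if __name__ == "__main__":
    metadata, text = split_metadata_and_text(SAMPLE_RESPONSE)
    print(metadata)
    print(text)
```

      With a response of this shape, one request would be enough for a client to obtain both the metadata and the text, which is the point of the proposal.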

            People

              Assignee: Unassigned
              Reporter: Manolo Caracuel (mcaracuel)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Original Estimate: 48h
                  Remaining Estimate: 48h
                  Time Spent: Not Specified