  Tika / TIKA-4264

Tika Pipes - Structured output (XHTML) support?


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: tika-pipes
    • Labels: None

    Description

      I am able to use Tika Pipes to extract the text content from a document. Is it also possible to use Tika Pipes to obtain structured output? I believe Tika produces this as XHTML.

      The plain text extracted from a document is great for indexing into a search engine. But what if you want structured output, such as XHTML?
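      To show what structured output preserves that plain text loses, here is a minimal Python sketch. It does not call Tika or Tika Pipes. It assumes an XHTML string of the general shape Tika's XHTML output uses (elements in the http://www.w3.org/1999/xhtml namespace), and it uses a hypothetical sample document in which headings and paragraphs are direct children of <body>. Real Tika output often nests content further (for example in per-page <div> elements), so treat this as illustrative only. The functions sections() and plain_text() are hypothetical helpers: the first groups paragraph text under the heading that precedes it, and the second flattens everything the way a text-only extraction would.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, NOT produced by a real Tika parse. It only mimics the
# XHTML namespace Tika uses; real output may nest content more deeply.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Quarterly Report</title></head>
<body>
<h1>Summary</h1>
<p>Revenue grew.</p>
<h1>Details</h1>
<p>Costs fell.</p>
<p>Margins improved.</p>
</body>
</html>"""

NS = {"x": "http://www.w3.org/1999/xhtml"}


def sections(xhtml: str) -> dict[str, list[str]]:
    """Hypothetical helper: group <p> text under the nearest preceding <h1>.

    Assumes headings and paragraphs are direct children of <body>.
    """
    body = ET.fromstring(xhtml).find("x:body", NS)
    result: dict[str, list[str]] = {}
    current = None
    for elem in body:
        tag = elem.tag.split("}")[-1]  # drop the "{namespace}" prefix
        text = "".join(elem.itertext()).strip()
        if tag == "h1":
            current = text
            result[current] = []
        elif tag == "p" and current is not None:
            result[current].append(text)
    return result


def plain_text(xhtml: str) -> str:
    """Hypothetical helper: flatten <body> to one string, discarding structure."""
    body = ET.fromstring(xhtml).find("x:body", NS)
    return " ".join(t.strip() for t in body.itertext() if t.strip())


if __name__ == "__main__":
    print(sections(SAMPLE_XHTML))
    print(plain_text(SAMPLE_XHTML))
```

      With the sample above, sections() keeps "Costs fell." and "Margins improved." under "Details", while plain_text() returns a single flat string in which that grouping is lost. This is the kind of information a search index cannot recover from plain text alone.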

          People

            Assignee: Unassigned
            Reporter: Nicholas DiPiazza (ndipiazza)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: