Tika / TIKA-99

Support external parser programs

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2
    • Component/s: parser
    • Labels: None

    Description

      There should be a parser component (like ExternalParser) that invokes an external command-line application, feeds the given document to it as input, and returns the application's output as the extracted text (or XHTML) content. This would allow integration with tools like catdoc or pdf2txt.
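
      The flow described above (start a command, pipe the document to its standard input, collect its standard output as text) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Tika implementation: the class name ExternalTextExtractor and the method extract are hypothetical, the tool is assumed to read the document from stdin and write UTF-8 text to stdout, and the code needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika: runs an external command,
// feeds it a document on stdin, and returns whatever it prints on stdout.
class ExternalTextExtractor {

    static String extract(InputStream document, String... command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Discard stderr so a chatty tool cannot block on a full pipe (Java 9+).
        builder.redirectError(ProcessBuilder.Redirect.DISCARD);
        Process process = builder.start();

        // Feed the document on a separate thread. Writing and reading on the
        // same thread can deadlock once the OS pipe buffers fill up.
        Thread feeder = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                document.transferTo(stdin); // Java 9+
            } catch (IOException e) {
                // The tool may close its stdin early; ignored in this sketch.
            }
        });
        feeder.start();

        byte[] output;
        try (InputStream stdout = process.getInputStream()) {
            output = stdout.readAllBytes(); // Java 9+
        }

        feeder.join();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("External command exited with code " + exitCode);
        }
        // Assumption: the tool emits UTF-8 text.
        return new String(output, StandardCharsets.UTF_8);
    }
}
```

      With a stand-in command such as cat on a Unix-like system, ExternalTextExtractor.extract(stream, "cat") simply returns the input unchanged. A real tool such as catdoc would need whatever arguments make it read from stdin, and a tool that only accepts file paths would require writing the document to a temporary file first.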

          People

            Assignee: Jukka Zitting
            Reporter: Jukka Zitting
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved: