Description
There should be a parser component (for example, an ExternalParser) that invokes an external command-line application, feeds the given document to that application as input, and returns the application's output as the extracted text (or XHTML) content. This would allow integration with tools like catdoc or pdf2txt.
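A minimal Java sketch of the proposed behaviour follows. It assumes the external tool reads the document from standard input and writes UTF-8 plain text to standard output. `ExternalTextExtractor` and `extract` are hypothetical names chosen for illustration. They are not the API of an actual ExternalParser, and XHTML output is not covered.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/**
 * Hypothetical sketch (not an actual parser API): runs an external command,
 * streams the document to its standard input and returns its standard output
 * as extracted text. Assumes the tool emits UTF-8 text on stdout.
 */
final class ExternalTextExtractor {
    private final List<String> command;

    ExternalTextExtractor(List<String> command) {
        this.command = List.copyOf(command);
    }

    String extract(InputStream document) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Discard stderr so a chatty tool cannot block on a full pipe buffer.
        builder.redirectError(ProcessBuilder.Redirect.DISCARD);
        Process process = builder.start();

        // Feed the document on a separate thread so that reading stdout
        // below cannot deadlock against a tool that writes before reading.
        Thread feeder = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                document.transferTo(stdin);
            } catch (IOException e) {
                // The tool may exit before consuming all input (broken pipe); ignore.
            }
        });
        feeder.start();

        // Collect everything the tool writes to stdout.
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        try (InputStream stdout = process.getInputStream()) {
            stdout.transferTo(output);
        }
        feeder.join();

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("External command failed with exit code " + exitCode);
        }
        return output.toString(StandardCharsets.UTF_8);
    }
}
```

With `List.of("cat")` as the command, `extract` simply returns the input bytes decoded as text, which is a convenient smoke test. Tools that only accept a file path argument would instead need the document copied to a temporary file and that path substituted into the command line.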