Tika / TIKA-179

Tika standalone CLI --text output mostly not working; other output formats are fine

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2, 0.3
    • Fix Version/s: 0.3
    • Component/s: cli
    • Labels: None
    • Environment: Java 1.5 (also tried Java 1.6). OS used: Mac OS X, Linux (CentOS)

    Description

      When running the Tika standalone jar (built with mvn install) in CLI mode, the plain text output option (-t or --text) produces no output for most of my test documents (pdf, doc, ppt, odt). With the other output options (xml, html, metadata), the output is correct. Activating debug mode (-v) does not produce any additional information either.

      In the GUI, dragging and dropping a file does produce the expected results, including in the plain text tab/window.

      I have rebuilt Tika many times over the past 2 months from svn (latest revision tried: 724002), clearing the .m2 directory every time. The CLI --text result is always the same: the output is usually missing.

      For now, I use the -x output option chained to html2txt as a workaround, but I would prefer to use Tika alone to convert to plain text (the text is then indexed in Solr).
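
      For readers without html2txt, the second stage of that workaround can be approximated with a short script. This is only a rough sketch, not what html2txt or Tika actually does: `xhtml_to_text` and `TextExtractor` are hypothetical names, and the script assumes the XHTML produced by the -x option is passed in as a string. It keeps the character data of the document body and drops markup and the contents of head, script, and style.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects character data, skipping <head>, <script> and <style>."""

    SKIPPED = {"head", "script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text found outside the skipped elements.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def xhtml_to_text(markup):
    """Return the visible text of an XHTML string, one text run per line."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.chunks)
```

      To use it in a pipeline, read the XHTML from standard input and print the result of `xhtml_to_text`. Layout is not preserved, which is usually acceptable when the text is only being indexed.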

      Thanks

          People

            Assignee: Jukka Zitting (jukkaz)
            Reporter: Paul Borgermans (pborgerm)
            Votes: 0
            Watchers: 0