Uploaded image for project: 'Tika'
  1. Tika
  2. TIKA-179

Tika stand alone CLI --text output mostly not working, other output formats are fine

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2, 0.3
    • Fix Version/s: 0.3
    • Component/s: cli
    • Labels:
      None
    • Environment:

      Java 1.5 (also tried Java 1.6). OS used: Mac OS X, Linux (CentOS)

      Description

      When using Tika standalone jar after mvn install in CLI mode, in most of my test documents (pdf, doc, ppt, odt, ), the plain text output option (-t or --text) does not produce any result. When using the other options (xml, html, metadata), the output is correct. Activating debug mode (-v) does not produce additional info either.

      When using the GUI, dragging and dropping does produce the expected results, also in the plain text tab/window

      I rebuilt tika many times in the past 2 months (cleared .m2 directory every time) from svn (latest revision tried: 724002), the CLI --text result is always the same: usually missing output.

      For now, I use the -x output option chained to html2txt as a workaround, but would prefer to use just tika to convert to plain text (which is used for further indexing in Solr).

      Thanks

        Attachments

          Activity

            People

            • Assignee:
              jukkaz Jukka Zitting
              Reporter:
              pborgerm Paul Borgermans
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: