TIKA-3684

Extract text returns the text multiple times

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: docker
    • Labels: None

    Description

      We run the Tika Docker container as a Linux service. When I extract text from a Word document, e.g.:

      curl -T example.docx http://localhost:9998/tika --header "Accept: text/plain"

      the text is returned 3 times.

      Note: We also run Tika server v1.14, and that version returns the text as expected.
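
To make the repetition easier to confirm, here is a minimal Python sketch that sends the same PUT request as the curl command above and counts how often a known phrase appears in the response. The endpoint and the Accept header come from the report; the helper names `extract_text` and `count_occurrences` are hypothetical, and the phrase to search for has to be picked by hand from the document. It only measures the symptom and makes no claim about its cause.

```python
import urllib.request

# Endpoint taken from the curl command in the report.
TIKA_URL = "http://localhost:9998/tika"


def extract_text(path, url=TIKA_URL):
    """PUT the file to the Tika server and return the plain-text response body."""
    with open(path, "rb") as fh:
        data = fh.read()
    request = urllib.request.Request(
        url,
        data=data,
        method="PUT",  # curl -T uploads with PUT over HTTP
        headers={"Accept": "text/plain"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def count_occurrences(text, phrase):
    """Count non-overlapping occurrences of phrase, ignoring whitespace layout."""
    # Collapse runs of whitespace so line breaks in the output do not hide matches.
    normalized = " ".join(text.split())
    needle = " ".join(phrase.split())
    if not needle:
        return 0
    return normalized.count(needle)
```

Calling `count_occurrences(extract_text("example.docx"), "<a sentence that occurs once in the document>")` against each server version would show whether a sentence that appears once in the source is returned once or several times.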

      Attachments

        1. example.docx
          23 kB
          Naama Hophstatder
        2. example.json
          8 kB
          Tim Allison
        3. tika-config-no-xmf.xml
          0.3 kB
          Tim Allison

          People

            Assignee: Unassigned
            Reporter: Naama Hophstatder
            Votes: 0
            Watchers: 3