Tika / TIKA-2180

Multiple requests to Tika to extract text slow down processing

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.13, 1.14
    • Fix Version/s: None
    • Component/s: server
    • Labels: None
    • Environment: Windows OS, Open JDK, 4 core 32 GB RAM

    Description

      I observed that sending multiple requests to Tika (e.g. http://localhost:8080/tika) with files of around 5 MB each makes Tika very slow to complete extraction. With ~20 random files, processing all of them in sequence took 170 seconds. Sending all of them in parallel took around 780 seconds for the same set of files, roughly 4.6 times as long.
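
      The comparison described above could be reproduced with something like the following sketch. It is not the reporter's actual test harness: the function names (`extract_text`, `time_sequential`, `time_parallel`) are hypothetical, and it assumes a tika-server instance is listening at the URL given in the report and accepts a file via HTTP PUT on `/tika`.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# URL taken from the report; adjust if the server runs elsewhere.
TIKA_URL = "http://localhost:8080/tika"


def extract_text(path, url=TIKA_URL):
    """PUT one file to tika-server and return the extracted plain text."""
    data = Path(path).read_bytes()
    request = urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": "text/plain"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


def time_sequential(paths, send=extract_text):
    """Send the files one after another; return elapsed seconds."""
    start = time.perf_counter()
    for path in paths:
        send(path)
    return time.perf_counter() - start


def time_parallel(paths, send=extract_text, workers=None):
    """Send all files concurrently (one thread per file by default);
    return elapsed seconds."""
    paths = list(paths)
    if not paths:
        return 0.0
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers or len(paths)) as pool:
        # Consume the iterator so every request finishes (and any
        # exception is raised) before the timer stops.
        list(pool.map(send, paths))
    return time.perf_counter() - start
```

      Calling `time_sequential(files)` and then `time_parallel(files)` on the same list of ~5 MB documents gives the two figures to compare. The `send` parameter exists only so the timing logic can be exercised without a running server.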

      Attachments

        1. screenshot-1.png
          43 kB
          Ashish Basran
        2. screenshot-2.png
          37 kB
          Ashish Basran
        3. screenshot-3.png
          41 kB
          Ashish Basran
        4. with new experimental SAX docx parser.png
          35 kB
          Ashish Basran

            People

              Assignee: Unassigned
              Reporter: Ashish Basran (basran)
              Votes: 0
              Watchers: 2