Tika / TIKA-2045

TIKA crashes / runs out of memory on simple PDF

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13
    • Fix Version/s: 1.14, 2.0.0
    • Component/s: core
    • Labels: None
    • Environment: Linux, Java 8

    Description

      We're using Tika embedded in a web crawler, and today I encountered a PDF that causes OutOfMemory errors while Tika processes it.

      It's a small, one-page PDF file, so I don't think it should consume that much memory.

      I verified the problem with the GUI from the tika-app-1.13.jar file, which produces the same error on the same file. The file can be found at:

      http://www.spesmea.nl/pdf/algemene_voorwaarden_bbztcn_2010_nl.pdf
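For anyone who wants to reproduce this outside the GUI, the following is a minimal sketch, not part of the original report. It launches tika-app in a child JVM with a capped heap and asks for plain-text extraction. The class name `TikaRepro`, the helper `buildCommand`, and the 512 MB heap limit are illustrative assumptions; the PDF is assumed to have been downloaded to a local path first.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reproducing the report from the command line.
class TikaRepro {

    // Builds: java -Xmx<maxHeapMb>m -jar <tikaJar> --text <pdfPath>
    static List<String> buildCommand(String tikaJar, String pdfPath, int maxHeapMb) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx" + maxHeapMb + "m"); // cap the child JVM heap (assumed value)
        cmd.add("-jar");
        cmd.add(tikaJar);
        cmd.add("--text");                 // tika-app option: output plain text content
        cmd.add(pdfPath);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: TikaRepro <tika-app.jar> <file.pdf>");
            return;
        }
        List<String> cmd = buildCommand(args[0], args[1], 512);
        // Run tika-app in a separate process so a failure there does not
        // take down this JVM; its output goes to this process's console.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        int exitCode = process.waitFor();
        System.out.println("tika-app exited with code " + exitCode);
    }
}
```

Running the parse in a separate process with a fixed `-Xmx` makes the memory ceiling explicit, so results can be compared across Tika versions under the same limit.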

      If I can help by providing any additional information, please let me know.

            People

              Assignee: Unassigned
              Reporter: MadEgg Egbert
              Votes: 0
              Watchers: 4