TIKA-2058

Memory Leak in Tika version 1.13 when parsing millions of files

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13
    • Fix Version/s: 1.14, 2.0.0
    • Component/s: None
    • Labels: None

    Description

      We have an application using Tika that parses roughly 7,000,000 files of different types; many of them are MSG files with attachments. It works correctly with Tika 1.9 and has been in production for over a year, with parsing runs taking place every few weeks. With Tika 1.13, the same application runs out of memory (Java heap).

      I have used lsof and file leak detector to look for open files, but neither shows any open files while the application is running. I did find an issue with open files, https://issues.apache.org/jira/browse/TIKA-2015, but there is a workaround for it, and it is not the cause of this problem.

      I am sorry to have to report this with a level of vagueness, but with lsof turning nothing up I am a bit stuck as to how to investigate further. We are more than willing to help by testing on the basis of any ideas provided.
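
      Since the symptom is Java heap exhaustion rather than leaked file handles, one way to narrow it down is to record used heap after each batch of parsed files and check whether it keeps climbing. The following is a minimal sketch under that assumption, using only the JDK's java.lang.management API; the class and method names (HeapTrend, sample, strictlyRising) are hypothetical and not part of Tika.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Tika): records used heap after each batch
// so that steady growth across batches becomes visible in the logs.
final class HeapTrend {
    private final MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    private final List<Long> samples = new ArrayList<>();

    // Requests a garbage collection (the JVM may ignore the request),
    // then records and returns the used heap in bytes.
    long sample() {
        memory.gc();
        long used = memory.getHeapMemoryUsage().getUsed();
        samples.add(used);
        return used;
    }

    // Returns the samples recorded so far, oldest first.
    List<Long> samples() {
        return new ArrayList<>(samples);
    }

    // True if, within the last n samples, each value is larger than the one
    // before it. Returns false when n < 2 or fewer than n samples exist.
    static boolean strictlyRising(List<Long> samples, int n) {
        if (n < 2 || samples.size() < n) {
            return false;
        }
        for (int i = samples.size() - n + 1; i < samples.size(); i++) {
            if (samples.get(i) <= samples.get(i - 1)) {
                return false;
            }
        }
        return true;
    }
}
```

      Calling sample() after every few thousand files and logging the value would show whether the heap grows steadily under 1.13 but stays flat under 1.9. If strictlyRising(trend.samples(), n) holds over a long run, a heap dump taken at that point is likely to show which objects are accumulating.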

      Attachments

        1. Yourkit screenshot.png (313 kB, Luís Filipe Nassif)
        2. prevents-OOM-when-writable-is-false.patch (0.8 kB, Luís Filipe Nassif)
        3. poi-3.15-beta1-p1.pom (3 kB, Tim Allison)
        4. poi-3.15-beta1-p1.jar (2.42 MB, Luís Filipe Nassif)
        5. screenshot-3.png (334 kB, Luís Filipe Nassif)
        6. screenshot-2.png (371 kB, Luís Filipe Nassif)
        7. screenshot-1.png (362 kB, Luís Filipe Nassif)

          People

            Assignee: Unassigned
            Reporter: Tim Barrett (comcortim)
            Votes: 0
            Watchers: 6
