Tika / TIKA-2575

Provide a way to abort Tika parses when the Tika input stream buffer grows past a certain threshold

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parser
    • Labels:
      None

      Description

      Sometimes Tika is used to parse an XLS file that isn't especially large, maybe 60 MB, and the JVM heap usage suddenly exceeds 800 MB, which causes an OutOfMemoryError in my case.

      Can we add an "abort threshold" so that the Tika parse halts if the parse output bytes exceed this value?
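
      To illustrate what such a threshold could look like on the output side, here is a minimal sketch using only the JDK's SAX classes. `AbortingTextHandler` and `maxChars` are hypothetical names, not part of Tika. It counts characters rather than bytes, and it does not bound memory the parser uses internally before it emits any text. Tika's `BodyContentHandler` also accepts a write limit in its constructor, which may already cover part of this.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects extracted text and aborts past maxChars. */
class AbortingTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final int maxChars;

    AbortingTextHandler(int maxChars) {
        this.maxChars = maxChars;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Check before appending so the buffer never grows past the threshold.
        if ((long) text.length() + length > maxChars) {
            // An exception thrown from a SAX callback propagates to the caller
            // of the parse, which is what stops the parse early.
            throw new SAXException(
                "Abort threshold of " + maxChars + " characters exceeded");
        }
        text.append(ch, start, length);
    }

    /** Text collected so far (everything received before any abort). */
    String text() {
        return text.toString();
    }
}
```

      Since a Tika parse takes a SAX `ContentHandler`, a handler like this could be passed in directly, with the caller catching the `SAXException` to detect the abort.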

      Or is it already possible for users to do this themselves, for example by watching the input stream as it grows?
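
      For the do-it-yourself variant, a rough sketch of "watching the input stream" is a wrapper that counts bytes as they are read and throws once a limit is passed. `ThresholdInputStream` and `maxBytes` are hypothetical names, not Tika API. This only bounds how many bytes are read from the source; it says nothing about how much heap the parser allocates while processing them, so it would not by itself catch the 60 MB file described above.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper: fails once more than maxBytes have been read. */
class ThresholdInputStream extends FilterInputStream {
    private final long maxBytes;
    private long bytesRead;

    ThresholdInputStream(InputStream in, long maxBytes) {
        super(in);
        this.maxBytes = maxBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count(n);
        }
        return n;
    }

    // Note: skip() is not counted, and bytes re-read after reset() are counted again.
    private void count(long n) throws IOException {
        bytesRead += n;
        if (bytesRead > maxBytes) {
            throw new IOException("Read " + bytesRead
                + " bytes, over the abort threshold of " + maxBytes);
        }
    }

    long bytesRead() {
        return bytesRead;
    }
}
```

      The stream handed to the parser would be wrapped in this class, and the caller would treat the resulting `IOException` as the abort signal.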

       

       

        Attachments

        1. screenshot-1.png
          70 kB
          Nicholas DiPiazza

            People

            • Assignee:
              Unassigned
            • Reporter:
              ndipiazza_gmail Nicholas DiPiazza