Tika / TIKA-2575

Provide a way to abort Tika parses when the Tika input stream buffer grows past a certain threshold

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component: parser

    Description

      Sometimes you use Tika to parse an XLS file that isn't really that big, for example about 60 MB, and suddenly the JVM heap usage grows past 800 MB, which causes an OOM in my case.

      Can we add an "abort threshold" so that the Tika parse halts if the number of parse output bytes exceeds this value?
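
      To make the request concrete, here is a minimal sketch of what such an output threshold could look like on the caller's side. LimitedWriter is a hypothetical helper written for this ticket, not a Tika class, and it counts characters rather than bytes. Tika's BodyContentHandler already has a constructor that takes a write limit, which may cover part of this, but neither approach bounds memory that the parser allocates before any output is written.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper (not part of Tika): forwards output to a delegate
// Writer and aborts with an IOException once a character limit is exceeded.
class LimitedWriter extends Writer {
    private final Writer delegate;
    private final long maxChars;
    private long written = 0;

    LimitedWriter(Writer delegate, long maxChars) {
        this.delegate = delegate;
        this.maxChars = maxChars;
    }

    @Override
    public void write(char[] buf, int off, int len) throws IOException {
        // Refuse the whole chunk if it would push the total over the limit.
        if (written + len > maxChars) {
            throw new IOException("Output limit of " + maxChars + " chars exceeded");
        }
        delegate.write(buf, off, len);
        written += len;
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }

    long getWritten() {
        return written;
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        LimitedWriter out = new LimitedWriter(sink, 5);
        try {
            out.write("abc");   // fits within the limit
            out.write("defg");  // would exceed the limit, so this throws
        } catch (IOException e) {
            System.out.println("Aborted: " + e.getMessage());
        }
        System.out.println("Kept output: " + sink);
    }
}
```

      A caller could hand such a writer to whatever content handler collects the extracted text, so the parse stops with an exception instead of buffering unbounded output.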

      Or is it already possible for users to do this themselves by somehow watching the input stream as it grows?
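
      For the "do it themselves" option, a rough sketch of watching the input stream might look like the following. ThresholdInputStream is a hypothetical wrapper, not an existing Tika or JDK class. It only limits how many bytes are read from the source, so it is an assumption that this helps with the heap growth described above, since a 60 MB file would pass a 60 MB limit and still expand in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper (not part of Tika): counts bytes as they are read and
// throws once more than maxBytes have been consumed. Bytes passed over with
// skip() are not counted, and bytes re-read after reset() are counted again.
class ThresholdInputStream extends FilterInputStream {
    private final long maxBytes;
    private long bytesRead = 0;

    ThresholdInputStream(InputStream in, long maxBytes) {
        super(in);
        this.maxBytes = maxBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count(n);
        }
        return n;
    }

    private void count(long n) throws IOException {
        bytesRead += n;
        if (bytesRead > maxBytes) {
            throw new IOException("Read more than " + maxBytes + " bytes, aborting");
        }
    }

    public static void main(String[] args) {
        // Ten bytes of input with a four-byte limit: the second read trips it.
        InputStream in = new ThresholdInputStream(
                new ByteArrayInputStream(new byte[10]), 4);
        byte[] buf = new byte[3];
        try {
            while (in.read(buf) != -1) {
                // A real caller would pass the stream to the parser instead.
            }
        } catch (IOException e) {
            System.out.println("Aborted: " + e.getMessage());
        }
    }
}
```

      The wrapped stream would be passed to the parser in place of the raw one, and the caller would treat the IOException as the abort signal.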

       

       

      Attachments

        1. screenshot-1.png
          70 kB
          Nicholas DiPiazza

          People

            Assignee: Unassigned
            Reporter: Nicholas DiPiazza (ndipiazza_gmail)
            Votes: 0
            Watchers: 2
