Tika / TIKA-860

Make ZIP bomb detection configurable


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.0
    • Fix Version/s: None
    • Component/s: parser
    • Labels: None

      Description

      The detection of ZIP bombs is nice, and the original issue says it is configurable, but I found no way to change the ParseContext of the AutoDetectParser to, e.g., allow deeper nesting levels. The SecureContentHandler instantiation is hardcoded and there is no point of intervention.

      In my case, a simple ZIP of an Eclipse project, http://store.pangaea.de/Publications/AltaweelM_2011/Salinization.zip, triggered the bomb detection, but it is of course no bomb. It is just that the JAR/WAR files in this project themselves contain other JAR files and class files. This overflows the nesting level of 10. Maybe even the Tika OSGi bundle triggers the bomb detection (not tested).

      I would like to raise the nesting level, but there is no way to do so. My workaround was to simply filter out JAR files by using a custom DocumentSelector in my ParseContext. They contain no metadata we are interested in: in our own deployment we had already removed, e.g., the CLASS file parsers from our Tika config, so we have a very simple parser structure that only allows PDF, office documents, txt files, and so on.
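      The filtering workaround described above might look roughly like the following sketch. The class name ArchiveEntryFilter, the method shouldParse, and the list of skipped extensions are hypothetical and not taken from the reporter's code. The filter logic is kept free of Tika imports so it can be run on its own; the comment at the end shows how it could be plugged into a DocumentSelector, assuming the embedded entry's file name is available under the "resourceName" metadata key as in Tika 1.x.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: decides by file extension whether an embedded
// document should be handed to the parser at all.
final class ArchiveEntryFilter {

    // Assumed skip list; adjust to whatever nested archives you want to ignore.
    private static final Set<String> SKIPPED_EXTENSIONS = Set.of("jar", "war", "ear");

    private ArchiveEntryFilter() {
    }

    /** Returns true if an embedded document with this name should be parsed. */
    static boolean shouldParse(String resourceName) {
        if (resourceName == null) {
            return true; // no name available: do not filter
        }
        int dot = resourceName.lastIndexOf('.');
        if (dot < 0 || dot == resourceName.length() - 1) {
            return true; // no extension: do not filter
        }
        String extension = resourceName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return !SKIPPED_EXTENSIONS.contains(extension);
    }
}

// Assumed wiring with Tika on the classpath (not compiled here):
//
//   ParseContext context = new ParseContext();
//   context.set(DocumentSelector.class,
//       metadata -> ArchiveEntryFilter.shouldParse(metadata.get("resourceName")));
//   parser.parse(stream, handler, new Metadata(), context);
```

      Skipping nested JAR/WAR entries this way only avoids descending into them; it does not change the nesting limit that the ZIP bomb detection enforces.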


              People

              • Assignee: Unassigned
              • Reporter: Uwe Schindler (uschindler)
              • Votes: 0
              • Watchers: 2
