  1. Tika
  2. TIKA-860

Make ZIP bomb detection configurable

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.0
    • Fix Version/s: None
    • Component/s: parser
    • Labels: None

    Description

      The detection of ZIP bombs is nice, and the original issue says it is configurable, but I found no way to change the ParseContext of the AutoDetectParser to, e.g., allow deeper nesting levels. The SecureContentHandler instantiation is hardcoded and there is no point of intervention.

      In my case, a simple ZIP of an Eclipse project, http://store.pangaea.de/Publications/AltaweelM_2011/Salinization.zip, triggered the bomb detection, but it is of course no bomb. It's just that the JAR/WAR files in this project themselves contain other JAR files and class files. This overflows the nesting level of 10. Maybe even the Tika OSGi bundle triggers the bomb detection (not tested).

      In my case I would like to raise the nesting level, but there is no way to do so. My workaround was to simply filter out JAR files by using a custom DocumentSelector in my ParseContext. They contain no metadata we are interested in; in our own development we had already removed, e.g., the CLASS file parsers from our Tika config, so we have a very simple parser structure that only allows PDF, office documents, TXT files, ...
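
      The workaround described above could look roughly like the following. This is only a minimal sketch, not the reporter's actual code: the class name JarSkippingFilter, the method shouldParse, and the choice of skipped extensions (.jar and .war) are hypothetical. The name check is kept in plain Java so it compiles without Tika; the comment at the end shows how it might be plugged into a DocumentSelector, assuming tika-core is on the classpath and that the embedded entry's name is available in the metadata.

```java
import java.util.Locale;

/**
 * Hypothetical helper: decides by file name whether an embedded
 * document should be parsed. Nested Java archives are skipped so that
 * they do not add to the nesting depth seen by the bomb detection.
 */
final class JarSkippingFilter {

    // Extensions treated as nested Java archives (an assumption; adjust as needed).
    private static final String[] SKIPPED_EXTENSIONS = {".jar", ".war"};

    private JarSkippingFilter() {
    }

    /** Returns false if the name ends in a skipped extension, true otherwise. */
    static boolean shouldParse(String resourceName) {
        if (resourceName == null) {
            return true; // no name available: do not filter
        }
        String lower = resourceName.toLowerCase(Locale.ROOT);
        for (String extension : SKIPPED_EXTENSIONS) {
            if (lower.endsWith(extension)) {
                return false;
            }
        }
        return true;
    }

    // Possible wiring into Tika (not compiled here; needs tika-core):
    //
    //   ParseContext context = new ParseContext();
    //   context.set(DocumentSelector.class, new DocumentSelector() {
    //       public boolean select(Metadata metadata) {
    //           // Metadata.RESOURCE_NAME_KEY in Tika 1.x; the constant
    //           // lives elsewhere in later versions.
    //           return JarSkippingFilter.shouldParse(
    //                   metadata.get(Metadata.RESOURCE_NAME_KEY));
    //       }
    //   });
}
```

      With such a selector in the ParseContext, embedded entries named like WEB-INF/lib/foo.jar would be skipped, while PDF or office documents inside the same ZIP would still be parsed. This avoids the deep nesting rather than raising the limit, so it does not help when the nested archives themselves need to be parsed.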

            People

              Assignee: Unassigned
              Reporter: Uwe Schindler (uschindler)
              Votes: 0
              Watchers: 4
