  Tika / TIKA-2430

Add at least dev test capability to run Tika against fuzzed files

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.17
    • Component/s: None
    • Labels:
      None

      Description

      Luis Filipe Nassif observed on TIKA-2428 that a corrupt file caused the EMFParser to hang permanently. Files can be corrupted for various reasons. We could add some optional code to let people experiment with running Tika against randomly corrupted versions of the files in our test suite. I suspect this will unearth too many errors for us to start running it on a regular basis.

      Let's at least add some code in tika-parsers to let devs run the tests.
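      Because the motivating bug was a permanent hang rather than an exception, a dev harness has to tell "threw on bad input" apart from "never returned". The sketch below is one way to do that, not the code in TestCorruptedFiles.java. It assumes a hypothetical DocParser interface standing in for Tika's Parser (whose real parse method also takes a ContentHandler, Metadata and ParseContext); the class names FuzzHarness and Outcome and the per-file timeout are likewise illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a parser. Tika's real Parser.parse(...) also takes
// a ContentHandler, Metadata and ParseContext.
interface DocParser {
    void parse(InputStream in) throws Exception;
}

// How one parse of one (possibly corrupted) input ended.
enum Outcome { OK, EXCEPTION, HANG }

final class FuzzHarness {
    private FuzzHarness() {}

    // Parses the bytes on a worker thread; not finishing within timeoutMillis counts as a hang.
    static Outcome run(DocParser parser, byte[] bytes, long timeoutMillis) {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "fuzz-parse");
            t.setDaemon(true); // a parser stuck in a loop must not keep the JVM alive
            return t;
        });
        try {
            Future<?> f = exec.submit(() -> {
                parser.parse(new ByteArrayInputStream(bytes));
                return null;
            });
            f.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.OK;
        } catch (TimeoutException e) {
            return Outcome.HANG;
        } catch (ExecutionException e) {
            // Exceptions on corrupt input are expected; e.getCause() holds the parser's exception.
            return Outcome.EXCEPTION;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Outcome.EXCEPTION;
        } finally {
            exec.shutdownNow(); // interrupts the worker; code that ignores interrupts keeps running
        }
    }
}
```

      Only HANG results would need a human to look at them; EXCEPTION is the normal, acceptable response to a corrupt file.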

        Activity

        tallison@mitre.org Tim Allison added a comment - https://bz.apache.org/bugzilla/show_bug.cgi?id=61295 https://bz.apache.org/bugzilla/show_bug.cgi?id=61300
        tallison@mitre.org Tim Allison added a comment -

        Luis Filipe Nassif, there are now two options: randomly truncate a file, or overwrite randomly chosen bytes with random bytes. If you see a more common corruption pattern (e.g., a block-length chunk written at a random position in a file), please reopen this issue.

        With just one test file, this has already revealed two areas for improvement in POI. I haven't yet been able to reproduce the EMF bug with the one test file I used.
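        The two corruption modes above can be sketched as pure functions over a byte array. This is an illustration under assumptions, not the committed implementation: the class and method names are hypothetical, and taking a seeded java.util.Random as a parameter is a design choice so that any variant that breaks a parser can be regenerated exactly from its logged seed.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helpers for the two corruption modes; the input array is never modified.
final class Corruptor {
    private Corruptor() {}

    // Returns a copy truncated to a random length in [0, original.length).
    static byte[] truncate(byte[] original, Random rnd) {
        if (original.length == 0) {
            return new byte[0];
        }
        return Arrays.copyOf(original, rnd.nextInt(original.length));
    }

    // Returns a copy in which numBytes randomly chosen positions get random values.
    static byte[] overwriteRandomBytes(byte[] original, int numBytes, Random rnd) {
        byte[] copy = original.clone();
        if (copy.length == 0) {
            return copy;
        }
        for (int i = 0; i < numBytes; i++) {
            int pos = rnd.nextInt(copy.length);   // positions may repeat
            copy[pos] = (byte) rnd.nextInt(256);  // the new value may equal the old one
        }
        return copy;
    }
}
```

        Truncation mostly exercises end-of-stream handling, while scattered overwrites tend to hit length fields and offsets in the middle of a structure, which is where unbounded loops and huge allocations usually come from.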

        hudson Hudson added a comment -

        FAILURE: Integrated in Jenkins build Tika-trunk #1332 (See https://builds.apache.org/job/Tika-trunk/1332/)
        TIKA-2430 – add a capability to allow devs to easily run parsers (tallison: https://github.com/apache/tika/commit/9869851d67744bc555914c5c447eb66e155c1c2c)

        • (add) tika-parsers/src/test/java/org/apache/tika/TestCorruptedFiles.java
        • (edit) CHANGES.txt
        lfcnassif Luis Filipe Nassif added a comment -

        Awesome, Tim Allison, you rock! Randomly overwriting a sequential, block-sized chunk of a file is definitely a very common case in the forensic field, where deleted files can be partially overwritten by the OS. I will leave this closed; we can improve it later.
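        The forensic pattern Luis describes could be added as a third mode. A minimal sketch follows, with two labeled assumptions: the names are hypothetical, and the damaged chunk is aligned to a block boundary on the theory that a filesystem reuses whole fixed-size clusters when it overwrites a deleted file.

```java
import java.util.Random;

// Hypothetical sketch of the "overwrite one sequential block" corruption mode.
final class BlockCorruptor {
    private BlockCorruptor() {}

    // Returns a copy in which one block-aligned chunk of blockSize bytes
    // (shorter if it is the last, partial block) is replaced with random bytes.
    static byte[] overwriteBlock(byte[] original, int blockSize, Random rnd) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        byte[] copy = original.clone();
        if (copy.length == 0) {
            return copy;
        }
        int numBlocks = (copy.length - 1) / blockSize + 1; // ceiling division without overflow
        int start = rnd.nextInt(numBlocks) * blockSize;    // always < copy.length
        int end = Math.min(start + blockSize, copy.length);
        byte[] junk = new byte[end - start];
        rnd.nextBytes(junk);
        System.arraycopy(junk, 0, copy, start, junk.length);
        return copy;
    }
}
```

        For example, `BlockCorruptor.overwriteBlock(bytes, 4096, new Random(seed))` damages one 4 KiB-aligned region; 4096 is only a common cluster size, not something this issue specifies.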

        tallison@mitre.org Tim Allison added a comment -

        I was able to reproduce TIKA-2428 with the latest commit, which extracts embedded files and then fuzzes each of them individually.
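        A plausible reason this helped is that corrupting a container usually breaks the container parser before the embedded parser (here, EMF) ever sees damaged bytes; fuzzing the extracted child feeds corruption straight to it. The sketch below shows only the per-child loop. It assumes the embedded payloads were already extracted by an earlier step (not shown), and every name in it is hypothetical; `parsesCleanly` stands in for something like a timeout-guarded parse that returns false on a hang or crash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical sketch: fuzz each embedded payload separately and record reproducible seeds.
final class EmbeddedFuzzer {
    private EmbeddedFuzzer() {}

    // Identifies one failing variant: which embedded file, and the seed that produced it.
    static final class Failure {
        final int embeddedIndex;
        final long seed;

        Failure(int embeddedIndex, long seed) {
            this.embeddedIndex = embeddedIndex;
            this.seed = seed;
        }

        @Override
        public String toString() {
            return "embedded#" + embeddedIndex + " seed=" + seed;
        }
    }

    // Runs `iterations` corrupted variants of every payload through parsesCleanly.
    static List<Failure> fuzzEach(List<byte[]> embedded, int iterations, long baseSeed,
                                  Predicate<byte[]> parsesCleanly) {
        List<Failure> failures = new ArrayList<>();
        for (int i = 0; i < embedded.size(); i++) {
            for (int iter = 0; iter < iterations; iter++) {
                long seed = baseSeed + (long) i * iterations + iter; // unique per (i, iter)
                byte[] variant = corrupt(embedded.get(i), new Random(seed));
                if (!parsesCleanly.test(variant)) {
                    failures.add(new Failure(i, seed));
                }
            }
        }
        return failures;
    }

    // Sketch-only corruption: overwrite max(1, length / 100) random positions with random values.
    static byte[] corrupt(byte[] original, Random rnd) {
        byte[] copy = original.clone();
        if (copy.length == 0) {
            return copy;
        }
        int n = Math.max(1, copy.length / 100);
        for (int k = 0; k < n; k++) {
            copy[rnd.nextInt(copy.length)] = (byte) rnd.nextInt(256);
        }
        return copy;
    }
}
```

        A reported failure such as `embedded#1 seed=107` can then be regenerated with `EmbeddedFuzzer.corrupt(embedded.get(1), new Random(107))` and attached to a bug report.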


          People

          • Assignee:
            tallison@mitre.org Tim Allison
            Reporter:
            tallison@mitre.org Tim Allison
          • Votes:
            0
            Watchers:
            3
