[TIKA-2450] OfficeParser.parse called for zero-byte file with .doc extension

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.16
    • Fix Version/s: 1.17
    • Component/s: detector, parser
    • Labels: None

    Description

      A zero-byte (empty) file with a .doc extension is detected as a Word Document and the OfficeParser.parse method is called for this file.

      We then get a TikaException, with the cause given as an org.apache.poi.EmptyFileException.

      I think it would be more useful if the file were NOT detected as a Word Document, meaning that the AutoDetectParser would then fall back to whatever is set as the fallback parser in the parse context.

      This is more useful because the user can then trigger some special logic for handling empty files.
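
      Until detection itself skips empty files, a caller can apply the same idea before handing the file to Tika. The following is only a minimal sketch using the Java standard library, not Tika's own fix: the class name EmptyFileGuard, the route method, and the two handler labels are hypothetical placeholders for whatever special logic the caller wants to run.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyFileGuard {

    /** Returns true when the file has a size of zero bytes. */
    static boolean isEmpty(Path file) throws IOException {
        return Files.size(file) == 0L;
    }

    /**
     * Hypothetical routing step: picks which handler the caller would use.
     * "empty-file-handler" and "tika-parser" are placeholder labels only;
     * a real caller would invoke its own logic or the Tika parser here.
     */
    static String route(Path file) throws IOException {
        return isEmpty(file) ? "empty-file-handler" : "tika-parser";
    }

    public static void main(String[] args) throws IOException {
        // createTempFile produces a zero-byte file, mimicking the empty .doc case.
        Path doc = Files.createTempFile("sample", ".doc");
        try {
            System.out.println(route(doc));
            // Once the file has any content, it is routed to the parser as usual.
            Files.write(doc, new byte[] {(byte) 0xD0, (byte) 0xCF});
            System.out.println(route(doc));
        } finally {
            Files.deleteIfExists(doc);
        }
    }
}
```

      The check depends only on the file size, so the .doc extension never comes into play for an empty file and the EmptyFileException path is avoided.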

            People

              Assignee: Unassigned
              Reporter: Matthew Caruana Galizia (mcaruanagalizia)
              Votes: 0
              Watchers: 5
