Tika / TIKA-2450

OfficeParser.parse called for zero-byte file with .doc extension

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.16
    • Fix Version/s: 1.17
    • Component/s: detector, parser
    • Labels:
      None

      Description

      A zero-byte (empty) file with a .doc extension is detected as a Word Document and the OfficeParser.parse method is called for this file.

      We then get a TikaException, with the cause given as an org.apache.poi.EmptyFileException.

      I think it would be more useful if the file were NOT detected as a Word Document, meaning that the AutoDetectParser would then fall back to whatever is set as the fallback parser in the parse context.

      With that fallback in place, the user can trigger special logic for handling empty files.
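
      Until detection behaves that way, a caller-side guard is one possible way to get the same effect. The sketch below is only an illustration under stated assumptions: it uses nothing but the JDK, the class name EmptyFileGuard and the Route enum are hypothetical, and it is not how Tika itself resolved this issue. It checks the file size before the file would be handed to AutoDetectParser, so a zero-byte file can be sent to the application's own empty-file handling rather than to OfficeParser.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Tika: decides where a file should go
// before any detector or parser sees it.
public class EmptyFileGuard {

    // The two possible destinations for a file (names are illustrative).
    public enum Route { EMPTY_FILE_HANDLER, DETECT_AND_PARSE }

    // Zero-byte files are routed to the empty-file handler regardless of
    // their extension; everything else goes on to detection and parsing.
    public static Route route(Path file) throws IOException {
        return Files.size(file) == 0 ? Route.EMPTY_FILE_HANDLER : Route.DETECT_AND_PARSE;
    }

    public static void main(String[] args) throws IOException {
        Path empty = Files.createTempFile("empty", ".doc");
        Path nonEmpty = Files.createTempFile("nonempty", ".doc");
        try {
            Files.write(nonEmpty, new byte[] {1, 2, 3});
            System.out.println(empty.getFileName() + " -> " + route(empty));
            System.out.println(nonEmpty.getFileName() + " -> " + route(nonEmpty));
        } finally {
            Files.deleteIfExists(empty);
            Files.deleteIfExists(nonEmpty);
        }
    }
}
```

      In an application, the DETECT_AND_PARSE branch is where the existing call to the parser would go, and the EMPTY_FILE_HANDLER branch is where the special empty-file logic would live. The guard only works when the input is a file on disk; for arbitrary streams the size would have to be determined another way.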

              People

              • Assignee: Unassigned
              • Reporter: Matthew Caruana Galizia (mcaruanagalizia)
              • Votes: 0
              • Watchers: 5