Tika / TIKA-3976

Allow users to configure behavior for zero-byte files

Details

    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0
    • Component/s: None
    • Labels: None

    Description

      AutoDetectParser currently throws a ZeroByteFileException whenever the input stream is empty.

      I think the reason we did this was for use cases in search systems, where it would be exceptional to send in a zero-byte file.

      For other use cases, though, especially for embedded files, it is normal to have zero-byte content alongside meaningful metadata.

      Embedded files in general are one such case (as in .ppt, etc.). WARC redirects and HTTPResponse files are other kinds of containers where an embedded file may carry meaningful metadata even though its stream has zero bytes.
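
      To make the request concrete, here is a minimal, self-contained sketch of what a configurable zero-byte check could look like. It does not use Tika's API: the class ZeroByteSketch, the EmptyStreamException type, and the throwOnZeroBytes parameter are all hypothetical names chosen for illustration, and the actual option added for this issue may be named and wired differently.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical sketch; not Tika's AutoDetectParser.
class ZeroByteSketch {

    /** Hypothetical stand-in for an exception such as ZeroByteFileException. */
    public static class EmptyStreamException extends IOException {
        public EmptyStreamException(String message) {
            super(message);
        }
    }

    /** Peeks one byte to test for an empty stream, then pushes it back. */
    static boolean isEmpty(PushbackInputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return true;
        }
        in.unread(b);
        return false;
    }

    /**
     * Returns the stream content as UTF-8 text.
     * On an empty stream, either throws (the strict behavior described above)
     * or returns empty text and leaves the caller's metadata untouched.
     */
    public static String parse(InputStream raw, Map<String, String> metadata,
                               boolean throwOnZeroBytes) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 1);
        if (isEmpty(in)) {
            if (throwOnZeroBytes) {
                throw new EmptyStreamException("InputStream must have > 0 bytes");
            }
            // Lenient mode: no content, but the metadata survives for the caller.
            return "";
        }
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
}
```

      With throwOnZeroBytes set to false, a caller handling an embedded file or a WARC redirect would still get back the metadata it passed in, plus empty text, instead of an exception.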

            People

              Assignee: Unassigned
              Reporter: Tim Allison (tallison)