[TIKA-2986] Edge case (?) in file type detection - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Trivial
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

One of my colleagues, Philip Southam, recently came across a file that was identified as an Acrobat FDF file. The file was some kind of binary file with a ".fdf" extension, but it was not an Acrobat FDF.

Our current MimeTypes algorithm runs magic detection first and then tries the file extension. If the extension suggests a child mime type of what magic found, the child type is used. The problem with this file was that the magic %FDF- was not found, so the magic step returned application/octet-stream. The ".fdf" extension was then selected because application/vnd.fdf is a child of application/octet-stream.
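To make the sequence above concrete, here is a minimal sketch of the detection order in Python. It is a simplified model and not Tika's actual code: the two-entry `REGISTRY`, the function names, and the parent links are all assumptions chosen for illustration.

```python
# Simplified model of "magic first, then extension if it is a child type".
# NOT Tika's implementation; REGISTRY is a hypothetical stand-in for the
# real mime definitions, and every name here is illustrative.
OCTET_STREAM = "application/octet-stream"

REGISTRY = {
    "application/vnd.fdf": {
        "magic": b"%FDF-", "extensions": {".fdf"}, "parent": OCTET_STREAM},
    "application/pdf": {
        "magic": b"%PDF-", "extensions": {".pdf"}, "parent": OCTET_STREAM},
}


def detect_by_magic(data):
    """Return the first type whose magic prefix matches, else octet-stream."""
    for mime, entry in REGISTRY.items():
        if data.startswith(entry["magic"]):
            return mime
    return OCTET_STREAM


def detect_by_name(filename):
    """Return the type registered for the file extension, or None."""
    lower = filename.lower()
    for mime, entry in REGISTRY.items():
        if any(lower.endswith(ext) for ext in entry["extensions"]):
            return mime
    return None


def is_specialization(candidate, base):
    """True if base is candidate itself or one of its ancestors."""
    current = candidate
    while current is not None:
        if current == base:
            return True
        current = REGISTRY.get(current, {}).get("parent")
    return False


def detect(data, filename):
    by_magic = detect_by_magic(data)
    by_name = detect_by_name(filename)
    # The extension wins whenever it names a child of the magic result.
    # Every type is a child of octet-stream, so a failed magic match
    # lets any known extension through.
    if by_name is not None and is_specialization(by_name, by_magic):
        return by_name
    return by_magic
```

In this model, `detect(b"\x00\x01\x02", "form.fdf")` yields `application/vnd.fdf` even though the `%FDF-` magic is absent, which mirrors the behavior reported here.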

It feels like we might want to add a rule: if a mime definition has a defined magic and that magic is not found, we should not then fall back to the file extension. Or is there a better way to prevent this from happening? Or is this just an edge case that we should ignore?
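One way the proposed rule could look is sketched below in Python. This is only an illustration under assumptions, not a patch against Tika: the registry, the `detect_strict` name, and the `application/x-hypothetical-nomagic` type (a made-up entry with no magic) are invented for the example.

```python
# Sketch of the proposed rule: do not fall back to the extension when the
# extension's type defines magic and that magic was not found.
# NOT Tika's implementation; all names and entries are hypothetical.
OCTET_STREAM = "application/octet-stream"

REGISTRY = {
    "application/vnd.fdf": {
        "magic": b"%FDF-", "extensions": {".fdf"}, "parent": OCTET_STREAM},
    # Made-up type with no magic, to show the extension fallback surviving.
    "application/x-hypothetical-nomagic": {
        "magic": None, "extensions": {".nomagic"}, "parent": OCTET_STREAM},
}


def detect_by_magic(data):
    for mime, entry in REGISTRY.items():
        if entry["magic"] is not None and data.startswith(entry["magic"]):
            return mime
    return OCTET_STREAM


def detect_by_name(filename):
    lower = filename.lower()
    for mime, entry in REGISTRY.items():
        if any(lower.endswith(ext) for ext in entry["extensions"]):
            return mime
    return None


def is_specialization(candidate, base):
    current = candidate
    while current is not None:
        if current == base:
            return True
        current = REGISTRY.get(current, {}).get("parent")
    return False


def detect_strict(data, filename):
    by_magic = detect_by_magic(data)
    by_name = detect_by_name(filename)
    if by_name is None or by_name == by_magic:
        return by_magic
    if not is_specialization(by_name, by_magic):
        return by_magic
    if REGISTRY[by_name]["magic"] is not None:
        # The extension's type has magic, but the magic step did not
        # identify it, so the extension is not trusted.
        return by_magic
    return by_name
```

Under this sketch the mislabeled binary with a ".fdf" extension stays `application/octet-stream`, while types without magic still resolve by extension. A possible cost is that genuine files whose header is damaged or nonstandard would also lose the extension-based result.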

Issue Links

is related to: TIKA-2988 Add mime for alternative fdf format (Resolved)

People

Assignee: Unassigned

Reporter: Tim Allison

Votes: 0

Watchers: 3

Dates

Created: 18/Nov/19 15:07

Updated: 19/Nov/19 17:41