Details
Type: Task
Status: Resolved
Priority: Trivial
Resolution: Fixed
None
None
None
Description
I recently compared the Linux 'file' command with Tika on a month of Common Crawl data, restricted to files that Tika had initially identified as 'application/octet-stream'. When I query the types that 'file' reported for those files, I get the list below of the 20 most common.
I think it should be fairly straightforward (easy to implement, with precise detection) to add x-nes-rom, MARC, and ICC profile.
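The "easy and precise" claim rests on each of these formats carrying a fixed signature at a known offset: iNES ROM images begin with "NES" followed by byte 0x1A, ICC profiles carry the file signature "acsp" at bytes 36-39, and MARC 21 records start with a 24-byte leader whose positions 20-23 are always "4500". The Java sketch below is a hypothetical standalone helper, not Tika's Detector API or its mimetypes configuration. The class name MagicSketch and the MIME strings returned for MARC and ICC are assumptions for illustration; the ticket itself only names x-nes-rom.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Tika): shows the fixed byte signatures
// that make these three formats cheap and precise to detect.
public class MagicSketch {

    /** Returns a guessed MIME type for the first bytes of a file. */
    static String detect(byte[] head) {
        // iNES ROM: "NES" followed by 0x1A at offset 0.
        if (startsWith(head, 0, new byte[] {'N', 'E', 'S', 0x1A})) {
            return "application/x-nes-rom";
        }
        // ICC profile: profile file signature "acsp" at bytes 36-39.
        if (startsWith(head, 36, "acsp".getBytes(StandardCharsets.US_ASCII))) {
            return "application/vnd.iccprofile"; // assumed type name
        }
        // MARC 21 leader: 5-digit record length, "22" at positions 10-11,
        // entry map "4500" at positions 20-23.
        if (isMarc21Leader(head)) {
            return "application/marc"; // assumed type name
        }
        return "application/octet-stream";
    }

    /** True if data contains sig starting at offset (bounds-checked). */
    static boolean startsWith(byte[] data, int offset, byte[] sig) {
        if (data.length < offset + sig.length) {
            return false;
        }
        for (int i = 0; i < sig.length; i++) {
            if (data[offset + i] != sig[i]) {
                return false;
            }
        }
        return true;
    }

    static boolean isMarc21Leader(byte[] head) {
        if (head.length < 24) {
            return false;
        }
        // Positions 0-4: record length as ASCII digits.
        for (int i = 0; i < 5; i++) {
            if (head[i] < '0' || head[i] > '9') {
                return false;
            }
        }
        return head[10] == '2' && head[11] == '2'
            && startsWith(head, 20, "4500".getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        byte[] nes = {'N', 'E', 'S', 0x1A, 0x02, 0x01};
        byte[] icc = new byte[128];
        System.arraycopy("acsp".getBytes(StandardCharsets.US_ASCII), 0, icc, 36, 4);
        byte[] marc = "00714cam a2200205 a 4500".getBytes(StandardCharsets.US_ASCII);

        System.out.println(detect(nes));  // application/x-nes-rom
        System.out.println(detect(icc));  // application/vnd.iccprofile
        System.out.println(detect(marc)); // application/marc
    }
}
```

In Tika itself, signatures like these would more likely be expressed as magic entries in the MIME type definitions than as hand-written Java; the sketch only shows that each check is a short, fixed-offset comparison.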
If anyone sees other file types that we would want to add, let me know.