Description
Tika currently detects dwf files as application/octet-stream.
To make the Tika mime magic detector recognize dwf files correctly, the following fragment should be added to the tika-mimetypes.xml registry:
<mime-type type="model/vnd.dwf">
  <acronym>dwf</acronym>
  <_comment>Design Web Format</_comment>
  <magic priority="50">
    <match type="string" offset="0" value="(DWF V">
      <match type="string" offset="8" value=".">
        <match type="string" offset="11" value=")" />
      </match>
    </match>
  </magic>
  <glob pattern="*.dwf" />
</mime-type>
In the current version (DWF 6.0), a dwf file is a ZIP-compressed container for vector-based CAD drawings. It is essentially a ZIP archive with the (DWF V06.00) signature placed before the regular ZIP magic number. For this version alone, the match value to detect dwf files would be: (DWF V06.00)PK.
In earlier versions, the dwf data is not stored in a ZIP container, so the magic number is only the signature in the file header, for example (DWF V00.55).
To make Tika detect dwf files of these versions too, I propose the version-independent match in the fragment above: it checks only the "(DWF V" prefix, the "." at offset 8, and the ")" at offset 11.
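For illustration only, here is a minimal Python sketch of what the nested match is meant to accept. It is not Tika code; the function names dwf_version and is_zip_based_dwf are hypothetical, and the sample headers are made up from the signatures quoted above.

```python
from typing import Optional


def dwf_version(header: bytes) -> Optional[str]:
    """Return the version text (e.g. '06.00') if the header looks like DWF.

    Mirrors the three nested matches of the proposed magic rule:
    "(DWF V" at offset 0, "." at offset 8, ")" at offset 11.
    """
    if len(header) < 12:
        return None
    if header[0:6] != b"(DWF V":
        return None
    if header[8:9] != b"." or header[11:12] != b")":
        return None
    # Bytes 6..10 hold the version, e.g. b"06.00" or b"00.55".
    return header[6:11].decode("ascii", errors="replace")


def is_zip_based_dwf(header: bytes) -> bool:
    """True if the DWF signature is followed directly by the ZIP magic 'PK'."""
    return dwf_version(header) is not None and header[12:14] == b"PK"


if __name__ == "__main__":
    # Hypothetical sample headers built from the signatures in this report.
    print(dwf_version(b"(DWF V06.00)PK\x03\x04"))
    print(dwf_version(b"(DWF V00.55)..."))
    print(is_zip_based_dwf(b"(DWF V06.00)PK\x03\x04"))
```

The point of checking only the fixed characters is that both the ZIP-based 6.0 files and the older non-ZIP files pass, whereas a match on (DWF V06.00)PK would reject the older ones.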
Thanks,
Luca
P.S.: The DWF format specification is included in the DWF Toolkit. The DWF Toolkit is available for free at http://www.autodesk.com/dwftoolkit