  1. Tika
  2. TIKA-1823

Support detecting DWF format

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • None
    • 1.13
    • detector, mime

    Description

      Tika currently detects DWF files as application/octet-stream.
      For Tika's MIME magic detector to recognize DWF files correctly, the following fragment should be added to the tika-mimetypes.xml registry:

      <mime-type type="model/vnd.dwf">
      	<acronym>dwf</acronym>
      	<_comment>Design Web Format</_comment>
      	<magic priority="50">
      		<match type="string" offset="0" value="(DWF V">
      			<match type="string" offset="8" value=".">
      				<match type="string" offset="11" value=")" />
      			</match>
      		</match>
      	</magic>
      	<glob pattern="*.dwf" />
      </mime-type>
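
To illustrate how the three nested matches line up with the signature bytes, here is a minimal standalone sketch in Python. It is not Tika code and does not use Tika's API; the function name `matches_dwf_magic` is hypothetical, and the sketch assumes that nested `<match>` elements must all succeed together.

```python
def matches_dwf_magic(header: bytes) -> bool:
    """Emulate the nested <match> rules from the proposed mime-type entry.

    Assumption: every nested match must succeed, so the three checks
    are combined with a logical AND.
    """
    return (
        header[0:6] == b"(DWF V"    # outer match: string at offset 0
        and header[8:9] == b"."     # nested match: "." at offset 8
        and header[11:12] == b")"   # innermost match: ")" at offset 11
    )


if __name__ == "__main__":
    # Sample headers built from the two signatures quoted in this issue.
    for sample in (b"(DWF V06.00)PK\x03\x04", b"(DWF V00.55)", b"PK\x03\x04"):
        print(sample, matches_dwf_magic(sample))
```

Because only the fixed characters of the signature are checked, the version digits at offsets 6, 7, 9 and 10 are free to vary.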
      


      In the current version (DWF 6.0), a DWF file is a ZIP-compressed container for vector-based CAD drawings: essentially a ZIP archive with the (DWF V06.00) signature placed before the regular ZIP magic number. A match value that detects only these files would therefore be (DWF V06.00)PK.
      In earlier versions the DWF data is not stored in ZIP format, so the only magic number is the signature in the file header, for example (DWF V00.55).
      To make Tika detect those earlier versions as well, I propose the more general match in the code above, which checks only the fixed characters of the signature and leaves the version digits free.
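
The distinction between the ZIP-based 6.0 layout and the earlier layout can be sketched as follows. This is a hedged standalone illustration, not part of the proposed Tika change: the function name `describe_dwf_header` is hypothetical, and it assumes the version is always written as two digits, a dot, and two digits, as in the two signatures quoted above.

```python
import re

# Assumption: the signature always has the shape "(DWF Vdd.dd)".
_SIGNATURE = re.compile(rb"\(DWF V(\d{2})\.(\d{2})\)")


def describe_dwf_header(header: bytes):
    """Return ((major, minor), zip_based) for a DWF header, or None.

    zip_based is True when the 12-byte signature is immediately
    followed by the "PK" bytes that start a ZIP archive.
    """
    m = _SIGNATURE.match(header)
    if m is None:
        return None
    version = (int(m.group(1)), int(m.group(2)))
    zip_based = header[12:14] == b"PK"
    return version, zip_based
```

A stricter rule such as (DWF V06.00)PK corresponds to requiring `zip_based` to be true, which would miss the earlier files; the general match accepts both cases.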

      Thanks,

      Luca


      P.S.: The DWF format specification is included in the DWF Toolkit. The DWF Toolkit is available for free at http://www.autodesk.com/dwftoolkit

      Attachments

        1. blocks_and_tables.dwf
          99 kB
          Luca Moretti

          People

            Assignee: Unassigned
            Reporter: Luca Moretti (lucamoretti88)