  1. Tika
  2. TIKA-1823

Support detecting DWF format

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.13
    • Component/s: detector, mime
    • Labels:

      Description

      Tika currently detects DWF files as application/octet-stream.
      For Tika's MIME magic detector to recognize DWF files correctly, the following fragment should be added to the tika-mimetypes.xml registry:

      <mime-type type="model/vnd.dwf">
      	<acronym>dwf</acronym>
      	<_comment>Design Web Format</_comment>
      	<magic priority="50">
      		<match type="string" offset="0" value="(DWF V">
      			<match type="string" offset="8" value=".">
      				<match type="string" offset="11" value=")" />
      			</match>
      		</match>
      	</magic>
      	<glob pattern="*.dwf" />
      </mime-type>
      


      In the current version (DWF 6.0), a DWF file is a ZIP-compressed container for vector-based CAD drawings: essentially a ZIP archive with the (DWF V06.00) signature placed before the regular ZIP magic number. A match value that detects only these files would therefore be (DWF V06.00)PK.
      In earlier versions, the DWF data transport is not a ZIP file, so the magic number is just the signature in the file header, such as (DWF V00.55).
      To make Tika detect these earlier versions as well, I propose the version-independent match in the fragment above, which checks only the fixed characters of the signature: "(DWF V" at offset 0, "." at offset 8, and ")" at offset 11.
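
To illustrate the matching logic described above, here is a minimal Python sketch of the same checks. It is not Tika code: the function names `dwf_signature` and `is_zip_based_dwf` are hypothetical, and it assumes the signature layout is exactly the 12-byte "(DWF Vxx.yy)" form discussed in this issue.

```python
from typing import Optional


def dwf_signature(header: bytes) -> Optional[str]:
    """Return the version text (e.g. "06.00") if header starts with a
    DWF signature of the form "(DWF Vxx.yy)", else None.

    Mirrors the proposed nested magic match: only the fixed characters
    are checked, so the version digits themselves are not validated.
    """
    if len(header) < 12:
        return None
    if header[0:6] != b"(DWF V":      # match at offset 0
        return None
    if header[8:9] != b".":           # match at offset 8
        return None
    if header[11:12] != b")":         # match at offset 11
        return None
    return header[6:11].decode("ascii", errors="replace")


def is_zip_based_dwf(header: bytes) -> bool:
    """True if a DWF signature is immediately followed by the ZIP magic "PK"."""
    return dwf_signature(header) is not None and header[12:14] == b"PK"


if __name__ == "__main__":
    # Hypothetical headers, constructed by hand for illustration only.
    for sample in (b"(DWF V06.00)PK\x03\x04", b"(DWF V00.55)data", b"PK\x03\x04"):
        print(sample[:14], dwf_signature(sample), is_zip_based_dwf(sample))
```

Checking only the fixed characters keeps the rule independent of the version number, at the cost of accepting non-numeric version fields; a stricter rule could also verify that the remaining positions are digits.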

      Thanks,

      Luca


      P.S.: The DWF format specification is included in the DWF Toolkit. The DWF Toolkit is available for free at http://www.autodesk.com/dwftoolkit

        Attachments

        1. blocks_and_tables.dwf
          99 kB
          Luca Moretti

            People

            • Assignee:
              Unassigned
              Reporter:
              lucamoretti88 Luca Moretti