TIKA-2574 (sub-task of TIKA-2473, PCX and DCX image support)

Extend PCX detection in tika-mimetypes.xml


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.17
    • Fix Version/s: None
    • Component/s: detector
    • Labels: None

    Description

      The PCX matcher should be reworked to avoid false positives on UTF-16LE and UTF-32LE text files.

      I suggest also matching the filler bytes at the end of the header, as described in the original PCX specification:

       

      <mime-type type="image/vnd.zbrush.pcx">
        <acronym>PCX</acronym>
        <_comment>ZSoft Paintbrush PiCture eXchange</_comment>
        <alias type="image/x-pcx"/>
        <alias type="image/x-pc-paintbrush"/>
        <magic priority="40">
        <match value="0x0A" type="string" offset="0">
          <!-- Bytes 74 to 127 are filler that pads the header to 128 bytes. All of them are set to 0. -->
          <!-- Checking the filler avoids false positives for text/plain;charset=UTF-16LE and text/plain;charset=UTF-32LE. -->
          <match value="0x000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000" type="string" offset="74">
            <match value="0x00" type="string" offset="1"/>
            <match value="0x02" type="string" offset="1"/>
            <match value="0x03" type="string" offset="1"/>
            <match value="0x04" type="string" offset="1"/>
            <match value="0x05" type="string" offset="1"/>
          </match>
        </match>
        </magic>
        <glob pattern="*.pcx"/>
      </mime-type>
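
      To illustrate why the filler check helps, here is a minimal Python sketch of the matching logic. It is not Tika code. The function names and the sample byte strings are made up for this example, and `matches_without_filler` is only an assumed simplification of a rule that looks at the first two bytes, not a copy of the rule currently shipped in tika-mimetypes.xml. Nested match elements are treated as AND and sibling match elements as OR.

```python
# Hypothetical sketch of the magic logic, not Tika's implementation.
HEADER_SIZE = 128                           # a PCX header is 128 bytes long
FILLER_START = 74                           # filler occupies bytes 74..127
PCX_VERSIONS = {0x00, 0x02, 0x03, 0x04, 0x05}


def matches_without_filler(data: bytes) -> bool:
    """Assumed simplified rule: manufacturer byte 0x0A plus a known version byte."""
    return len(data) >= 2 and data[0] == 0x0A and data[1] in PCX_VERSIONS


def matches_with_filler(data: bytes) -> bool:
    """Proposed rule: additionally require the filler bytes 74..127 to be zero."""
    if len(data) < HEADER_SIZE or not matches_without_filler(data):
        return False
    return not any(data[FILLER_START:HEADER_SIZE])


# Text that starts with a line feed encodes to 0x0A 0x00 ... in UTF-16LE and
# to 0x0A 0x00 0x00 0x00 ... in UTF-32LE, which looks like a PCX manufacturer
# byte followed by version 0. Python's "-le" codecs do not write a BOM.
sample_text = "\n" + "Dette er en test. " * 10
utf16_text = sample_text.encode("utf-16-le")
utf32_text = sample_text.encode("utf-32-le")

# Synthetic PCX header: manufacturer 0x0A, version 5, RLE encoding,
# 8 bits per pixel, and every remaining header byte set to zero.
pcx_header = bytes([0x0A, 0x05, 0x01, 0x08]) + bytes(124)

for name, data in [("UTF-16LE", utf16_text),
                   ("UTF-32LE", utf32_text),
                   ("PCX", pcx_header)]:
    print(name, matches_without_filler(data), matches_with_filler(data))
```

      One consequence of the stricter rule in this sketch is that input shorter than 128 bytes can never match, which seems acceptable because a valid PCX file always carries the full header.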
      

       
      I have attached some test files.

      gagravarr, can you please check this?

      Attachments

        1. IUC10-da.UTF-16LE.without-BOM
          1 kB
          Andreas Meier
        2. IUC10-da-Q.UTF-16LE.without-BOM
          1 kB
          Andreas Meier
        3. IUC10-da-Q.UTF-32LE.without-BOM
          2 kB
          Andreas Meier
        4. IUC10-it.UTF-16LE.without-BOM
          1 kB
          Andreas Meier
        5. Test_without_filehandle
          19 kB
          Andreas Meier
        6. Test.pcx
          19 kB
          Andreas Meier

      People

        Assignee: Unassigned
        Reporter: Andreas Meier