  1. Tika
  2. TIKA-2576

Add application/zstd detection and parser

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.18, 2.0.0
    • Component/s: detector, parser
    • Labels:
      None

      Description

      The IETF is currently reviewing the specification of Zstandard compression and the application/zstd media type: https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html

      Once application/zstd is registered as a standard media type, Tika should add detection and parsing support for it.

      Possible mime-detection for tika-mimetypes.xml (second comment has to be changed when the standard is final):

        <mime-type type="application/zstd">
          <_comment>https://en.wikipedia.org/wiki/Zstandard</_comment>
          <_comment>https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html</_comment>
          <magic priority="50">
            <match value="0xFD2FB528" type="little32" offset="0"/>
          </magic>
          <glob pattern="*.zstd"/>
        </mime-type>
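
The magic rule above can be illustrated with a minimal Java sketch that reads the first four bytes as a little-endian 32-bit integer and compares them with 0xFD2FB528. This is not Tika's detector code; the class name `ZstdMagic` and the method `looksLikeZstd` are hypothetical names chosen for this example, and only the JDK is used.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ZstdMagic {
    // Zstandard frame magic number as a little-endian 32-bit value;
    // on disk it appears as the byte sequence 28 B5 2F FD.
    static final int ZSTD_MAGIC = 0xFD2FB528;

    /** Hypothetical helper: true if the buffer starts with the Zstandard magic number. */
    static boolean looksLikeZstd(byte[] header) {
        if (header == null || header.length < 4) {
            return false; // not enough bytes to hold the magic number
        }
        // Interpret the first four bytes at offset 0 as a little-endian int.
        int first = ByteBuffer.wrap(header, 0, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        return first == ZSTD_MAGIC;
    }

    public static void main(String[] args) {
        byte[] zstd = {(byte) 0x28, (byte) 0xB5, (byte) 0x2F, (byte) 0xFD, 0x00};
        byte[] gzip = {(byte) 0x1F, (byte) 0x8B, 0x08, 0x00};
        System.out.println("zstd header detected: " + looksLikeZstd(zstd));
        System.out.println("gzip header detected: " + looksLikeZstd(gzip));
    }
}
```

The check mirrors the `type="little32"` and `offset="0"` attributes of the proposed `<match>` element. It says nothing about whether the rest of the stream is a valid Zstandard frame.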
      

      commons-compress 1.16 and later provides a compressor and decompressor for the algorithm, based on the com.github.luben zstd-jni library: https://github.com/luben/zstd-jni

      Attached are a sample zstd file (huffman-compressed-larger) and the result of decompressing it.

      Decompression was done with commons-compress 1.16.1 and zstd-jni 1.3.3-3:

      
      <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-compress</artifactId>
        <version>1.16.1</version>
      </dependency>
      <dependency>
        <groupId>com.github.luben</groupId>
        <artifactId>zstd-jni</artifactId>
        <version>1.3.3-3</version>
      </dependency>
      

      Regards

      Andreas

        Attachments

        1. huffman-compressed-larger
          0.1 kB
          Andreas Meier
        2. huffmann-compressed-larger-result.txt
          0.1 kB
          Andreas Meier

            People

            • Assignee: Unassigned
            • Reporter: Andreas Meier
            • Votes: 1
            • Watchers: 7
