  1. Tika
  2. TIKA-2576

Add application/zstd detection and parser

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.18, 2.0.0
    • Component/s: detector, parser
    • Labels: None

    Description

      The IETF is currently reviewing the specification of Zstandard compression and the application/zstd media type: https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html

      As soon as application/zstd is standardized, Tika should detect and parse this media type.

      Possible MIME detection entry for tika-mimetypes.xml (the second comment has to be updated once the standard is final):

        <mime-type type="application/zstd">
          <_comment>https://en.wikipedia.org/wiki/Zstandard</_comment>
          <_comment>https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html</_comment>
          <magic priority="50">
            <match value="0xFD2FB528" type="little32" offset="0"/>
          </magic>
          <glob pattern="*.zstd"/>
        </mime-type>
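
The magic rule above matches when the first four bytes of a file, read as a little-endian 32-bit integer, equal 0xFD2FB528 (on disk: 28 B5 2F FD). As a rough illustration of that check outside of Tika, here is a minimal sketch in plain Java. The class name `ZstdMagicCheck` and the method `looksLikeZstd` are hypothetical names chosen for this example and are not part of Tika or commons-compress.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of Tika.
final class ZstdMagicCheck {
    // Zstandard magic number as used in the proposed match rule (little32).
    static final int ZSTD_MAGIC = 0xFD2FB528;

    /** Returns true if the first four bytes, read little-endian, equal the magic number. */
    static boolean looksLikeZstd(byte[] header) {
        if (header == null || header.length < 4) {
            return false; // too short to contain the magic number
        }
        int first = ByteBuffer.wrap(header, 0, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        return first == ZSTD_MAGIC;
    }

    public static void main(String[] args) {
        byte[] zstdHeader = {(byte) 0x28, (byte) 0xB5, (byte) 0x2F, (byte) 0xFD, 0x00};
        byte[] gzipHeader = {(byte) 0x1F, (byte) 0x8B, 0x08, 0x00};
        System.out.println("zstd header matches: " + looksLikeZstd(zstdHeader));
        System.out.println("gzip header matches: " + looksLikeZstd(gzipHeader));
    }
}
```

This only mirrors the offset-0 magic match; it does not cover the glob pattern or Tika's priority handling between competing magic rules.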
      

      commons-compress 1.16 and later provides a compressor and decompressor for the algorithm, based on zstd-jni (com.github.luben): https://github.com/luben/zstd-jni

      Attached are a sample zstd file (huffman-compressed-larger) and the result of decompressing it.

      Decompression was done with commons-compress 1.16.1 and zstd-jni 1.3.3-3:

      
      <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-compress</artifactId>
        <version>1.16.1</version>
      </dependency>
      <dependency>
        <groupId>com.github.luben</groupId>
        <artifactId>zstd-jni</artifactId>
        <version>1.3.3-3</version>
      </dependency>
      

      Regards

      Andreas

      Attachments

        1. huffman-compressed-larger
          0.1 kB
          Andreas Meier
        2. huffmann-compressed-larger-result.txt
          0.1 kB
          Andreas Meier

          People

            Assignee: Unassigned
            Reporter: Andreas Meier
            Votes: 1
            Watchers: 7
