  1. Tika
  2. TIKA-3992 Add common missing mimes based on Common Crawl data
  3. TIKA-4054

Add various file identifications to reduce application/octet-stream


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0
    • Component/s: None
    • Labels: None

    Description

      Catch-all task collecting identification data for various formats that are currently identified as application/octet-stream. Most of the data is from PRONOM.

       

      SPSS Data File

      application/x-spss-sav

      External signatures File extension: sav
      Internal signatures
      Name SPSS Data File
      Description BOF: $FL2@(#)
      Byte sequences
      Position type Absolute from BOF
      Offset 0
      Maximum Offset 0
      Byte order  
      Value 24464C3240282329
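
      As a minimal sketch of how an absolute-from-BOF signature like this one could be checked (the helper name `is_spss_sav` is hypothetical, and this is not Tika's implementation), note that the hex value decodes to the ASCII string `$FL2@(#)`:

```python
# Hypothetical helper, not part of Tika: tests the PRONOM magic at offset 0.
SPSS_MAGIC = bytes.fromhex("24464C3240282329")  # ASCII "$FL2@(#)"


def is_spss_sav(data: bytes) -> bool:
    """Return True if data begins with the SPSS Data File signature."""
    return data.startswith(SPSS_MAGIC)
```

      Because both Offset and Maximum Offset are 0, a plain prefix comparison is sufficient here.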

       

      Amiga Disk File

      application/x-amiga-disk-format

      External signatures File extension: adf
      Internal signatures
      Name Amiga Disk File
      Description BOF: ‘DOS’ followed by ‘00|01|02|03|04|05|06|07’ depending on the format of the disk. More information on the internal signature can be found here: http://lclevy.free.fr/adflib/adf_info.html#p41
      Byte sequences
      Position type Absolute from BOF
      Offset 0
      Maximum Offset 0
      Byte order  
      Value 444F53(00|01|02|03|04|05|06|07)
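
      The alternation in this value amounts to a range check on the fourth byte. A small sketch, assuming a hypothetical helper name `is_amiga_adf`:

```python
# Hypothetical helper, not part of Tika.
def is_amiga_adf(data: bytes) -> bool:
    """True if data starts with b'DOS' followed by one byte in 0x00..0x07."""
    # data[3] is an int in Python 3, so the alternation becomes a comparison.
    return len(data) >= 4 and data[:3] == b"DOS" and data[3] <= 0x07
```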

       

      JEOL NMR Spectroscopy

      chemical/x-jeol-jdf

      External signatures File extension: jdf
      Internal signatures  
      Name JDF NMR Spectroscopy big endian
      Description Big Endian: BOF: 4A454F4C2E4E4D52 (JEOL.NMR)
      Byte sequences
      Position type Absolute from BOF
      Offset 0
      Maximum Offset 0
      Byte order  
      Value 4A454F4C2E4E4D52
         
      Name JDF little endian
      Description Little Endian: BOF: 524D4E2E4C4F454A (RMN.LOEJ)
      Byte sequences  
      Position type Absolute from BOF
      Offset 0
      Maximum Offset 0
      Byte order  
      Value 524D4E2E4C4F454A
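
      The little-endian value is simply the big-endian value with its eight bytes reversed. The following sketch (the function name `jdf_byte_order` is an assumption made for illustration) checks both signatures and reports which one matched:

```python
from typing import Optional

BIG_ENDIAN_MAGIC = bytes.fromhex("4A454F4C2E4E4D52")     # b"JEOL.NMR"
LITTLE_ENDIAN_MAGIC = bytes.fromhex("524D4E2E4C4F454A")  # b"RMN.LOEJ"


# Hypothetical helper, not part of Tika.
def jdf_byte_order(data: bytes) -> Optional[str]:
    """Return 'big' or 'little' if data starts with a JDF magic, else None."""
    if data.startswith(BIG_ENDIAN_MAGIC):
        return "big"
    if data.startswith(LITTLE_ENDIAN_MAGIC):
        return "little"
    return None
```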

       

      ASPRS Lidar Data Exchange Format v1.2

      no mimetype found

      External signatures File extension: las
      File extension: laz
      Internal signatures
      Name ASPRS Lidar Data Exchange Format 1.2
      Description ASCII header: LASF, followed after 20 bytes by version number 1.2
      Byte sequences
      Position type Absolute from BOF
      Offset 0
      Byte order  
      Value 4C415346{20}0102{78}[00:99]

       

      ASPRS Lidar Data Exchange Format v1.1

      no mimetype found

      External signatures File extension: las
      File extension: laz
      Internal signatures
      Name ASPRS Lidar Data Exchange Format 1.1
      Description ASCII header: LASF, followed after 20 bytes by version number 1.1
      Byte sequences
      Position type Absolute from BOF
      Offset 0
      Byte order  
      Value 4C415346{20}0101{78}[00:99]
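
      Read as offsets, the two LAS values above place `LASF` at offset 0, the two version bytes at offsets 24 and 25, and a range-limited byte at offset 104. The sketch below assumes that `{n}` skips n arbitrary bytes and that `[00:99]` is an inclusive range of hex byte values; `las_version` is a hypothetical name:

```python
from typing import Optional, Tuple


# Hypothetical helper, not part of Tika.
def las_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) if data matches the 1.1 or 1.2 signature."""
    # 4 (LASF) + 20 (skip) + 2 (version) + 78 (skip) + 1 (range byte) = 105
    if len(data) < 105 or data[:4] != b"LASF":
        return None
    version = (data[24], data[25])
    if version not in ((1, 1), (1, 2)):
        return None
    # Assumption: [00:99] means a single byte between 0x00 and 0x99 inclusive.
    if data[104] > 0x99:
        return None
    return version
```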

       

      3D Studio

      image/x-3ds

      External signatures File extension: 3ds
      Internal signatures
      Name 3D Studio (V1)
      Description Primary chunk ID, chunk length, version subchunk ID, chunk length, version, 3D-editor chunk ID.
      Byte sequences
      Position type Absolute from BOF
      Offset 0
      Byte order Little-endian
      Value 4D4D{4}02000A000000(03|04){3}3D3D
      Name 3D Studio (V2)
      Description Primary chunk ID, chunk length, 3D-editor chunk ID
      Byte sequences
      Position type Absolute from BOF
      Offset 0
      Maximum Offset 0
      Byte order  
      Value 4D4D{4}3D3D
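
      The values in this issue use PRONOM byte-sequence notation. As an illustrative sketch only, such a value could be translated into a regular expression over bytes. The helper `pronom_to_regex` is hypothetical, handles only the constructs that appear in this issue, and assumes `{n}` means n arbitrary bytes, `(a|b)` alternatives, and `[aa:bb]` an inclusive byte range:

```python
import re


# Hypothetical helper covering only the notation used in this issue.
def pronom_to_regex(sig: str) -> "re.Pattern":
    """Translate a subset of PRONOM byte-sequence syntax to a bytes regex."""
    out = []
    i = 0
    while i < len(sig):
        ch = sig[i]
        if ch == "{":      # {n}: exactly n arbitrary bytes
            j = sig.index("}", i)
            out.append(b".{%d}" % int(sig[i + 1:j]))
            i = j + 1
        elif ch == "[":    # [aa:bb]: one byte in an inclusive range
            j = sig.index("]", i)
            lo, hi = sig[i + 1:j].split(":")
            out.append(b"[\\x%02x-\\x%02x]" % (int(lo, 16), int(hi, 16)))
            i = j + 1
        elif ch in "(|)":  # alternation passes through unchanged
            out.append(ch.encode("ascii"))
            i += 1
        else:              # two hex digits: one literal byte
            out.append(b"\\x%02x" % int(sig[i:i + 2], 16))
            i += 2
    # DOTALL lets "." match the byte 0x0A as well.
    return re.compile(b"".join(out), re.DOTALL)


V1 = pronom_to_regex("4D4D{4}02000A000000(03|04){3}3D3D")
V2 = pronom_to_regex("4D4D{4}3D3D")
```

      Using `Pattern.match` anchors the comparison at the beginning of the data, which corresponds to "Absolute from BOF" with offset 0.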

       

      TAP (ZX Spectrum)

      application/x-spectrum-tzx

      External signatures File extension: tap
      Internal signatures
      Name TAPZX
      Description BOF: 0x130000, followed after 20 bytes by 0xFF
      Byte sequences
      Position type Absolute from BOF
      Offset 0
      Maximum Offset 0
      Byte order  
      Value 130000{20}FF

       

      Sibelius

      no mimetype found

      External signatures File extension: sib
      Internal signatures
      Name Sibelius
      Description Absolute from beginning of file, magic bytes: 0x0F followed by ASCII SIBELIUS
      Byte sequences
      Position type Absolute from BOF
      Offset 0
      Maximum Offset 0
      Byte order  
      Value 0F534942454C495553

       

      Portable Sound Format

      no mimetype found

      External signatures File extension: psf
      File extension: psf1
      File extension: psflib
      File extension: minipsf
      File extension: minipsf1
      File extension: gsf
      File extension: gsflib
      File extension: minigsf
      Internal signatures
      Name Portable Sound Format
      Description BOF: PSFx, where x (the 4th byte) is one of the following values, identifying the platform for which PSF has been adapted:
        0x01: Playstation (PSF1)
        0x02: Playstation 2 (PSF2)
        0x11: Sega Saturn (SSF)
        0x12: Sega Dreamcast (DSF)
        0x13: Sega Genesis
        0x21: Nintendo 64 (USF)
        0x22: GameBoy Advance (GSF)
        0x23: Super NES (SNSF)
        0x41: Capcom QSound (QSF)
      Format description: http://web.archive.org/web/20140125155137/http://wiki.neillcorlett.com/PSFFormat
      Byte sequences
      Position type Absolute from BOF
      Offset 0
      Maximum Offset 0
      Byte order  
      Value 505346(01|02|11|12|13|21|22|23|41)
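
      Since the fourth byte of a PSF file names the target platform, a detector can report it along with the match. A minimal sketch, using the platform list from the description above; the names `PSF_PLATFORMS` and `psf_platform` are hypothetical:

```python
from typing import Optional

# Fourth-byte values taken from the signature description in this issue.
PSF_PLATFORMS = {
    0x01: "Playstation (PSF1)",
    0x02: "Playstation 2 (PSF2)",
    0x11: "Sega Saturn (SSF)",
    0x12: "Sega Dreamcast (DSF)",
    0x13: "Sega Genesis",
    0x21: "Nintendo 64 (USF)",
    0x22: "GameBoy Advance (GSF)",
    0x23: "Super NES (SNSF)",
    0x41: "Capcom QSound (QSF)",
}


# Hypothetical helper, not part of Tika.
def psf_platform(data: bytes) -> Optional[str]:
    """Return the platform name if data matches the PSF signature."""
    if len(data) < 4 or data[:3] != b"PSF":
        return None
    # Unknown version bytes yield None, mirroring the closed alternation.
    return PSF_PLATFORMS.get(data[3])
```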

          People

            Assignee: Unassigned
            Reporter: Gregory Lepore (greg@rhobard.com)