Tika / TIKA-4172

Apple binary file incorrectly identified as text/x-sql due to filename

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Bug
    • Affects Version/s: 2.9.1
    • Fix Version/s: None
    • Component/s: general
    • Labels: None

    Description

      This is related to https://github.com/eikek/docspell/issues/2376 and https://github.com/eikek/docspell/issues/2403.

      Take the following Base64 encoding of a binary, Apple-generated file. I have no idea what the file does. To recreate it, pipe the text below into e.g. base64 -d > something.sql

      ABRkMDEwMWM2Nl9teVNRTDQwLnNxbAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
      AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAbUJJTgAA
      AAAAAAAAAAAAAAAAAACCgf+/AAA=
      

      If this file is named something.sql, Tika classifies it as text/x-sql, which it is not. It seems that the filename extension is given more weight than the fact that the content is binary.
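To make the "binary anyway" point concrete, here is a minimal Python sketch that decodes the payload above and inspects the bytes. The NUL-byte check is a crude heuristic chosen for this illustration, not Tika's detection logic. Reading the second byte as a name length is likewise an assumption based on how the bytes happen to line up. The `mBIN` marker it looks for is, as far as I know, the signature used by MacBinary III headers, but the report itself does not confirm what the file is.

```python
import base64

# Payload copied verbatim from the report above.
PAYLOAD_B64 = (
    "ABRkMDEwMWM2Nl9teVNRTDQwLnNxbAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
    "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAbUJJTgAA"
    "AAAAAAAAAAAAAAAAAACCgf+/AAA="
)


def looks_binary(data: bytes) -> bool:
    """Crude heuristic (an assumption of this sketch, not Tika's logic):
    treat any NUL byte as evidence that the content is not plain text."""
    return b"\x00" in data


data = base64.b64decode(PAYLOAD_B64)

# Assumption: byte 1 is the length of a name stored from byte 2 onward.
name_len = data[1]
embedded_name = data[2:2 + name_len].decode("ascii", errors="replace")

print("contains NUL bytes:", looks_binary(data))
print("embedded name:", embedded_name)
print("'mBIN' marker present:", b"mBIN" in data)
```

If the checks come out as expected, the content is mostly NUL bytes wrapped around an embedded filename ending in .sql, which may help explain why a detector that consults the name ends up at text/x-sql.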

          People

            Assignee: Unassigned
            Reporter: martin k. (madduck)
            Votes: 0
            Watchers: 2
