Tika / TIKA-3810

Vtt file (encoding UTF-8 with BOM) seen as text/plain

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.5.0
    • Component/s: core, detector, mime
    • Labels: None

    Description

      A VTT file created on Windows (UTF-8 with BOM) is incorrectly detected as text/plain; it should be detected as text/vtt.

      The application that uses Tika, and to which the file is uploaded for MIME type detection, runs on a Unix machine.

      The VTT file is passed as an InputStream to Tika's default detector (we do not want to detect the MIME type from the file extension).

      Please find attached the VTT file that Tika detects as text/plain.
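
For illustration only, here is a minimal sketch of why a leading UTF-8 BOM can defeat content-based detection. It is not Tika's code, and the report does not state the root cause; the sketch simply assumes that a signature check compares the first bytes of the stream with the ASCII string "WEBVTT", so the three BOM bytes (EF BB BF) shift the signature to offset 3. The class and method names (VttBomSketch, sniffBytes, sniff) are hypothetical, and the check is simplified: it does not verify the whitespace or line terminator that must follow the signature.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, NOT Tika's implementation: BOM-tolerant WebVTT sniffing.
class VttBomSketch {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
    private static final byte[] WEBVTT = "WEBVTT".getBytes(StandardCharsets.US_ASCII);

    // True if `data` contains `prefix` starting at `offset`.
    static boolean startsWith(byte[] data, int offset, byte[] prefix) {
        if (data.length - offset < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[offset + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Skips an optional UTF-8 BOM, then looks for the "WEBVTT" signature.
    static String sniffBytes(byte[] head) {
        int offset = startsWith(head, 0, UTF8_BOM) ? UTF8_BOM.length : 0;
        return startsWith(head, offset, WEBVTT) ? "text/vtt" : "text/plain";
    }

    // Stream variant: reads only as many bytes as the check needs (Java 11+).
    static String sniff(InputStream in) throws IOException {
        return sniffBytes(in.readNBytes(UTF8_BOM.length + WEBVTT.length));
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "WEBVTT\n\n00:00.000 --> 00:01.000\nHi\n".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[UTF8_BOM.length + plain.length];
        System.arraycopy(UTF8_BOM, 0, withBom, 0, UTF8_BOM.length);
        System.arraycopy(plain, 0, withBom, UTF8_BOM.length, plain.length);

        System.out.println(sniff(new ByteArrayInputStream(plain)));   // text/vtt
        System.out.println(sniff(new ByteArrayInputStream(withBom))); // text/vtt
        // A check fixed at offset 0 would miss the signature in the BOM case:
        System.out.println(startsWith(withBom, 0, WEBVTT));           // false
    }
}
```

A caller that cannot upgrade to a fixed Tika version could, under the same assumption, strip a leading BOM from the stream before handing it to the detector.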

      Attachments

        1. s5_windowEncoding_validFormat.vtt
          0.7 kB
          Giorgiana Ciobanu

          People

            Assignee: Unassigned
            Reporter: Giorgiana Ciobanu
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: