  1. Tika
  2. TIKA-3811

Exclude NameDetector not working for Tika.detect(file)

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: config, core, detector
    • Labels: None

    Description

      I need to detect the MIME type of a file, but for security reasons I want to exclude detection by file name extension.

      I added a tika-config_test.xml (see attached) to my unit test, but Tika still detects the file type by its name extension.

      I attached a test file that is wrongly detected as text/vtt because of its file extension; it should be detected as text/plain in this case.

       

      The code of my unit test:

      File file = new File(getClass().getClassLoader().getResource("invalid_format.vtt").getFile());
      TikaConfig tikaConfig = new TikaConfig(this.getClass()
              .getClassLoader()
              .getResourceAsStream("tika-config_test.xml"));

      // returns text/vtt but should be text/plain
      String mimeType = new Tika(tikaConfig).detect(file);
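
      A possible workaround, offered only as a sketch and not confirmed in this issue: Tika.detect(File) records the file name in the metadata it hands to the detector, so the extension can still act as a hint regardless of which detectors are configured. Passing a bare InputStream to Tika.detect(InputStream) leaves the metadata empty, so only the file content is examined. The class name ContentOnlyDetection and the helper detectByContent are hypothetical names used for illustration, and the default Tika configuration is assumed.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.Tika;

// Hypothetical helper class, not part of Tika.
public class ContentOnlyDetection {

    // Detects the MIME type from the file's bytes only. The Tika facade
    // receives a plain stream, so no resource name (and therefore no file
    // extension) is available to the detector.
    static String detectByContent(Tika tika, Path path) throws IOException {
        // BufferedInputStream supports mark/reset, which detection relies on
        // to peek at the leading bytes without consuming the stream.
        try (InputStream in = new BufferedInputStream(Files.newInputStream(path))) {
            return tika.detect(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ContentOnlyDetection path/to/invalid_format.vtt
        Path path = Paths.get(args[0]);
        System.out.println(detectByContent(new Tika(), path));
    }
}
```

      The trade-off is that formats which Tika can only tell apart by name lose that hint, so results for such files become more generic.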
      

       

      Attachments

        1. invalid_format.vtt
          0.0 kB
          Giorgiana Ciobanu
        2. tika-config_test.xml
          0.4 kB
          Giorgiana Ciobanu

          People

            Assignee: Unassigned
            Reporter: Giorgiana Ciobanu
            Votes: 0
            Watchers: 3