Tika / TIKA-3195

Inconsistent result of tika.detect(InputStream) and tika.detect(TikaInputStream)

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.24.1
    • Fix Version/s: None
    • Component/s: detector
    • Labels: None

      Description

      We tried to detect the media type of an Ogg video. Sample files are available at https://filesamples.com/formats/ogv

      Tika returns different results for the same file depending on which detect overload is used:

      import java.io.InputStream;
      import java.nio.file.Path;
      import java.nio.file.Paths;
      import org.apache.tika.Tika;
      import org.apache.tika.io.TikaInputStream;

      Tika tika = new Tika();

      // Case 1: resource stream wrapped in a TikaInputStream
      try (InputStream inputStream = Main.class.getResourceAsStream("/sample_1280x720.ogv");
           TikaInputStream tikaInputStream = TikaInputStream.get(inputStream)) {
          String mimeType1 = tika.detect(tikaInputStream);
          System.out.println(mimeType1);
      }
      // output: video/theora

      // Case 2: the same resource passed as a Path
      Path path = Paths.get(Main.class.getResource("/sample_1280x720.ogv").toURI());
      String mimeType2 = tika.detect(path);
      System.out.println(mimeType2);
      // output: video/theora

      // Case 3: the same resource passed as a plain InputStream
      try (InputStream inputStream = Main.class.getResourceAsStream("/sample_1280x720.ogv")) {
          String mimeType3 = tika.detect(inputStream);
          System.out.println(mimeType3);
      }
      // output: application/ogg

      The call that takes a plain InputStream returns a different result from the other two.

      Is this the expected behavior?
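
For anyone reproducing this, it may help to check which logical streams the sample contains without going through Tika at all. The following is a minimal sketch, not Tika's detection logic: `OggStreamProbe` is a hypothetical helper that walks the leading beginning-of-stream pages using the Ogg page layout from RFC 3533 and labels each one by its well-known codec header signature. It assumes the file starts on a page boundary and does no resynchronisation or checksum validation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Tika.
class OggStreamProbe {

    private static final int HEADER_SIZE = 27; // fixed part of an Ogg page header

    /** Returns one label per beginning-of-stream (BOS) page at the start of the data. */
    static List<String> probe(byte[] data) {
        List<String> streams = new ArrayList<>();
        int pos = 0;
        while (pos + HEADER_SIZE <= data.length
                && data[pos] == 'O' && data[pos + 1] == 'g'
                && data[pos + 2] == 'g' && data[pos + 3] == 'S') {
            boolean bos = (data[pos + 5] & 0x02) != 0;   // header-type flag 0x02 = BOS
            int segments = data[pos + 26] & 0xFF;        // number of lacing values
            int bodyStart = pos + HEADER_SIZE + segments;
            if (bodyStart > data.length) {
                break;                                   // truncated segment table
            }
            int bodyLength = 0;
            for (int i = 0; i < segments; i++) {
                bodyLength += data[pos + HEADER_SIZE + i] & 0xFF;
            }
            if (!bos) {
                break;                                   // stop at the first non-BOS page
            }
            int available = Math.min(bodyLength, data.length - bodyStart);
            streams.add(label(data, bodyStart, Math.min(8, available)));
            pos = bodyStart + bodyLength;
        }
        return streams;
    }

    /** Maps the first bytes of a BOS page body to a codec name. */
    private static String label(byte[] data, int start, int length) {
        // ISO-8859-1 maps each byte to the code point of the same value.
        String head = new String(data, start, length, StandardCharsets.ISO_8859_1);
        if (head.startsWith("\u0080theora")) return "theora";
        if (head.startsWith("\u0001vorbis")) return "vorbis";
        if (head.startsWith("OpusHead")) return "opus";
        if (head.startsWith("fishead")) return "skeleton";
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        // Usage: java OggStreamProbe sample_1280x720.ogv
        System.out.println(probe(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

Running it on the sample shows which streams the file declares and in what order, which gives a reference point when comparing the three detect results above.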

       

            People

            • Assignee: Unassigned
            • Reporter: xj xiaojie
            • Votes: 0
            • Watchers: 2
