TIKA-1120

Enable direct use of org.apache.tika.mime.MimeTypes.detect(...)

Details

    • Type: Wish
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 1.3
    • Fix Version/s: None
    • Component/s: mime
    • Labels: None

    Description

      When using MIME type detection, the classes allow the following usage:

      try (InputStream is = theInputStream;
           BufferedInputStream bis = new BufferedInputStream(is)) {
          MimeTypes mt = new MimeTypes();
          Metadata md = new Metadata();
          md.add(Metadata.RESOURCE_NAME_KEY, theFileName);
          MediaType mediaType = mt.detect(bis, null);
          return mediaType.toString();
      }

      Debugging this shows that the MimeTypes class instantiates its internal patterns with an empty MediaTypeRegistry. As a result, getDefaultMimeTypes() is never called and tika-mimetypes.xml is never read.
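
      To illustrate why an empty registry matters, here is a stand-alone sketch in plain Java. It is a simplified model and not Tika's implementation: MagicRegistryDemo, MagicRegistry, emptyRegistry() and defaultRegistry() are hypothetical names, and the two magic patterns are registered by hand instead of being read from tika-mimetypes.xml.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model only: these classes are hypothetical and are not Tika's API.
class MagicRegistryDemo {

    /** Maps a media type name to the magic bytes expected at offset 0. */
    static final class MagicRegistry {
        private final Map<String, byte[]> patterns = new LinkedHashMap<>();

        void register(String mediaType, byte[] magic) {
            patterns.put(mediaType, magic);
        }

        /** Returns the first registered type whose magic prefixes the data, else a generic fallback. */
        String detect(byte[] data) {
            for (Map.Entry<String, byte[]> e : patterns.entrySet()) {
                byte[] magic = e.getValue();
                if (data.length >= magic.length
                        && Arrays.equals(Arrays.copyOf(data, magic.length), magic)) {
                    return e.getKey();
                }
            }
            // Nothing registered or nothing matched: only the generic type is left.
            return "application/octet-stream";
        }
    }

    /** Stands in for a constructor that loads no type definitions. */
    static MagicRegistry emptyRegistry() {
        return new MagicRegistry();
    }

    /** Stands in for a factory that loads the bundled type definitions. */
    static MagicRegistry defaultRegistry() {
        MagicRegistry registry = new MagicRegistry();
        registry.register("application/pdf", "%PDF-".getBytes(StandardCharsets.US_ASCII));
        registry.register("image/png", new byte[] {(byte) 0x89, 'P', 'N', 'G'});
        return registry;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4 ...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(emptyRegistry().detect(pdf));
        System.out.println(defaultRegistry().detect(pdf));
    }
}
```

      In this model the same PDF header yields application/octet-stream from the empty registry and application/pdf from the populated one, which mirrors the behaviour described above.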

      Is it possible to enable direct usage of MimeTypes.detect()? For example, by adding a new constructor that accepts the MediaTypeRegistry to use?

      If not, the code comments (or the documentation at https://tika.apache.org/0.10/detection.html) should point out that MimeTypes() should not be instantiated directly for MIME type detection and that the detectors should be used instead. Ideally, a minimal example should be added to make the usage clear.

      The following example works here:

      try (InputStream is = theInputStream;
           BufferedInputStream bis = new BufferedInputStream(is)) {
          AutoDetectParser parser = new AutoDetectParser();
          Detector detector = parser.getDetector();
          Metadata md = new Metadata();
          md.add(Metadata.RESOURCE_NAME_KEY, theFileName);
          MediaType mediaType = detector.detect(bis, md);
          return mediaType.toString();
      }

          People

            Assignee: Unassigned
            Reporter: Oliver Kopp (koppor)
            Votes: 0
            Watchers: 2
