TIKA-447: Container aware mimetype detection

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7
    • Fix Version/s: 0.10
    • Component/s: mime
    • Labels:
      None

      Description

      As discussed on the dev list, Tika should ideally have a configurable way to process container-based formats (e.g. ZIP files and OLE2 files) when trying to detect the correct MIME type for a document.

      This needs to be configurable, because some people won't want Tika to do all the work of parsing the whole file when they're not interested in knowing exactly what's in it.
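      To make the idea concrete, here is a minimal sketch of container-aware detection with an opt-out switch. It is not Tika's detector API: the class `ContainerDetectSketch`, the `inspectContainer` flag, and the helper `zipOf` are hypothetical names, and the entry-name check for OOXML files is a simplifying heuristic (real OOXML packages locate their main part through relationships). It uses only `java.util.zip` and covers ZIP containers only, not OLE2.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch of configurable container-aware detection; not Tika's API. */
final class ContainerDetectSketch {
    static final String ZIP = "application/zip";
    static final String DOCX =
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    static final String XLSX =
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";

    /** Cheap check: does the data start with the ZIP local file header "PK\003\004"? */
    static boolean looksLikeZip(byte[] data) {
        return data.length >= 4
                && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
    }

    /**
     * Detects a MIME type. When inspectContainer is false, only the magic bytes
     * are checked, so callers who do not care about the contents pay nothing extra.
     */
    static String detect(byte[] data, boolean inspectContainer) {
        if (!looksLikeZip(data)) {
            return "application/octet-stream";
        }
        if (!inspectContainer) {
            return ZIP;
        }
        // Expensive path: walk the whole archive and collect the entry names.
        Set<String> names = new HashSet<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Assumption: conventional OOXML part names identify the document kind.
        if (names.contains("[Content_Types].xml")) {
            if (names.contains("word/document.xml")) return DOCX;
            if (names.contains("xl/workbook.xml")) return XLSX;
        }
        return ZIP;
    }

    /** Test helper: builds an in-memory ZIP with the given (tiny) entries. */
    static byte[] zipOf(String... entryNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write('x');
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

      With the flag off, a .docx file is reported as plain `application/zip`; with it on, the archive is scanned and the more specific type is returned. That is the trade-off the configuration option is meant to expose.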

      Once we have gone to the trouble of opening and parsing the container file, we should try to keep the open container around to speed up parsing of its contents.
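      One way to picture "keeping the open container around" is a wrapper that opens the archive lazily and hands the same handle to both the detection step and the later parsing step. The sketch below is illustrative only: `OpenContainerSketch`, `container()`, `timesOpened()`, and `writeSampleZip` are hypothetical names, not the API introduced by the attached patches, and the detection rule is the same simplified entry-name heuristic.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch: cache the opened container so detection and parsing share it. */
final class OpenContainerSketch implements Closeable {
    static final String DOCX =
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

    private final Path file;
    private ZipFile openContainer; // opened at most once, then reused
    private int openCount;

    OpenContainerSketch(Path file) {
        this.file = file;
    }

    /** Opens the archive on first use and returns the cached handle afterwards. */
    ZipFile container() {
        if (openContainer == null) {
            try {
                openContainer = new ZipFile(file.toFile());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            openCount++;
        }
        return openContainer;
    }

    /** Detection step: looks inside the container (simplified heuristic). */
    String detect() {
        return container().getEntry("word/document.xml") != null ? DOCX : "application/zip";
    }

    /** Parsing step: reads an entry as UTF-8 text via the already opened container. */
    String readEntry(String name) {
        ZipFile zip = container();
        ZipEntry entry = zip.getEntry(name);
        if (entry == null) {
            return null;
        }
        try (InputStream in = zip.getInputStream(entry)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            for (int n; (n = in.read(buffer)) != -1; ) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int timesOpened() {
        return openCount;
    }

    @Override
    public void close() {
        if (openContainer != null) {
            try {
                openContainer.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            openContainer = null;
        }
    }

    /** Test helper: writes a one-entry ZIP to a temporary file. */
    static Path writeSampleZip(String entryName, String text) {
        try {
            Path tmp = Files.createTempFile("container-sketch", ".zip");
            try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(tmp))) {
                zip.putNextEntry(new ZipEntry(entryName));
                zip.write(text.getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      Calling `detect()` and then `readEntry(...)` on the same instance opens the file only once, which is the saving the description is after. The caller owns the wrapper and is responsible for closing it.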

      Attachments

      1. TikaContainerDetection.patch (16 kB, Nick Burch)
      2. TIKA-447-TikaInputStream.patch (6 kB, Jukka Zitting)

    People

    • Assignee: Unassigned
    • Reporter: Nick Burch