Tika / TIKA-790

Reduce duplication between POIFSDocumentType (in OfficeParser) and POIFSContainerDetector


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0
    • Fix Version/s: 1.1
    • Component/s: parser
    • Labels: None

    Description

      For historical reasons, Tika now has two separate pieces of code that try to identify the type of an OLE2-based file: POIFSDocumentType (in OfficeParser) and POIFSContainerDetector.

      POIFSDocumentType can detect a few kinds of files that POIFSContainerDetector cannot (e.g. Encrypted and OLE Native), mostly ones that may not map well onto MIME types. On the other hand, POIFSDocumentType lacks some of the logic in the main detector and only covers the file types that the office parser supports.

      We should probably try to reduce the duplication. One option is to add the few extra types to the detector somehow; the other is to run the detector first and then do the additional, more specific checks afterwards.
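
      A rough sketch of the second option (detector first, extra checks after) might look like the following. It is deliberately standalone and does not use the real Tika or POI classes: Ole2TypeSketch, Kind, detectMediaType and classify are hypothetical names, and the set of top-level entry names stands in for whatever the container detector reads from the OLE2 root directory. The entry names ("Workbook", "WordDocument", "PowerPoint Document", "EncryptedPackage", "\u0001Ole10Native") are assumed here to be the markers of interest and should be checked against the actual code.

```java
import java.util.Set;

public class Ole2TypeSketch {

    /** Parser-level kinds; hypothetical stand-in for POIFSDocumentType. */
    public enum Kind { WORKBOOK, WORDDOCUMENT, POWERPOINT, ENCRYPTED, OLE10_NATIVE, UNKNOWN }

    /** Generic "some OLE2 file" type returned when nothing more specific matches. */
    public static final String GENERIC_OLE = "application/x-tika-msoffice";

    /**
     * Hypothetical stand-in for the container detector: maps top-level
     * OLE2 entry names to a media type string.
     */
    public static String detectMediaType(Set<String> entryNames) {
        if (entryNames.contains("Workbook")) {
            return "application/vnd.ms-excel";
        }
        if (entryNames.contains("WordDocument")) {
            return "application/msword";
        }
        if (entryNames.contains("PowerPoint Document")) {
            return "application/vnd.ms-powerpoint";
        }
        return GENERIC_OLE;
    }

    /**
     * Option two from the description: ask the detector first, and only
     * run the parser-specific checks when it has no specific answer.
     */
    public static Kind classify(Set<String> entryNames) {
        String mediaType = detectMediaType(entryNames);
        if (mediaType.equals("application/vnd.ms-excel")) {
            return Kind.WORKBOOK;
        }
        if (mediaType.equals("application/msword")) {
            return Kind.WORDDOCUMENT;
        }
        if (mediaType.equals("application/vnd.ms-powerpoint")) {
            return Kind.POWERPOINT;
        }
        // Extra checks for kinds that do not map well onto media types.
        if (entryNames.contains("EncryptedPackage")) {
            return Kind.ENCRYPTED;
        }
        if (entryNames.contains("\u0001Ole10Native")) {
            return Kind.OLE10_NATIVE;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify(Set.of("Workbook", "\u0005SummaryInformation")));
        System.out.println(classify(Set.of("EncryptedPackage", "EncryptionInfo")));
        System.out.println(classify(Set.of("\u0001Ole10Native")));
    }
}
```

      With this split, the media-type mapping lives in one place, and the parser keeps only the small set of checks that have no natural media type.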


          People

            Assignee: Nick Burch
            Reporter: Nick Burch
            Votes: 0
            Watchers: 0
