TIKA-886

OOXMLExtractorFactory can leave files open

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1
    • Fix Version/s: 1.2
    • Component/s: parser
    • Labels:
      None

      Description

      As identified in an Alfresco bug (ALF-13106), OOXMLExtractorFactory doesn't currently allow OPCPackage instances created from Files to be closed. This is because the OPCPackage isn't associated with the TikaInputStream, so closing the stream doesn't propagate to the package.

        Activity

        Nick Burch added a comment -

        Changed in r1306411: the two cases, ZipContainerDetector and OOXMLExtractorFactory, now behave the same, and the OPCPackage will be closed (releasing its zip resources) when the TikaInputStream is closed.

        Nick Burch added a comment -

        When the OPCPackage is opened in ZipContainerDetector, it is set as the OpenContainer on the TikaInputStream and is closed correctly when the stream is closed. When OOXMLExtractorFactory does the open, it should likewise set the package as the container so that it is closed. For purely stream-based creation there is no state left to close; this only affects the case of opening the OPCPackage from a File.

        (Having OOXMLExtractorFactory do the close itself feels wrong, as it might end up closing something that someone else opened, and having it track who opened the package through different code paths doesn't feel right either. Making the two cases behave the same feels simplest.)
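
        The ownership pattern described in this comment can be sketched with plain JDK classes. This is only an illustration under stated assumptions, not the change made in Tika: TrackedContainer, ContainerAwareStream and ExtractorFactorySketch are hypothetical stand-ins for OPCPackage, TikaInputStream and OOXMLExtractorFactory, and the close logic shown is an assumed simplification of what the real stream does.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for a file-backed package such as OPCPackage. */
class TrackedContainer implements Closeable {
    private boolean closed = false;

    @Override
    public void close() {
        closed = true; // a real container would release its zip/file handles here
    }

    boolean isClosed() {
        return closed;
    }
}

/** Hypothetical stand-in for a stream that can own an opened container. */
class ContainerAwareStream extends FilterInputStream {
    private Object openContainer;

    ContainerAwareStream(InputStream in) {
        super(in);
    }

    Object getOpenContainer() {
        return openContainer;
    }

    void setOpenContainer(Object container) {
        this.openContainer = container;
    }

    @Override
    public void close() throws IOException {
        try {
            // Closing the stream also closes whatever container was registered on it.
            if (openContainer instanceof Closeable) {
                ((Closeable) openContainer).close();
            }
        } finally {
            openContainer = null;
            super.close();
        }
    }
}

/** Hypothetical factory: reuses a container already on the stream, else opens and registers one. */
class ExtractorFactorySketch {
    static TrackedContainer obtainContainer(ContainerAwareStream stream) {
        Object existing = stream.getOpenContainer();
        if (existing instanceof TrackedContainer) {
            return (TrackedContainer) existing; // opened earlier, e.g. by a detector
        }
        TrackedContainer opened = new TrackedContainer();
        stream.setOpenContainer(opened); // hand ownership to the stream
        return opened;
    }
}

class ContainerOwnershipDemo {
    public static void main(String[] args) throws IOException {
        ContainerAwareStream stream =
                new ContainerAwareStream(new ByteArrayInputStream(new byte[0]));
        TrackedContainer container = ExtractorFactorySketch.obtainContainer(stream);
        System.out.println("closed while stream is open: " + container.isClosed());
        stream.close();
        System.out.println("closed after stream is closed: " + container.isClosed());
    }
}
```

        Because the factory never closes the container itself, it does not need to know whether it or an earlier detector did the open; in both cases the stream is the single owner, and the caller only has to close the stream.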


          People

          • Assignee:
            Nick Burch
            Reporter:
            Nick Burch
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:
