Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10
    • Fix Version/s: 0.10
    • Component/s: parser
    • Labels:
      None
    • Environment:

      Tested on OS X and Debian Linux

      Description

      We have a thread that parses more than 200k files, and we always get a "too many open files" error from the operating system. Using lsof, I noticed that the Apache Tika temp files (created by the TemporaryFiles class) are not really deleted by the operating system, even when the delete method returns true.
      Searching the code, I found that the problem (which does not show up with every file) is probably in the TikaInputStream#close method. There, openContainer is set to null without being closed. When openContainer is an instance of org.apache.poi.poifs.filesystem.NPOIFSFileSystem, the problem disappears if I call close() on it first. I modified the NPOIFSFileSystem class to implement java.io.Closeable, and modified the TikaInputStream#close method to do the following:

      if (openContainer instanceof java.io.Closeable) {
          ((java.io.Closeable) openContainer).close();
      }
      openContainer = null;

      I don't know if this is the best solution, but it seems to solve the problem for me.
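
      For illustration, the proposed change can be sketched as a small self-contained program. This is only a sketch of the pattern, not the actual Tika or POI code: FakeContainer and StreamWithContainer are hypothetical stand-ins for NPOIFSFileSystem and TikaInputStream, and the try/finally is an addition that the patch above does not have.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

class ContainerCloseSketch {

    /** Hypothetical stand-in for a container (e.g. NPOIFSFileSystem) that holds a file handle. */
    static class FakeContainer implements Closeable {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true; // a real container would release its file descriptor here
        }
    }

    /** Hypothetical stand-in for TikaInputStream; only the container handling is modelled. */
    static class StreamWithContainer implements Closeable {
        private Object openContainer;

        void setOpenContainer(Object container) {
            this.openContainer = container;
        }

        Object getOpenContainer() {
            return openContainer;
        }

        @Override
        public void close() throws IOException {
            try {
                // Close the container if it supports it, instead of only dropping the reference.
                if (openContainer instanceof Closeable) {
                    ((Closeable) openContainer).close();
                }
            } finally {
                // Assumption beyond the original patch: clear the reference even if close() throws.
                openContainer = null;
            }
        }
    }

    /** Returns true if closing the stream also closed the container and cleared the reference. */
    static boolean closesContainerOnClose() {
        FakeContainer container = new FakeContainer();
        StreamWithContainer stream = new StreamWithContainer();
        stream.setOpenContainer(container);
        try {
            stream.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return container.closed && stream.getOpenContainer() == null;
    }

    public static void main(String[] args) {
        System.out.println("container closed: " + closesContainerOnClose());
    }
}
```

      With only the "openContainer = null" assignment, the container's file handle would stay open until garbage collection or finalization, which would be consistent with the descriptor leak seen in lsof.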

        Activity

        Enrico Donelli created issue -
        Nick Burch added a comment -

        I've made NPOIFSFileSystem and OPCPackage closeable in r1100013. That'll be in POI 3.8 beta 3

        In r1100015 I've made TikaInputStream close the open container as you suggest, thanks for that. For now you'll need to use a nightly build (or your custom build) of POI to see the effect of that, but it'll kick in properly when 3.8 beta 3 is out.

        Nick Burch made changes -
        Field            Original Value    New Value
        Status           Open [ 1 ]        Resolved [ 5 ]
        Assignee                           Nick Burch [ gagravarr ]
        Fix Version/s                      1.0 [ 12313535 ]
        Resolution                         Fixed [ 1 ]
        Jukka Zitting made changes -
        Status           Resolved [ 5 ]    Closed [ 6 ]

          People

          • Assignee:
            Nick Burch
            Reporter:
            Enrico Donelli
          • Votes:
            0
            Watchers:
            0
