Tika
  1. Tika
  2. TIKA-853

java.io.IOException with TikaGUI and testMP4.m4a

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.1
    • Fix Version/s: 1.2
    • Component/s: gui, parser
    • Labels:
      None
    • Environment:

      Windows 7

      Description

      Using a latest build: when attempting to drop the new testMP4.m4a file into the Tika GUI, a TikaException / IOException occurs:
      org.apache.tika.exception.TikaException: Failed to close temporary resources
      at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:152)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:127)
      at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:320)
      at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:279)
      at org.apache.tika.gui.ParsingTransferHandler.importFiles(ParsingTransferHandler.java:94)
      at org.apache.tika.gui.ParsingTransferHandler.importData(ParsingTransferHandler.java:77)
      at javax.swing.TransferHandler.importData(Unknown Source)
      at javax.swing.TransferHandler$DropHandler.drop(Unknown Source)
      ....
      Caused by: java.io.IOException: Could not delete temporary file C:\Users\john\AppData\Local\Temp\apache-tika-693752014807275949.tmp
      at org.apache.tika.io.TemporaryResources$1.close(TemporaryResources.java:70)
      at org.apache.tika.io.TemporaryResources.close(TemporaryResources.java:121)
      at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:150)
      ... 40 more

      I know that the parser for this file is new and its external source parser has some potential bugs, but this exception does not occur when using Tika CLI to detect / parse the test file.

      1. TIKA-853.patch
        2 kB
        John Mastarone

        Activity

        Hide
        Nick Burch added a comment -

        Ah, we weren't closing the stream in all cases. This is hopefully fixed in r1237614, could you see if it's solved for you? (It's a windows only problem so I can't check)

        Show
        Nick Burch added a comment - Ah, we weren't closing the stream in all cases. This is hopefully fixed in r1237614, could you see if it's solved for you? (It's a windows only problem so I can't check)
        Hide
        John Mastarone added a comment -

        I updated and tried again; still no luck. The CLI is still fine. The exact stack trace was
        Caused by: java.io.IOException: Could not delete temporary file C:\Users\john\AppData\Local\Temp\apache-tika-4583375592196413071.tmp
        at org.apache.tika.io.TemporaryResources$1.close(TemporaryResources.java:70)
        at org.apache.tika.io.TemporaryResources.close(TemporaryResources.java:121)
        at org.apache.tika.io.TikaInputStream.close(TikaInputStream.java:637)
        at org.apache.tika.parser.mp4.MP4Parser.parse(MP4Parser.java:123)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        ... 41 more

        Show
        John Mastarone added a comment - I updated and tried again; still no luck. The CLI is still fine. The exact stack trace was Caused by: java.io.IOException: Could not delete temporary file C:\Users\john\AppData\Local\Temp\apache-tika-4583375592196413071.tmp at org.apache.tika.io.TemporaryResources$1.close(TemporaryResources.java:70) at org.apache.tika.io.TemporaryResources.close(TemporaryResources.java:121) at org.apache.tika.io.TikaInputStream.close(TikaInputStream.java:637) at org.apache.tika.parser.mp4.MP4Parser.parse(MP4Parser.java:123) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) ... 41 more
        Hide
        Nick Burch added a comment -

        I've looked at the code again, and I can't spot anything obviously wrong. I can't reproduce it either, as I'm on Linux

        Are you able to use a debugger to track down what's holding the file open?

        Show
        Nick Burch added a comment - I've looked at the code again, and I can't spot anything obviously wrong. I can't reproduce it either, as I'm on Linux Are you able to use a debugger to track down what's holding the file open?
        Hide
        John Mastarone added a comment -

        I tried debugging but I couldn't see what was holding it open, unless I'm missing something. There were three items in the TemporaryResources class's resources linked-list, and the Closeable representing the temporary file was the last of the three in the list to be closed; the other two closed successfully. I also tried looking on stackoverflow for answers, and as a result tried calling System.gc() and tried explicitly making sure the file was writable, but this didn't help. Maybe it's just a problem with Windows, or maybe if I tried upgrading to JDK 7 the problem would vanish. I can attempt the latter fairly soon and say if that helped.

        Show
        John Mastarone added a comment - I tried debugging but I couldn't see what was holding it open, unless I'm missing something. There were three items in the TemporaryResources class's resources linked-list, and the Closeable representing the temporary file was the last of the three in the list to be closed; the other two closed successfully. I also tried looking on stackoverflow for answers, and as a result tried calling System.gc() and tried explicitly making sure the file was writable, but this didn't help. Maybe it's just a problem with Windows, or maybe if I tried upgrading to JDK 7 the problem would vanish. I can attempt the latter fairly soon and say if that helped.
        Hide
        Nick Burch added a comment -

        It's a Windows thing, because Windows won't let you delete an open file, while other platforms are happy to

        I think the issue is that something is opening the file and not closing it. It'll be outside of the TemporaryResources though, which is what'll make it hard to track down (I've read the code flow through several times and I can't spot it...)

        Show
        Nick Burch added a comment - It's a Windows thing, because Windows won't let you delete an open file, while other platforms are happy to I think the issue is that something is opening the file and not closing it. It'll be outside of the TemporaryResources though, which is what'll make it hard to track down (I've read the code flow through several times and I can't spot it...)
        Hide
        Jukka Zitting added a comment -

        Do you have a virus scanner running? I've seen quite a few cases where a virus scanner or some other similar file scanning tool (desktop search, etc.) prevents files from being removed. If that's the case here, I guess it's best to open a separate issue for the problem.

        We might be able to work around that for example by detecting a case where a temporary file can't be removed for such a reason, and retrying the delete after a small delay or using File.deleteOnExit().

        Show
        Jukka Zitting added a comment - Do you have a virus scanner running? I've seen quite a few cases where a virus scanner or some other similar file scanning tool (desktop search, etc.) prevents files from being removed. If that's the case here, I guess it's best to open a separate issue for the problem. We might be able to work around that for example by detecting a case where a temporary file can't be removed for such a reason, and retrying the delete after a small delay or using File.deleteOnExit().
        Hide
        John Mastarone added a comment -

        I had added my temporary folder and tika project folder to my virus scanner's exclusion list; it didn't help. Inside the Tika MP4Parser class's parse method, I added a catch block that would print out any exceptions caught. I removed the two lines:
        isoFile = new IsoFile(isoBufferWrapper);
        isoFile.parse();
        and still had my same exception. I then removed the preceding line:
        IsoBufferWrapper isoBufferWrapper = new IsoBufferWrapperImpl( tstream.getFile());
        and replaced it with just "tstream.getFile();". Then, when parsing, I no longer had my exception caused by tstream.close() in the finally block---I had another one later on in the method, related to the fact that I'd removed those lines concerning the external parser classes. At no time did I ever catch any exception from these external parser classes with my added catch block. The IsoBufferWrapperImpl itself creates an internal RandomAccessFile, but it should close it at the end of its constructor, or else throw an exception that I should be seeing. But anyway, without yet stepping through that external source code while debugging to see if something obvious is going wrong, just the act of creating that IsoBufferWrapperImpl prevents the temporary file from being deleted, I think. I don't see any methods on the isoFile or the isoBufferWrapper objects that might help release something that needs to be released.

        Show
        John Mastarone added a comment - I had added my temporary folder and tika project folder to my virus scanner's exclusion list; it didn't help. Inside the Tika MP4Parser class's parse method, I added a catch block that would print out any exceptions caught. I removed the two lines: isoFile = new IsoFile(isoBufferWrapper); isoFile.parse(); and still had my same exception. I then removed the preceding line: IsoBufferWrapper isoBufferWrapper = new IsoBufferWrapperImpl( tstream.getFile()); and replaced it with just "tstream.getFile();". Then, when parsing, I no longer had my exception caused by tstream.close() in the finally block---I had another one later on in the method, related to the fact that I'd removed those lines concerning the external parser classes. At no time did I ever catch any exception from these external parser classes with my added catch block. The IsoBufferWrapperImpl itself creates an internal RandomAccessFile, but it should close it at the end of its constructor, or else throw an exception that I should be seeing. But anyway, without yet stepping through that external source code while debugging to see if something obvious is going wrong, just the act of creating that IsoBufferWrapperImpl prevents the temporary file from being deleted, I think. I don't see any methods on the isoFile or the isoBufferWrapper objects that might help release something that needs to be released.
        Hide
        John Mastarone added a comment -

        Within a day or two I plan on extending the IsoBufferWrapperImpl (if not get the external mp4parser's source code and modify / rebuild it myself), just to have a different constructor that calls getChannel() only once on the RandomAccessFile, and explicitly closes the resulting FileChannel at the end of the constructor. If this solves it, I'll file a bug with the mp4parser project.

        Show
        John Mastarone added a comment - Within a day or two I plan on extending the IsoBufferWrapperImpl (if not get the external mp4parser's source code and modify / rebuild it myself), just to have a different constructor that calls getChannel() only once on the RandomAccessFile, and explicitly closes the resulting FileChannel at the end of the constructor. If this solves it, I'll file a bug with the mp4parser project.
        Hide
        John Mastarone added a comment -

        I've attached a potential patch for the MP4Parser class that prevents the occurrence of the IOException. The things I speculated about in my previous two comments concerning the external parser were incorrect. In the MP4Parser's parse method, I moved the closing of the TikaInputStream to the very end, and before that, I set a number of object references to null, and then call System.gc(). If I set one fewer object reference to null from the collection of those that I included, or if I don't garbage collect, my IOException remains.

        Show
        John Mastarone added a comment - I've attached a potential patch for the MP4Parser class that prevents the occurrence of the IOException. The things I speculated about in my previous two comments concerning the external parser were incorrect. In the MP4Parser's parse method, I moved the closing of the TikaInputStream to the very end, and before that, I set a number of object references to null, and then call System.gc(). If I set one fewer object reference to null from the collection of those that I included, or if I don't garbage collect, my IOException remains.
        Hide
        Nick Burch added a comment -

        We don't want to have a System.gc call in production code!

        It sounds like there is some sort of implicit resource leak from the library, which is cleaned up by nulling and GCing. Are you able to replicate the problem with only calls to the MP4Parser (library) code? That could potentially be used in a bug report

        Show
        Nick Burch added a comment - We don't want to have a System.gc call in production code! It sounds like there is some sort of implicit resource leak from the library, which is cleaned up by nulling and GCing. Are you able to replicate the problem with only calls to the MP4Parser (library) code? That could potentially be used in a bug report
        Hide
        John Mastarone added a comment -

        I think I have a better fix for this that does not involve the previous steps taken. Inside Tika's MP4Parser parse method, I call tstream.getFile(), and assign the result to a File that is used to construct a FileInputStream. I read the stream's bytes into an array, and pass this byte array as an argument to a constructor for the IsoBufferWrapperImpl. Then after calling the IsoFile's parse method, tstream.close() does not fail.
        I'm not sure that using the current IsoBufferWrapperImpl constructor, which takes a File argument, would ever allow for the TikaInputStream to be closed while still inside the MP4Parser's parse method without producing the IOException in Windows, because this constructor creates MappedByteBuffers and FileChannels, and the Java API for MappedByteBuffer says "A mapped byte buffer and the file mapping that it represents remain valid until the buffer itself is garbage-collected."
        Should I upload another patch?

        Show
        John Mastarone added a comment - I think I have a better fix for this that does not involve the previous steps taken. Inside Tika's MP4Parser parse method, I call tstream.getFile(), and assign the result to a File that is used to construct a FileInputStream. I read the stream's bytes into an array, and pass this byte array as an argument to a constructor for the IsoBufferWrapperImpl. Then after calling the IsoFile's parse method, tstream.close() does not fail. I'm not sure that using the current IsoBufferWrapperImpl constructor, which takes a File argument, would ever allow for the TikaInputStream to be closed while still inside the MP4Parser's parse method without producing the IOException in Windows, because this constructor creates MappedByteBuffers and FileChannels, and the Java API for MappedByteBuffer says "A mapped byte buffer and the file mapping that it represents remain valid until the buffer itself is garbage-collected." Should I upload another patch?
        Hide
        Nick Burch added a comment -

        If we do need to buffer it all into memory, then there are easier ways to do it than your suggestion - TikaInputStream + IOUtils should do it nicely.

        I've just been reading up on the MappedByteBuffer cleaning problems - <http://bugs.sun.com/view_bug.do?bug_id=4724038> - and it looks like there's no easy/clean solution to this one yet

        Because MP4 files can be very large (many gb large is common), I'm not sure we want to be trying to buffer the whole file into memory just in case.

        Maybe we could try using some reflection to trigger the cleaner on the byte buffer, with careful checks to avoid problems if it isn't sun.nio.ch backed?

        Show
        Nick Burch added a comment - If we do need to buffer it all into memory, then there are easier ways to do it than your suggestion - TikaInputStream + IOUtils should do it nicely. I've just been reading up on the MappedByteBuffer cleaning problems - < http://bugs.sun.com/view_bug.do?bug_id=4724038 > - and it looks like there's no easy/clean solution to this one yet Because MP4 files can be very large (many gb large is common), I'm not sure we want to be trying to buffer the whole file into memory just in case. Maybe we could try using some reflection to trigger the cleaner on the byte buffer, with careful checks to avoid problems if it isn't sun.nio.ch backed?
        Hide
        John Mastarone added a comment -

        I made my own version of the IsoBufferWrapperImpl in which the ByteBuffer array is public, and I tried using the sun.misc.Cleaner code from the bug page in your link on each element in the ByteBuffer array, and following it with a call to tstream.close() (which causes my IOException). I tried putting this block right after the call to isoFile.parse(), and separately tried putting it at the end of the Tika MP4Parser's parse method; neither case prevented the exception from occurring. So, maybe Jukka's suggestion of using deleteOnExit() is the easiest solution for the moment...

        Show
        John Mastarone added a comment - I made my own version of the IsoBufferWrapperImpl in which the ByteBuffer array is public, and I tried using the sun.misc.Cleaner code from the bug page in your link on each element in the ByteBuffer array, and following it with a call to tstream.close() (which causes my IOException). I tried putting this block right after the call to isoFile.parse(), and separately tried putting it at the end of the Tika MP4Parser's parse method; neither case prevented the exception from occurring. So, maybe Jukka's suggestion of using deleteOnExit() is the easiest solution for the moment...
        Hide
        John Mastarone added a comment -

        With a latest snapshot build, I tried dropping the same testMP4.m4a file into the GUI, and received no exception. It may be that the changes that occurred with TIKA-852 on 28/Apr/2012 caused the resolution of this issue, 853.

        Show
        John Mastarone added a comment - With a latest snapshot build, I tried dropping the same testMP4.m4a file into the GUI, and received no exception. It may be that the changes that occurred with TIKA-852 on 28/Apr/2012 caused the resolution of this issue, 853.

          People

          • Assignee:
            Unassigned
            Reporter:
            John Mastarone
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development