Tika
  1. Tika
  2. TIKA-701

Fix problems with TemporaryFiles

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.9
    • Fix Version/s: 0.10
    • Component/s: None
    • Labels:
      None

      Description

      As discussed on dev@, there are still some design issues with the TemporaryFiles class that we introduced in TIKA-567. They should be solved in time for Tika 1.0.

        Activity

        Hide
        Jukka Zitting added a comment -

        Fixed in a series of recent commits.

        To summarize, I added a more generic TemporaryResources class for tracking all Closeable resources associated with a TikaInputStream. I also fixed a lot of cases where Tika- and other streams were not properly closed when no longer needed.

        Show
        Jukka Zitting added a comment - Fixed in a series of recent commits. To summarize, I added a more generic TemporaryResources class for tracking all Closeable resources associated with a TikaInputStream. I also fixed a lot of cases where Tika- and other streams were not properly closed when no longer needed.
        Hide
        Michael McCandless added a comment -

        These changes look great!

        I like that TIS no longer closes the IS when the full file is read only when a mark wasn't set, nor do we close the old stream in getFile() (only when the TR is later closed).

        I noticed one thing, that I'm not sure is a problem, but looks spooky: TikaInputStream.get can optionally take an incoming TemporaryResources; inside that method we have this logic:

                if (stream instanceof TikaInputStream) {
                    return (TikaInputStream) stream;
                }
        

        My worry is: what if the incoming TemporaryResources differs from the stream's? I added an assert and it sometimes does differ (at least a couple tests tripped it). Will this cause problems...? Ie the returned TIS is not tracked under the provided TR as the caller would expect?

        Show
        Michael McCandless added a comment - These changes look great! I like that TIS no longer closes the IS when the full file is read only when a mark wasn't set, nor do we close the old stream in getFile() (only when the TR is later closed). I noticed one thing, that I'm not sure is a problem, but looks spooky: TikaInputStream.get can optionally take an incoming TemporaryResources; inside that method we have this logic: if (stream instanceof TikaInputStream) { return (TikaInputStream) stream; } My worry is: what if the incoming TemporaryResources differs from the stream's? I added an assert and it sometimes does differ (at least a couple tests tripped it). Will this cause problems...? Ie the returned TIS is not tracked under the provided TR as the caller would expect?
        Hide
        Jukka Zitting added a comment -

        The idea behind that logic is that if the stream we were given is already a TikaInputStream, then someone higher up in the call chain is already responsible
        for properly closing it and thus releasing all associated resources. In that case
        we can simply ignore the given TemporaryResources instance (the dispose() or close()
        call will simply be a no-op with no resources to release), and return the given
        stream as-is. Any resources associated with the stream will get released when the
        original creator of the TikaInputStream instance eventually closes the stream.

        Show
        Jukka Zitting added a comment - The idea behind that logic is that if the stream we were given is already a TikaInputStream, then someone higher up in the call chain is already responsible for properly closing it and thus releasing all associated resources. In that case we can simply ignore the given TemporaryResources instance (the dispose() or close() call will simply be a no-op with no resources to release), and return the given stream as-is. Any resources associated with the stream will get released when the original creator of the TikaInputStream instance eventually closes the stream.
        Hide
        Paul Jakubik added a comment -

        This is a very important fix. Will it be released soon? If it is going to be
        a while before 1.0, is there a chance for a 0.9.1 for getting this fix out
        in a released Tika?

        Show
        Paul Jakubik added a comment - This is a very important fix. Will it be released soon? If it is going to be a while before 1.0, is there a chance for a 0.9.1 for getting this fix out in a released Tika?
        Hide
        Michael McCandless added a comment -

        The idea behind that logic is that if the stream we were given is already a TikaInputStream, then someone higher up in the call chain is already responsible for properly closing it

        OK, so in this case it's fine if the returned TIS wasn't folded under the caller's TR instance, since the higher-up callee will close it. Good!

        Show
        Michael McCandless added a comment - The idea behind that logic is that if the stream we were given is already a TikaInputStream, then someone higher up in the call chain is already responsible for properly closing it OK, so in this case it's fine if the returned TIS wasn't folded under the caller's TR instance, since the higher-up callee will close it. Good!

          People

          • Assignee:
            Jukka Zitting
            Reporter:
            Jukka Zitting
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development