Lucene - Core
  1. Lucene - Core
  2. LUCENE-6813

OfflineSorter.sort shouldn't remove the output Path up front

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The new BKD tree classes, and NumericRangeTree (just a 1D BKD tree),
      make heavy use of OfflineSorter to build their data structures at
      indexing time when the number of indexed documents is biggish.

      But when I was first building them (LUCENE-6477), I hit a thread
      safety issue in OfflineSorter, and at that time I just worked around
      it by creating my own private temp directory each time I need to write
      a BKD tree.

      This workaround is sort of messy, and it causes problems with "pending
      delete" files on Windows when we try to remove that temp directory,
      causing test failures like http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Windows/5149/

      I think instead we should fix the root cause ... i.e. make
      OfflineSorter thread safe. It looks like it's simple...

      Separately I'd like to somehow fix these BKD tests to catch any leaked
      file handles ... I'm not sure they are today.

      1. LUCENE-6813.patch
        29 kB
        Michael McCandless
      2. LUCENE-6813.patch
        22 kB
        Michael McCandless
      3. LUCENE-6813.patch
        4 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Patch, but I still need to add a thread test to see if it provokes the original issue I hit.

        I think the only reason why OfflineSorter.sort wasn't thread safe was because it removed the output file up front, instead of replacing it later with the atomic move ... I just removed that Files.deleteIfExists and then added StandardCopyOption.REPLACE_EXISTING later when we do the Files.move.

        Show
        Michael McCandless added a comment - Patch, but I still need to add a thread test to see if it provokes the original issue I hit. I think the only reason why OfflineSorter.sort wasn't thread safe was because it removed the output file up front, instead of replacing it later with the atomic move ... I just removed that Files.deleteIfExists and then added StandardCopyOption.REPLACE_EXISTING later when we do the Files.move.
        Hide
        Dawid Weiss added a comment -

        I don't fully understand the problem but to me OfflineSorter is thread safe – it takes input and output paths, then potentially creates some intermediate files which should never cause any threading problems because they're created atomically by the file system. OfflineSorter also makes best effort to delete these files. If your output already exists, it should be overwritten... where is the thread safety problem?

        As for Windows and the pending delete queue, can we pinpoint when this is happening (is it a leaked file handle, lock)? Perhaps there is a better fix to just cater for the delay of file/ folder deletion in Windows (assuming this is a documented feature)? If Files.delete returns and the file is not deleted, this seems like a bug in the JDK to me?

        http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#delete(java.nio.file.Path)

        Show
        Dawid Weiss added a comment - I don't fully understand the problem but to me OfflineSorter is thread safe – it takes input and output paths, then potentially creates some intermediate files which should never cause any threading problems because they're created atomically by the file system. OfflineSorter also makes best effort to delete these files. If your output already exists, it should be overwritten... where is the thread safety problem? As for Windows and the pending delete queue, can we pinpoint when this is happening (is it a leaked file handle, lock)? Perhaps there is a better fix to just cater for the delay of file/ folder deletion in Windows (assuming this is a documented feature)? If Files.delete returns and the file is not deleted, this seems like a bug in the JDK to me? http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#delete(java.nio.file.Path )
        Hide
        Dawid Weiss added a comment -

        Also, this looks suspicious to me in OfflineSorter:

                // If simple rename doesn't work this means the output is
                // on a different volume or something. Copy the input then.
                try {
                  Files.move(single, output, StandardCopyOption.ATOMIC_MOVE);
                } catch (IOException | UnsupportedOperationException e) {
                  Files.copy(single, output);
                }
        

        because Files.move should move files across volumes (so if it throws an exception then calling copy duplicates effort):
        http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#move(java.nio.file.Path,%20java.nio.file.Path,%20java.nio.file.CopyOption...)

        This may be a left-over piece of code from when File.renameTo was used (which indeed doesn't work across volumes).

        Show
        Dawid Weiss added a comment - Also, this looks suspicious to me in OfflineSorter: // If simple rename doesn't work this means the output is // on a different volume or something. Copy the input then. try { Files.move(single, output, StandardCopyOption.ATOMIC_MOVE); } catch (IOException | UnsupportedOperationException e) { Files.copy(single, output); } because Files.move should move files across volumes (so if it throws an exception then calling copy duplicates effort): http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#move(java.nio.file.Path,%20java.nio.file.Path,%20java.nio.file.CopyOption ...) This may be a left-over piece of code from when File.renameTo was used (which indeed doesn't work across volumes).
        Hide
        Michael McCandless added a comment -

        I don't fully understand the problem but to me OfflineSorter is thread safe

        Sorry I'm still trying to isolate exactly what the issue is ... I'll fixup the issue title once I have more of a clue.

        I think the problem is (maybe) that OfflineSorter.sort currently removes its output path well before writing to it, and so if the caller is relying on Files.createTempFile to "pick" a unique filename across threads, which BKD is doing, then this can illegally re-use the same output Path across threads.

        But I'm not certain this is the problem, I need to get the thread test online to see if I can repro/understand outside of BKD's usage.

        Also, this looks suspicious to me in OfflineSorter:

        If I remove that try/catch then Files.move is angry because it cannot be ATOMIC_MOVE across volumes ... can I just remove the ATOMIC_MOVE option (and the try/catch)? Why must this be atomic?

        Show
        Michael McCandless added a comment - I don't fully understand the problem but to me OfflineSorter is thread safe Sorry I'm still trying to isolate exactly what the issue is ... I'll fixup the issue title once I have more of a clue. I think the problem is (maybe) that OfflineSorter.sort currently removes its output path well before writing to it, and so if the caller is relying on Files.createTempFile to "pick" a unique filename across threads, which BKD is doing, then this can illegally re-use the same output Path across threads. But I'm not certain this is the problem, I need to get the thread test online to see if I can repro/understand outside of BKD's usage. Also, this looks suspicious to me in OfflineSorter: If I remove that try/catch then Files.move is angry because it cannot be ATOMIC_MOVE across volumes ... can I just remove the ATOMIC_MOVE option (and the try/catch )? Why must this be atomic?
        Hide
        Dawid Weiss added a comment -

        I think the problem is (maybe) that OfflineSorter.sort currently removes its output path well before writing to it, and so if the caller is relying on Files.createTempFile to "pick" a unique filename across threads, which BKD is doing, then this can illegally re-use the same output Path across threads.

        Ok, I think I understand you now. In that case indeed OfflineSorter.sort shouldn't be removing the output path and calling Files.move with REPLACE_EXISTING. I don't think an atomic move is required (since we don't care about other processes observing partially moved/copied file).

        Show
        Dawid Weiss added a comment - I think the problem is (maybe) that OfflineSorter.sort currently removes its output path well before writing to it, and so if the caller is relying on Files.createTempFile to "pick" a unique filename across threads, which BKD is doing, then this can illegally re-use the same output Path across threads. Ok, I think I understand you now. In that case indeed OfflineSorter.sort shouldn't be removing the output path and calling Files.move with REPLACE_EXISTING. I don't think an atomic move is required (since we don't care about other processes observing partially moved/copied file).
        Hide
        Michael McCandless added a comment -

        Patch, I think it's ready.

        I added a thread safety test, but it passes even before my fix. I'll
        update the issue title ... sorry for the false blame! Now I don't
        know why BKD was angry before ... but it seems happy now.

        I also fixed test infra to set the default temp dir used by
        OfflineSorter, so we get MockFS checking, and (YAY!) it caught the
        offending place in RangeTreeWriter where it was leaking a file handle!
        This should fix the windows failures (finally!).

        Show
        Michael McCandless added a comment - Patch, I think it's ready. I added a thread safety test, but it passes even before my fix. I'll update the issue title ... sorry for the false blame! Now I don't know why BKD was angry before ... but it seems happy now. I also fixed test infra to set the default temp dir used by OfflineSorter, so we get MockFS checking, and (YAY!) it caught the offending place in RangeTreeWriter where it was leaking a file handle! This should fix the windows failures (finally!).
        Hide
        Dawid Weiss added a comment - - edited

        Hmm... I don't like the move of defaultTempDir() to a static initializer – it means if anything fails (property read, whatever) then you'd get an early exception upon class linking which can often be very confusing. I'd rather have a lazy return as it was before.

        +    // So all code using OfflineSorter (suggesters, BKD tree, NumericRangeTree) see MockFS goodness, e.g. catching leaked file handles:
        +    OfflineSorter.setDefaultTempDir(javaTempDir);
        

        Wouldn't it be possible to set the java.io.tmpdir property to a path that resolves to mockfs instead? Then any paths resolved from java.io.tmpdir would be "wrapped" by MockFS, no matter where they originate from and without the (dodgy) static test-only variable substitution methods... Don't know how hard it'd be though.

        Show
        Dawid Weiss added a comment - - edited Hmm... I don't like the move of defaultTempDir() to a static initializer – it means if anything fails (property read, whatever) then you'd get an early exception upon class linking which can often be very confusing. I'd rather have a lazy return as it was before. + // So all code using OfflineSorter (suggesters, BKD tree, NumericRangeTree) see MockFS goodness, e.g. catching leaked file handles: + OfflineSorter.setDefaultTempDir(javaTempDir); Wouldn't it be possible to set the java.io.tmpdir property to a path that resolves to mockfs instead? Then any paths resolved from java.io.tmpdir would be "wrapped" by MockFS, no matter where they originate from and without the (dodgy) static test-only variable substitution methods... Don't know how hard it'd be though.
        Hide
        Michael McCandless added a comment -

        Hmm... I don't like the move of defaultTempDir() to a static initializer

        OK ... if there is a cleaner way I would love to do that instead.

        Wouldn't it be possible to set the java.io.tmpdir property to a path that resolves to mockfs instead?

        I don't know enough about mockfs to know if this is possible I thought we must use the Path we had set up (backed by the mock FileSystem we created) ... but maybe there is a way to "install" the mock filesystem differently?

        I do think it's really useful to have OfflineSorter use our mock filesystems: it caught the bug in RangeTreeWriter...

        Show
        Michael McCandless added a comment - Hmm... I don't like the move of defaultTempDir() to a static initializer OK ... if there is a cleaner way I would love to do that instead. Wouldn't it be possible to set the java.io.tmpdir property to a path that resolves to mockfs instead? I don't know enough about mockfs to know if this is possible I thought we must use the Path we had set up (backed by the mock FileSystem we created) ... but maybe there is a way to "install" the mock filesystem differently? I do think it's really useful to have OfflineSorter use our mock filesystems: it caught the bug in RangeTreeWriter ...
        Hide
        Dawid Weiss added a comment -

        I'd have to look into it. Filesystem providers have paths conversion utilities so a "mockfs" path in the property should be doable... I've been burnt by static initializers wish side effects (exceptions) so many time that it always displays a red flag for me.

        Show
        Dawid Weiss added a comment - I'd have to look into it. Filesystem providers have paths conversion utilities so a "mockfs" path in the property should be doable... I've been burnt by static initializers wish side effects (exceptions) so many time that it always displays a red flag for me.
        Hide
        Dawid Weiss added a comment -

        It'd have to be a global override of java.nio.file.spi.DefaultFileSystemProvider which would wrap all of file:/// URIs. We could then install arbitrary filesystem operation interceptors or tracking. The problem is, as far as I recall, that this SPI is not frequently used and Robert Muir mentioned to me once that it's not actually implementable...

        It'd be an interesting experiment to try to do leak checking via the default system provider.

        Show
        Dawid Weiss added a comment - It'd have to be a global override of java.nio.file.spi.DefaultFileSystemProvider which would wrap all of file:/// URIs. We could then install arbitrary filesystem operation interceptors or tracking. The problem is, as far as I recall, that this SPI is not frequently used and Robert Muir mentioned to me once that it's not actually implementable... It'd be an interesting experiment to try to do leak checking via the default system provider.
        Hide
        Robert Muir added a comment -

        Can we please not do this? it was intentional that we dont change the default provider. At the very least we have test classes that do not want the wrapping: and we have annotations to allow them to disable it.

        All we need is a setter here.

        Show
        Robert Muir added a comment - Can we please not do this? it was intentional that we dont change the default provider. At the very least we have test classes that do not want the wrapping: and we have annotations to allow them to disable it. All we need is a setter here.
        Hide
        Robert Muir added a comment -

        Mike, your setter can also be package private. Maybe that will make it more digestible.

        Show
        Robert Muir added a comment - Mike, your setter can also be package private. Maybe that will make it more digestible.
        Hide
        Dawid Weiss added a comment -

        Yeah... I noticed that while looking at the code. These are not really filesystem providers, they're default filesystem decorators...

        Show
        Dawid Weiss added a comment - Yeah... I noticed that while looking at the code. These are not really filesystem providers, they're default filesystem decorators...
        Hide
        Robert Muir added a comment -

        We also don't use them all the time: we don't want the possibility of hiding bugs with any mocking layers.

        And this stuff is complex... what we have "works for us" but I would not say it is ready to be set as default for the jvm, that is my opinion.

        Show
        Robert Muir added a comment - We also don't use them all the time: we don't want the possibility of hiding bugs with any mocking layers. And this stuff is complex... what we have "works for us" but I would not say it is ready to be set as default for the jvm, that is my opinion.
        Hide
        Michael McCandless added a comment -

        Mike, your setter can also be package private

        Oh good point, I'll do that.

        Show
        Michael McCandless added a comment - Mike, your setter can also be package private Oh good point, I'll do that.
        Hide
        Michael McCandless added a comment -

        OK how about this new patch?

        I made the new setter package private, and I removed the static class init,
        and instead made it lazy init the first time someone asks for it.

        And I renamed defaultTempDir() --> getDefaultTempDir().

        Show
        Michael McCandless added a comment - OK how about this new patch? I made the new setter package private, and I removed the static class init, and instead made it lazy init the first time someone asks for it. And I renamed defaultTempDir() --> getDefaultTempDir() .
        Hide
        Dawid Weiss added a comment -

        We also don't use them all the time: we don't want the possibility of hiding bugs with any mocking layers.

        I know, but since these are effectively decorators I think it could be done in a way in which you'd turn them on and off dynamically. The only difference would be that they'd be effectively global. I don't want to impose it on Lucene, I'm just saying it'd be a fun exercise in general to write such a layer.

        The patch looks good to me.

        Show
        Dawid Weiss added a comment - We also don't use them all the time: we don't want the possibility of hiding bugs with any mocking layers. I know, but since these are effectively decorators I think it could be done in a way in which you'd turn them on and off dynamically. The only difference would be that they'd be effectively global. I don't want to impose it on Lucene, I'm just saying it'd be a fun exercise in general to write such a layer. The patch looks good to me.
        Hide
        ASF subversion and git services added a comment -

        Commit 1705155 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1705155 ]

        LUCENE-6813: OfflineSorter no longer removes its output file up front; fix file handle leak in RangeTreeWriter

        Show
        ASF subversion and git services added a comment - Commit 1705155 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1705155 ] LUCENE-6813 : OfflineSorter no longer removes its output file up front; fix file handle leak in RangeTreeWriter
        Hide
        ASF subversion and git services added a comment -

        Commit 1705162 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1705162 ]

        LUCENE-6813: remove newly deceased code

        Show
        ASF subversion and git services added a comment - Commit 1705162 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1705162 ] LUCENE-6813 : remove newly deceased code
        Hide
        ASF subversion and git services added a comment -

        Commit 1705168 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1705168 ]

        LUCENE-6813: OfflineSorter no longer removes its output file up front; fix file handle leak in RangeTreeWriter

        Show
        ASF subversion and git services added a comment - Commit 1705168 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1705168 ] LUCENE-6813 : OfflineSorter no longer removes its output file up front; fix file handle leak in RangeTreeWriter

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development