Lucene - Core / LUCENE-4209

BytesRefSorter leaves files in /tmp and never cleans them up

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.6, 4.0-ALPHA
    • Fix Version/s: 3.6.1, 4.0-BETA, 6.0
    • Component/s: core/FSTs
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      When reviewing my Jenkins installation, I found that Jenkins fills /tmp with the following files (on both Linux and Windows).

      -rw-r--r-- 1 jenkins nogroup 12433 Jul 5 21:14 RefSorter-1839005885812820606.sorted
      -rw-r--r-- 1 jenkins nogroup 13574 Jul 5 19:26 RefSorter-2799526995307200478.sorted
      -rw-r--r-- 1 jenkins nogroup 12600 Jul 5 17:14 RefSorter-2841491891429925756.sorted
      -rw-r--r-- 1 jenkins nogroup 11697 Jul 5 19:57 RefSorter-3302954184439492426.sorted
      -rw-r--r-- 1 jenkins nogroup 13496 Jul 5 16:30 RefSorter-3738422482066276549.sorted
      -rw-r--r-- 1 jenkins nogroup 13781 Jul 5 15:36 RefSorter-4235756851148318773.sorted
      -rw-r--r-- 1 jenkins nogroup 12719 Jul 5 18:54 RefSorter-4530019493984469514.sorted
      -rw-r--r-- 1 jenkins nogroup 12696 Jul 5 16:04 RefSorter-5245195867837976219.sorted
      -rw-r--r-- 1 jenkins nogroup 13879 Jul 5 18:27 RefSorter-5977302780601133089.sorted
      -rw-r--r-- 1 jenkins nogroup 12712 Jul 5 21:39 RefSorter-6336186633027300753.sorted
      -rw-r--r-- 1 jenkins nogroup 12820 Jul 5 20:30 RefSorter-6447286760971372233.sorted
      -rw-r--r-- 1 jenkins nogroup 12105 Jul 5 17:48 RefSorter-6532780916605441895.sorted
      -rw-r--r-- 1 jenkins nogroup 13505 Jul 5 20:53 RefSorter-7247901917320979657.sorted
      -rw-r--r-- 1 jenkins nogroup 12688 Jul 5 22:10 RefSorter-7796370222379929612.sorted
      -rw-r--r-- 1 jenkins nogroup 19 Jul 5 18:54 sort1277839437346448611partition
      -rw-r--r-- 1 jenkins nogroup 21299752 Jul 5 15:35 sort1362726822297484023intermediate
      -rw-r--r-- 1 jenkins nogroup 21300496 Jul 5 17:48 sort1435680293746542872intermediate
      -rw-r--r-- 1 jenkins nogroup 19 Jul 5 16:30 sort1498884601796138622partition
      -rw-r--r-- 1 jenkins nogroup 21300869 Jul 5 20:30 sort1634015425760928615intermediate
      -rw-r--r-- 1 jenkins nogroup 19 Jul 5 20:30 sort1954741677243403383partition
      -rw-r--r-- 1 jenkins nogroup 21300802 Jul 5 20:53 sort2203784121687916561intermediate
      -rw-r--r-- 1 jenkins nogroup 21300493 Jul 5 22:10 sort24154414907891444intermediate
      -rw-r--r-- 1 jenkins nogroup 19 Jul 5 22:10 sort2816986454022083882partition
      -rw-r--r-- 1 jenkins nogroup 21300111 Jul 5 18:27 sort285022281545547041intermediate
      -rw-r--r-- 1 jenkins nogroup 19 Jul 5 18:28 sort295507558144077223partition
      -rw-r--r-- 1 jenkins nogroup 21300569 Jul 5 16:30 sort3013772538520090513intermediate
      -rw-r--r-- 1 jenkins nogroup 21300574 Jul 5 17:14 sort3297463807520676013intermediate
      -rw-r--r-- 1 jenkins nogroup 19 Jul 5 21:14 sort3364874175018276528partition
      -rw-r--r-- 1 jenkins nogroup 19 Jul 5 17:14 sort3846182021346233750partition
      -rw-r--r-- 1 jenkins nogroup 21300204 Jul 5 19:26 sort4397860673342757974intermediate
      -rw-r--r-- 1 jenkins nogroup 21300050 Jul 5 16:04 sort4474792189525490476intermediate
      -rw-r--r-- 1 jenkins nogroup 21300825 Jul 5 18:54 sort4518448528614283778intermediate
      -rw-r--r-- 1 jenkins nogroup 19 Jul 5 21:39 sort4756172476965226743partition
      -rw-r--r-- 1 jenkins nogroup 19 Jul 5 20:53 sort5416699953867843402partition
      -rw-r--r-- 1 jenkins nogroup 19 Jul 5 19:26 sort5558474409634346477partition
      -rw-r--r-- 1 jenkins nogroup 19 Jul 5 17:48 sort6281513108922200314partition
      -rw-r--r-- 1 jenkins nogroup 21300155 Jul 5 21:39 sort6639309492804635005intermediate
      -rw-r--r-- 1 jenkins nogroup 19 Jul 5 19:57 sort6777765458777941142partition
      -rw-r--r-- 1 jenkins nogroup 21301369 Jul 5 19:57 sort6973021800616034113intermediate
      -rw-r--r-- 1 jenkins nogroup 21300341 Jul 5 21:14 sort7260811068342958052intermediate
      -rw-r--r-- 1 jenkins nogroup 19 Jul 5 16:04 sort852078170643422390partition
      -rw-r--r-- 1 jenkins nogroup 19 Jul 5 15:35 sort8857722113319559286partition

      I found the pattern "RefSorter-" in Lucene's source code, so these files must come from tests. Why are they not cleaned up, and why do we need those files? Would a RAMDirectory not be enough for this?

      This is serious: the files are never cleaned up. They stay around even when the test passes, so this is not caused by the always-failing Solr Suggester tests.

      There are also other filenames ending in .sorted and similar.

      The slave was automatically taken offline after its RAM-based /tmp (2 GB) filled up in under 24 hours. On the Windows box, c:\Users\JenkinsSlave\AppData\Temp already contained 60,000 files like this (still deleting them), taking 12 GB of disk space. I also reviewed Apache Jenkins and cleaned up lots of files there, too.

      1. LUCENE-4209_more.patch
        3 kB
        Robert Muir
      2. LUCENE-4209.patch
        3 kB
        Robert Muir
      3. LUCENE-4209.patch
        3 kB
        Robert Muir
      4. LUCENE-4209-enforce-cleanup.patch
        4 kB
        Uwe Schindler

        Activity

        Uwe Schindler added a comment -

        In my opinion, the location of these files should be configurable by the user; the code should not call File.createTempFile() [Robert: Put it on the forbidden list, please...]
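
A minimal sketch of what a caller-configurable location could look like, using only the JDK's java.nio.file API (Java 7 or later). The class name ScratchFiles and its methods are hypothetical and are not part of Lucene or of any patch attached to this issue.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: the caller, not the library, decides where scratch files live. */
final class ScratchFiles {
    private final Path dir;

    ScratchFiles(Path dir) {
        this.dir = dir;
    }

    /** Creates a uniquely named scratch file inside the configured directory. */
    Path create(String prefix, String suffix) throws IOException {
        Files.createDirectories(dir); // no-op if the directory already exists
        return Files.createTempFile(dir, prefix, suffix);
    }

    public static void main(String[] args) throws IOException {
        // Demo only: a throwaway base directory stands in for a user-supplied one.
        Path base = Files.createTempDirectory("scratch-demo");
        Path file = new ScratchFiles(base).create("RefSorter-", ".sorted");
        System.out.println("created " + file);
        Files.delete(file);
        Files.delete(base);
    }
}
```

Because every scratch file ends up under one known directory, a test harness can check that the directory is empty once a test finishes.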

        Robert Muir added a comment -

        Can you try this one on Windows?

        Uwe Schindler added a comment -

        I have also seen (on Windows), shit like: WFSTTermFreqIteratorWrapper8451761996211413579.sorted

        I will try now after cleaning up.

        Uwe Schindler added a comment -

        On Windows, those files stayed there after running tests in suggest:

        10.07.2012 20:40 0 SortedTermFreqIteratorWrapper1308605874198741902.sorted
        10.07.2012 20:40 351.131 SortedTermFreqIteratorWrapper2027363697367869268.sorted
        10.07.2012 20:40 449.985 SortedTermFreqIteratorWrapper5542079452540558393.sorted
        10.07.2012 20:39 241 SortedTermFreqIteratorWrapper690999681538401442.sorted
        10.07.2012 20:40 13.505 WFSTTermFreqIteratorWrapper6984334111477611000.sorted
        10.07.2012 20:40 47 WFSTTermFreqIteratorWrapper7332590534826479332.sorted

        Robert Muir added a comment -

        Try this one: I think I know the problem on Windows.

        See my changes to SortedTermFreqIteratorWrapper.

        Uwe Schindler added a comment -

        This one fixes it on Windows, too.

        We should commit this now to make my machines happy and open another issue to make this horrible random file placement configurable, as in your original Sorter (taking a Directory instead of using File.createTempFile()). We must put this method on the forbidden list!!! damn!!!

        Robert Muir added a comment -

        I agree the Directory abstraction would be nice here.

        Then we can verify everything (including Windows correctness and no leaks) with MockDirectoryWrapper.

        Dawid Weiss added a comment -

        Why were those temporary files not cleaned up? A bug?

        Robert Muir added a comment -

        There were three cases:
        1. Not calling Sorter.close() in FSTCompletionLookup.
        2. Not closing things in tests.
        3. Trying to delete files before closing an open reader on them in SortedTermFreqIteratorWrapper. This is a Windows-only problem: Windows will not allow that.
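
Case 3 above comes down to ordering: close every handle on the file, then delete it. The following is an illustrative sketch in plain JDK code, not the committed fix; the class CloseThenDelete and its method are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical example: consume a temporary file, then remove it. */
final class CloseThenDelete {
    /** Counts the lines in a temporary file and deletes it afterwards. */
    static long countLinesAndDelete(Path tmp) throws IOException {
        long lines = 0;
        // try-with-resources closes the reader when this block exits,
        // whether normally or with an exception.
        try (BufferedReader reader = Files.newBufferedReader(tmp, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                lines++;
            }
        }
        // No handle is open any more, so the delete is not blocked by
        // our own reader (the situation described for Windows above).
        Files.delete(tmp);
        return lines;
    }
}
```

Files.delete() throws an IOException when the file cannot be removed, so a failed cleanup is reported instead of being silently ignored as with an unchecked File.delete() return value.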

        Dawid Weiss added a comment -

        Oh crap.

        Robert Muir added a comment -

        I committed this on trunk/branch_4x/branch_3_6

        Uwe Schindler added a comment - edited

        The Solr FST test also creates two of those files (on Linux, too) and never deletes them:

        -rw-r--r-- 1 jenkins nogroup 19 Jul 10 21:07 sort3792768274336297309partition
        -rw-r--r-- 1 jenkins nogroup 21300609 Jul 10 21:07 sort8319180334296886006intermediate

        Uwe Schindler added a comment -

        Problem found with Robert:
        It's not Solr; it's Sort.java again.

        This time the following happens:
        On the Jenkins machine, /tmp is a separate filesystem (tmpfs), so file.renameTo() does not work and the code falls back to copying the file, but it forgets to delete the original.

        Robert has a patch.

        Robert Muir added a comment -

        Here's the patch: what a horrendous thing to track down.

        It only happened on Uwe's computer because he has /tmp on a separate volume. So the rename fails, and it does this copy() thing, but doesn't delete the old file.
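
The rename-then-copy fallback described above can be sketched as follows. This is an assumption-laden illustration rather than the actual Sort.java code: MoveWithFallback and its method are hypothetical, and only JDK calls are used.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

/** Hypothetical example: move a file, even across filesystems, without leaving the source behind. */
final class MoveWithFallback {
    static void move(File source, File target) throws IOException {
        // Fast path: File.renameTo() returns false instead of throwing,
        // for example when source and target are on different volumes.
        if (source.renameTo(target)) {
            return;
        }
        // Fallback: copy the bytes to the target...
        Files.copy(source.toPath(), target.toPath(), StandardCopyOption.REPLACE_EXISTING);
        // ...and then delete the original. Leaving this step out is the
        // kind of leak described in this comment.
        Files.delete(source.toPath());
    }
}
```

Whichever path is taken, the source file no longer exists when move() returns normally.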

        Uwe Schindler added a comment -

        Patch that enforces cleanup of all temporarily generated files on success or failure; partially written output files are also deleted on error.

        We should do the same for the other places like BytesRefSorter.
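
One common way to get this behavior is a success flag checked in a finally block. The sketch below is a guess at the general shape, not the content of LUCENE-4209-enforce-cleanup.patch; the class, the method, and the simulateFailure flag are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical example: temporary files never outlive the call; partial output is removed on error. */
final class CleanupOnExit {
    static void sortToFile(Path tempDir, Path output, byte[] data, boolean simulateFailure)
            throws IOException {
        Path tmp = Files.createTempFile(tempDir, "sort", "intermediate");
        boolean success = false;
        try {
            Files.write(tmp, data); // stage the data in the temporary file
            try (OutputStream out = Files.newOutputStream(output)) {
                out.write(Files.readAllBytes(tmp));
                if (simulateFailure) {
                    throw new IOException("simulated failure after partial output");
                }
            }
            success = true;
        } finally {
            // Always remove the temporary file, on success or failure.
            Files.deleteIfExists(tmp);
            // On failure, also remove the partially written output.
            if (!success) {
                Files.deleteIfExists(output);
            }
        }
    }
}
```

The output stream is closed by the inner try-with-resources before the finally block runs, so the deletes are not blocked by an open handle. A production version would also need to decide what happens when a delete in the finally block itself fails, since that exception would replace the original one.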

        Robert Muir added a comment -

        +1

        Uwe Schindler added a comment -

        Committed trunk revision 1359953, 4.x revision 1359954, 3.6 revision 1359956.

        Hoss Man added a comment -

        bulk cleanup of 4.0-ALPHA / 4.0 Jira versioning. all bulk edited issues have hoss20120711-bulk-40-change in a comment

        Robert Muir added a comment -

        The issue was left hanging open, but it's resolved.


          People

          • Assignee: Unassigned
          • Reporter: Uwe Schindler
          • Votes: 0
          • Watchers: 2