Lucene - Core
  1. Lucene - Core
  2. LUCENE-6150

Remove staleFiles set and onIndexOutputClosed from FSDirectory

    Details

    • Type: Task Task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Hi,
      the "hack" to keep track of files written to in FSDirectory, to filter them when calling sync is heavily broken. Michael McCandless already opened an issue, which was abandoned then.

      In fact handling this in FSDirectory is a hack and broken! If IndexWriter is itsself not able to correctly handle tracking the files, it is also his repsonsibilty to do this. We already have a class that can do this: TrackingDirectoryWrapper. IndexWriter should use an instance of this class to track those stale files (until the problem is solved).

      I would like to keep FSDirectory clean from this, especially, as this is broken anyways: If somebody has another directory impl like HDFS or Infinispan, the problem still persists. The base directory should throw an IOException if trying to sync a file that does not exist!

      1. LUCENE-6150.patch
        3 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          It could be that the problem is already solved and we can simply remove the stale files map: LUCENE-5570

          Show
          Uwe Schindler added a comment - It could be that the problem is already solved and we can simply remove the stale files map: LUCENE-5570
          Hide
          Uwe Schindler added a comment -

          This removes the staleFiles set from FSDirectory. We already have enough tests that ensures that we have no longer any files that were deleted before. If we still miss some files, FSDirectory should not silently ignore those deleted files and throw IOException.

          I ran all tests with FSDirectory enabled and got no failures because of synced files that do not exist.

          I would like to commit this and let it bake to discover any bugs caused by not correctly tracking written files. But as MockDir already tracks this I am confident.

          Maybe Michael McCandless can give some more insight!

          Show
          Uwe Schindler added a comment - This removes the staleFiles set from FSDirectory. We already have enough tests that ensures that we have no longer any files that were deleted before. If we still miss some files, FSDirectory should not silently ignore those deleted files and throw IOException. I ran all tests with FSDirectory enabled and got no failures because of synced files that do not exist. I would like to commit this and let it bake to discover any bugs caused by not correctly tracking written files. But as MockDir already tracks this I am confident. Maybe Michael McCandless can give some more insight!
          Hide
          Robert Muir added a comment -

          I think mike's concern was to avoid huge amount of fsync calls? Maybe especially on windows since everyone is clueless about what windows does ----> afraid things might be slower.

          I mean to me this is the OS's job to be a no-op for the ones we have "already synced": it knows there are no modifications to those. If windows gets slower, thats windows problem.

          +1 to the patch.

          Show
          Robert Muir added a comment - I think mike's concern was to avoid huge amount of fsync calls? Maybe especially on windows since everyone is clueless about what windows does ----> afraid things might be slower. I mean to me this is the OS's job to be a no-op for the ones we have "already synced": it knows there are no modifications to those. If windows gets slower, thats windows problem. +1 to the patch.
          Hide
          Uwe Schindler added a comment -

          I think, if Mike's concern is still there, he should implement this in IndexWriter directly by using a TrackingDirectoryWrapper. It is for sure not the responsibility of FSDirectory to keep track.

          Show
          Uwe Schindler added a comment - I think, if Mike's concern is still there, he should implement this in IndexWriter directly by using a TrackingDirectoryWrapper. It is for sure not the responsibility of FSDirectory to keep track.
          Hide
          Uwe Schindler added a comment -
          Show
          Uwe Schindler added a comment - This is also quite intersting to get insights on Windows: http://winntfs.com/2012/11/29/windows-write-caching-part-2-an-overview-for-application-developers/
          Hide
          Michael McCandless added a comment -

          +1 to simply remove this tracking (Uwe Schindler's patch) entirely! fsync really should be fast if the file was already forced to disk.

          Show
          Michael McCandless added a comment - +1 to simply remove this tracking ( Uwe Schindler 's patch) entirely! fsync really should be fast if the file was already forced to disk.
          Hide
          Uwe Schindler added a comment -

          Thanks Mike for confirmation! I will commit this and we will se the impact in your nightly benchmark, so I think we are fine.

          If it is really a problem, I would like to "simplify" the tracking (remove the onIndexOutputClosed callback - that was the most annoying thing, because it makes the API horrible, because the tracking is not hidden). In fact, the tracking on close is not needed here. Its enough to simply add the filename to the set while opeining the output, then it can be completely private to the impl in FSDirectory.

          Show
          Uwe Schindler added a comment - Thanks Mike for confirmation! I will commit this and we will se the impact in your nightly benchmark, so I think we are fine. If it is really a problem, I would like to "simplify" the tracking (remove the onIndexOutputClosed callback - that was the most annoying thing, because it makes the API horrible, because the tracking is not hidden). In fact, the tracking on close is not needed here. Its enough to simply add the filename to the set while opeining the output, then it can be completely private to the impl in FSDirectory.
          Hide
          ASF subversion and git services added a comment -

          Commit 1648812 from Uwe Schindler in branch 'dev/trunk'
          [ https://svn.apache.org/r1648812 ]

          LUCENE-6150: Remove staleFiles set and onIndexOutputClosed() from FSDirectory

          Show
          ASF subversion and git services added a comment - Commit 1648812 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1648812 ] LUCENE-6150 : Remove staleFiles set and onIndexOutputClosed() from FSDirectory
          Hide
          ASF subversion and git services added a comment -

          Commit 1648813 from Uwe Schindler in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1648813 ]

          Merged revision(s) 1648812 from lucene/dev/trunk:
          LUCENE-6150: Remove staleFiles set and onIndexOutputClosed() from FSDirectory

          Show
          ASF subversion and git services added a comment - Commit 1648813 from Uwe Schindler in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1648813 ] Merged revision(s) 1648812 from lucene/dev/trunk: LUCENE-6150 : Remove staleFiles set and onIndexOutputClosed() from FSDirectory
          Hide
          Anshum Gupta added a comment -

          Bulk close after 5.0 release.

          Show
          Anshum Gupta added a comment - Bulk close after 5.0 release.

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development