Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7.2, 4.8, Trunk
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This method has a lot of problems:
      1. it tracks 'stale files' as it writes (this seems pointless), and only actually fsyncs the intersection of that 'stale files' and the filenames passed as argument to sync(). So any bogus names passed to sync() are just silently ignored
      2. if "something bad happens" (e.g. two indexwriters/dirs on the same path, or some other shenanigans), and the file is actually in stale files, but was say actually deleted on the filesystem, the underlying fsync() call will create a new 0-byte file and fsync that.

      In my opinion we should do none of this. we should throw exceptions when this stuff is wrong.

      1. LUCENE-5570.patch
        4 kB
        Robert Muir
      2. LUCENE-5570.patch
        3 kB
        Robert Muir
      3. LUCENE-5570_zerobyte.patch
        2 kB
        Robert Muir
      4. LUCENE-5570_java6.patch
        4 kB
        Robert Muir

        Issue Links

          Activity

          Hide
          Robert Muir added a comment -

          here's a patch and a prototype solution for discussion.

          Show
          Robert Muir added a comment - here's a patch and a prototype solution for discussion.
          Hide
          Robert Muir added a comment -

          As far as this staleFiles, we can keep it if we need. but it should not be this silly retainAll() call.
          we should record syncedFiles or something like that and be more careful.

          Show
          Robert Muir added a comment - As far as this staleFiles, we can keep it if we need. but it should not be this silly retainAll() call. we should record syncedFiles or something like that and be more careful.
          Hide
          Michael McCandless added a comment -

          if "something bad happens" (e.g. two indexwriters/dirs on the same path, or some other shenanigans), and the file is actually in stale files, but was say actually deleted on the filesystem, the underlying fsync() call will create a new 0-byte file and fsync that.

          This is truly awful: sync of a non-existent file should not bring a 0-byte file into existence!

          it tracks 'stale files' as it writes (this seems pointless), and only actually fsyncs the intersection of that 'stale files' and the filenames passed as argument to sync(). So any bogus names passed to sync() are just silently ignored

          This is because IW passes all files referenced by all segments when it commits(), i.e. we push the responsibility of remembering which files are written but not sync'd down to Directory. This used to be IW's responsibility, but we changed that in LUCENE-2328 I think.

          Show
          Michael McCandless added a comment - if "something bad happens" (e.g. two indexwriters/dirs on the same path, or some other shenanigans), and the file is actually in stale files, but was say actually deleted on the filesystem, the underlying fsync() call will create a new 0-byte file and fsync that. This is truly awful: sync of a non-existent file should not bring a 0-byte file into existence! it tracks 'stale files' as it writes (this seems pointless), and only actually fsyncs the intersection of that 'stale files' and the filenames passed as argument to sync(). So any bogus names passed to sync() are just silently ignored This is because IW passes all files referenced by all segments when it commits(), i.e. we push the responsibility of remembering which files are written but not sync'd down to Directory. This used to be IW's responsibility, but we changed that in LUCENE-2328 I think.
          Hide
          Robert Muir added a comment -

          This is because IW passes all files referenced by all segments when it commits(), i.e. we push the responsibility of remembering which files are written but not sync'd down to Directory. This used to be IW's responsibility, but we changed that in LUCENE-2328 I think.

          I cant see this method being correct unless we also track syncedFiles, which implies an unbounded list!

          I think we should move this stuff back into IW...

          Show
          Robert Muir added a comment - This is because IW passes all files referenced by all segments when it commits(), i.e. we push the responsibility of remembering which files are written but not sync'd down to Directory. This used to be IW's responsibility, but we changed that in LUCENE-2328 I think. I cant see this method being correct unless we also track syncedFiles, which implies an unbounded list! I think we should move this stuff back into IW...
          Hide
          Robert Muir added a comment -

          By the way i think we can go back to RAF actually, its fine to just open the file only for read to sync it. This is described in the javadocs for force().

          I just want an exception if the file is bogus

          Show
          Robert Muir added a comment - By the way i think we can go back to RAF actually, its fine to just open the file only for read to sync it. This is described in the javadocs for force(). I just want an exception if the file is bogus
          Hide
          Simon Willnauer added a comment -

          after looking on the bug that lead to this issue I'd have appreciated to get a FNF exception rather than 0-byte files to begin with. The trappieness of sync and the stale files map is one thing which we can fix in a different issue IMO. But the 0-byte files we should get fixed right away. I also thinkg that it might be ok to throw an exception if you wanna sync a file that was not written through this directory?

          Show
          Simon Willnauer added a comment - after looking on the bug that lead to this issue I'd have appreciated to get a FNF exception rather than 0-byte files to begin with. The trappieness of sync and the stale files map is one thing which we can fix in a different issue IMO. But the 0-byte files we should get fixed right away. I also thinkg that it might be ok to throw an exception if you wanna sync a file that was not written through this directory?
          Hide
          Robert Muir added a comment -

          Thats a good point Simon, i think its going to be a pain to deal with this stupid stale files map (to really fix the stupid leniency)

          But as a start, we should fix fsync to not create new zero byte files under any condition. Here is a patch for that.

          Show
          Robert Muir added a comment - Thats a good point Simon, i think its going to be a pain to deal with this stupid stale files map (to really fix the stupid leniency) But as a start, we should fix fsync to not create new zero byte files under any condition. Here is a patch for that.
          Hide
          Michael McCandless added a comment -

          +1, new patch looks great; I agree we can decouple the two issues.

          Show
          Michael McCandless added a comment - +1, new patch looks great; I agree we can decouple the two issues.
          Hide
          Robert Muir added a comment -

          I'll make a followup issue for the stale files map: I think its bogus for a number of reasons, and IW should track this efficiently. But its somewhat unrelated.

          Show
          Robert Muir added a comment - I'll make a followup issue for the stale files map: I think its bogus for a number of reasons, and IW should track this efficiently. But its somewhat unrelated.
          Hide
          Uwe Schindler added a comment -

          after looking on the bug that lead to this issue I'd have appreciated to get a FNF exception rather than 0-byte files to begin with

          Which bug?

          About the patch: Looks fine. I am just a little bit sceptical if a sync() on a readonly file really does what the javadocs say (on all operating systems...). Can we test this somehow without using the power-switches of our computers? Maybe write a large file, sync it and while doing that check some /proc or /sys file to monitor how many unwritten buffers are there?

          Show
          Uwe Schindler added a comment - after looking on the bug that lead to this issue I'd have appreciated to get a FNF exception rather than 0-byte files to begin with Which bug? About the patch: Looks fine. I am just a little bit sceptical if a sync() on a readonly file really does what the javadocs say (on all operating systems...). Can we test this somehow without using the power-switches of our computers? Maybe write a large file, sync it and while doing that check some /proc or /sys file to monitor how many unwritten buffers are there?
          Hide
          Robert Muir added a comment -

          I am just a little bit sceptical if a sync() on a readonly file really does what the javadocs say (on all operating systems...).

          I don't think thats an issue i need to test for: thats a jvm bug if it doesnt happen according to the javadocs.

          Show
          Robert Muir added a comment - I am just a little bit sceptical if a sync() on a readonly file really does what the javadocs say (on all operating systems...). I don't think thats an issue i need to test for: thats a jvm bug if it doesnt happen according to the javadocs.
          Hide
          Simon Willnauer added a comment -

          seems like we prepare for 4.7.2 I think this one should be there as well at least for the RW part?

          Show
          Simon Willnauer added a comment - seems like we prepare for 4.7.2 I think this one should be there as well at least for the RW part?
          Hide
          Robert Muir added a comment -

          Upon researching, I think we should address Uwe's concern. So I think we should change the patch to do:

          fFileChannel.open(f, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
          

          But we cannot do this in 4.7.x (needs java7 apis). So if we want to backport we should do something else there.

          Show
          Robert Muir added a comment - Upon researching, I think we should address Uwe's concern. So I think we should change the patch to do: fFileChannel.open(f, StandardOpenOption.WRITE, StandardOpenOption.APPEND); But we cannot do this in 4.7.x (needs java7 apis). So if we want to backport we should do something else there.
          Hide
          Robert Muir added a comment -

          here is the updated patch. its unclear to me if we can do anything for 4.7.x (because of java6), but i think we should fix this for 4.8/5.0

          Show
          Robert Muir added a comment - here is the updated patch. its unclear to me if we can do anything for 4.7.x (because of java6), but i think we should fix this for 4.8/5.0
          Hide
          Uwe Schindler added a comment -

          Patch looks fine, only the APPEND is not needed:

          FileChannel.open(fullFile.toPath(), StandardOpenOption.WRITE);
          

          WRITE does not zero the file, to empty it you need to use TRUNCATE_EXISTING or CREATE.
          APPEND is just there to initially set the file pointer.

          But in any case, +1 to commit.

          We can only use this with Lucene 4.7.x if we use crazy reflection and detect Java 7. Another approach for Java 6 (4.7.x) would be to use a "non-atomic" check to first try to open the file for read and if that works, close and reopen RW - both as RandomAccessFile).

          Show
          Uwe Schindler added a comment - Patch looks fine, only the APPEND is not needed: FileChannel.open(fullFile.toPath(), StandardOpenOption.WRITE); WRITE does not zero the file, to empty it you need to use TRUNCATE_EXISTING or CREATE. APPEND is just there to initially set the file pointer. But in any case, +1 to commit. We can only use this with Lucene 4.7.x if we use crazy reflection and detect Java 7. Another approach for Java 6 (4.7.x) would be to use a "non-atomic" check to first try to open the file for read and if that works, close and reopen RW - both as RandomAccessFile).
          Hide
          Michael McCandless added a comment -

          Patch looks good.

          Another approach for Java 6 (4.7.x) would be to use a "non-atomic" check to first try to open the file for read and if that works, close and reopen RW - both as RandomAccessFile).

          +1

          I think the non-atomicness is fine here.

          Show
          Michael McCandless added a comment - Patch looks good. Another approach for Java 6 (4.7.x) would be to use a "non-atomic" check to first try to open the file for read and if that works, close and reopen RW - both as RandomAccessFile). +1 I think the non-atomicness is fine here.
          Hide
          Uwe Schindler added a comment -

          One more addition: I am not sure if the whole workflow, Lucene uses to force syncing are actually working on all operating systems like we have them. The javadocs of FleChannel and FileDescriptor only say that all changes made with this file descriptor resp. this FileChannel are written to disk:

          If this channel's file resides on a local storage device then when this method returns it is guaranteed that all changes made to the file since this channel was created, or since this method was last invoked, will have been written to that device. This is useful for ensuring that critical information is not lost in the event of a system crash.

          This method is only guaranteed to force changes that were made to this channel's file via the methods defined in this class. It may or may not force changes that were made by modifying the content of a mapped byte buffer obtained by invoking the map method. Invoking the force method of the mapped byte buffer will force changes made to the buffer's content to be written.

          In my opinion, the sync() should be done by the IndexOutput (e.g. before closing). But thats another issue to solve.

          Show
          Uwe Schindler added a comment - One more addition: I am not sure if the whole workflow, Lucene uses to force syncing are actually working on all operating systems like we have them. The javadocs of FleChannel and FileDescriptor only say that all changes made with this file descriptor resp. this FileChannel are written to disk: If this channel's file resides on a local storage device then when this method returns it is guaranteed that all changes made to the file since this channel was created, or since this method was last invoked, will have been written to that device. This is useful for ensuring that critical information is not lost in the event of a system crash. This method is only guaranteed to force changes that were made to this channel's file via the methods defined in this class. It may or may not force changes that were made by modifying the content of a mapped byte buffer obtained by invoking the map method. Invoking the force method of the mapped byte buffer will force changes made to the buffer's content to be written. In my opinion, the sync() should be done by the IndexOutput (e.g. before closing). But thats another issue to solve.
          Hide
          ASF subversion and git services added a comment -

          Commit 1584860 from rmuir@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1584860 ]

          LUCENE-5570: don't let fsync create new zero-byte files

          Show
          ASF subversion and git services added a comment - Commit 1584860 from rmuir@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1584860 ] LUCENE-5570 : don't let fsync create new zero-byte files
          Hide
          Michael McCandless added a comment -

          In my opinion, the sync() should be done by the IndexOutput (e.g. before closing). But thats another issue to solve.

          +1, we are on thin ice today. See LUCENE-3237 for some earlier discussion about this.

          Also, we should sync the directory as well...

          Show
          Michael McCandless added a comment - In my opinion, the sync() should be done by the IndexOutput (e.g. before closing). But thats another issue to solve. +1, we are on thin ice today. See LUCENE-3237 for some earlier discussion about this. Also, we should sync the directory as well...
          Hide
          Robert Muir added a comment -

          I think the non-atomicness is fine here.

          Ok ill make a 4.7.2 patch. The whole method isnt atomic anyway, it takes Collection of files and must do them one by one

          Show
          Robert Muir added a comment - I think the non-atomicness is fine here. Ok ill make a 4.7.2 patch. The whole method isnt atomic anyway, it takes Collection of files and must do them one by one
          Hide
          ASF subversion and git services added a comment -

          Commit 1584861 from rmuir@apache.org in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1584861 ]

          LUCENE-5570: don't let fsync create new zero-byte files

          Show
          ASF subversion and git services added a comment - Commit 1584861 from rmuir@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1584861 ] LUCENE-5570 : don't let fsync create new zero-byte files
          Hide
          Robert Muir added a comment -

          here is the java6 backport

          Show
          Robert Muir added a comment - here is the java6 backport
          Hide
          Michael McCandless added a comment -

          +1 for java6 version.

          Show
          Michael McCandless added a comment - +1 for java6 version.
          Hide
          Simon Willnauer added a comment -

          +1 LGTM

          Show
          Simon Willnauer added a comment - +1 LGTM
          Hide
          Uwe Schindler added a comment -

          The whole method isnt atomic anyway, it takes Collection of files and must do them one by one

          This was not meant to be atomic for al files. The issue here is: It is still possible to create an empty file: if the RAF readonly open is done successfully, then another thread deletes the file, before the RandomAccessFile is opened for write.

          I am not sure if we should backport this hack to 4.7 branch. We should better release Lucene 4.8 on Java 7 and leave 4.7 as it is. The bug is not causing data corrumption, it just confuses if you are debugging strange things like issue LUCENE-5574.

          Show
          Uwe Schindler added a comment - The whole method isnt atomic anyway, it takes Collection of files and must do them one by one This was not meant to be atomic for al files. The issue here is: It is still possible to create an empty file: if the RAF readonly open is done successfully, then another thread deletes the file, before the RandomAccessFile is opened for write. I am not sure if we should backport this hack to 4.7 branch. We should better release Lucene 4.8 on Java 7 and leave 4.7 as it is. The bug is not causing data corrumption, it just confuses if you are debugging strange things like issue LUCENE-5574 .
          Hide
          Robert Muir added a comment -

          Ok, I am fine with waiting for 4.8 . But this best effort patch is fine imo for the record. I don't want to do any sneakiness to meet some theoretical perfection for java 6. This is just a very practical thing, especially if you are the one scratching your head at said zero byte files.

          Lets leave 4.7 branch totally broken instead. Fine by me.

          Show
          Robert Muir added a comment - Ok, I am fine with waiting for 4.8 . But this best effort patch is fine imo for the record. I don't want to do any sneakiness to meet some theoretical perfection for java 6. This is just a very practical thing, especially if you are the one scratching your head at said zero byte files. Lets leave 4.7 branch totally broken instead. Fine by me.
          Hide
          Michael McCandless added a comment -

          I think this is worth fixing for 4.7.2; that esoteric case that could bring an empty file into existence won't matter in practice, and the confusion when IW unexpectedly brings 0 byte files into existence is really confusing.

          Show
          Michael McCandless added a comment - I think this is worth fixing for 4.7.2; that esoteric case that could bring an empty file into existence won't matter in practice, and the confusion when IW unexpectedly brings 0 byte files into existence is really confusing.
          Hide
          Uwe Schindler added a comment - - edited

          Hi, because Simon Willnauer asked me: The javadocs of FileChannel#force(boolean) are not so nice, because they only garant that the changes made by this FileDescriptor are written to disk. In fact this is correct and also affects RandomAccessFile, although the javadocs are not so explicit (Javadocs of FileDescriptor class only talk about buffers owned by "this FD", but this is equivalent).

          In fact, if you check the native C++ source code of OpenJDK, in fact, FileDescriptor#sync() and FileChannel#force(true) call exactly the same sycall: http://linux.die.net/man/2/fsync

          fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be retrieved even after the system crashed or was rebooted. This includes writing through or flushing a disk cache if present. The call blocks until the device reports that the transfer has completed. It also flushes metadata information associated with the file (see stat(2)).

          Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed.

          FileChannel#force(false) just calls fdatasync(), also explained on this man page. On Windows, it does the same like FileChannel#force(true).

          Unless other JVMs implement this is a completely different way, both code paths are identical. In addition, the man page of fsync also states:

          On some UNIX systems (but not Linux), fd must be a writable file descriptor.

          This is why we need to open for write and because of that we can only create 0 byte files in Java 6, unless we check for existence before.

          Please also note the following statement:

          Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed.

          As far as I remember, in Java 7 we can also flush the directory, we should try this - if it works!

          Addition: On Windows, all code in JNI impls for java.io and java.nio is finally delegated to FlushFileBuffers(HANDLE): http://msdn.microsoft.com/en-us/library/windows/desktop/aa364439(v=vs.85).aspx. The windows kernel32.dll syscall also requires that the file is open for write.

          Show
          Uwe Schindler added a comment - - edited Hi, because Simon Willnauer asked me: The javadocs of FileChannel#force(boolean) are not so nice, because they only garant that the changes made by this FileDescriptor are written to disk. In fact this is correct and also affects RandomAccessFile, although the javadocs are not so explicit (Javadocs of FileDescriptor class only talk about buffers owned by "this FD", but this is equivalent). In fact, if you check the native C++ source code of OpenJDK, in fact, FileDescriptor#sync() and FileChannel#force(true) call exactly the same sycall: http://linux.die.net/man/2/fsync fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be retrieved even after the system crashed or was rebooted. This includes writing through or flushing a disk cache if present. The call blocks until the device reports that the transfer has completed. It also flushes metadata information associated with the file (see stat(2)). Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed. FileChannel#force(false) just calls fdatasync() , also explained on this man page. On Windows, it does the same like FileChannel#force(true) . Unless other JVMs implement this is a completely different way, both code paths are identical. In addition, the man page of fsync also states: On some UNIX systems (but not Linux), fd must be a writable file descriptor. This is why we need to open for write and because of that we can only create 0 byte files in Java 6, unless we check for existence before. Please also note the following statement: Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed. As far as I remember, in Java 7 we can also flush the directory, we should try this - if it works! Addition: On Windows, all code in JNI impls for java.io and java.nio is finally delegated to FlushFileBuffers(HANDLE) : http://msdn.microsoft.com/en-us/library/windows/desktop/aa364439(v=vs.85).aspx . The windows kernel32.dll syscall also requires that the file is open for write.
          Hide
          ASF subversion and git services added a comment -

          Commit 1586165 from mikemccand@apache.org in branch 'dev/branches/lucene_solr_4_7'
          [ https://svn.apache.org/r1586165 ]

          LUCENE-5570: don't let Directory.fsync create new 0-byte files

          Show
          ASF subversion and git services added a comment - Commit 1586165 from mikemccand@apache.org in branch 'dev/branches/lucene_solr_4_7' [ https://svn.apache.org/r1586165 ] LUCENE-5570 : don't let Directory.fsync create new 0-byte files
          Hide
          Michael McCandless added a comment -

          OK I committed Rob's last patch to 4.7.x branch, so if we spin a new RC / do another bug fix release, this bug will be fixed.

          Show
          Michael McCandless added a comment - OK I committed Rob's last patch to 4.7.x branch, so if we spin a new RC / do another bug fix release, this bug will be fixed.
          Hide
          Uwe Schindler added a comment -

          I reopened that issue to correctly close it with right status and add the correct fix version (it was closed as "Won't fix"!?)

          Show
          Uwe Schindler added a comment - I reopened that issue to correctly close it with right status and add the correct fix version (it was closed as "Won't fix"!?)
          Hide
          Uwe Schindler added a comment -

          Close issue after release of 4.8.0

          Show
          Uwe Schindler added a comment - Close issue after release of 4.8.0

            People

            • Assignee:
              Unassigned
              Reporter:
              Robert Muir
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development