Lucene - Core
  1. Lucene - Core
  2. LUCENE-4848

Fix Directory implementations to use NIO2 APIs

    Details

    • Type: Task Task
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 6.0
    • Fix Version/s: 4.8, 6.0
    • Component/s: None
    • Labels:
    • Lucene Fields:
      New, Patch Available

      Description

      I have implemented 3 Directory subclasses using NIO2 API's (available on JDK7). These may be suitable for inclusion in a Lucene contrib module.

      See the mailing list at http://lucene.markmail.org/thread/lrv7miivzmjm3ml5 for more details about this code and the advantages it provides.

      The code is attached as a zip to this issue. I'll be happy to make any changes requested. I've included some minimal smoke tests, but any help in how to use the normal Lucene tests to perform more thorough testing would be appreciated.

      1. jdk7directory.zip
        24 kB
        Michael Poindexter
      2. LUCENE-4848.patch
        16 kB
        Uwe Schindler
      3. LUCENE-4848.patch
        15 kB
        Uwe Schindler
      4. LUCENE-4848.patch
        18 kB
        Michael Poindexter
      5. LUCENE-4848.patch
        14 kB
        Uwe Schindler
      6. LUCENE-4848.patch
        34 kB
        Uwe Schindler
      7. LUCENE-4848.patch
        32 kB
        Michael Poindexter
      8. LUCENE-4848.patch
        27 kB
        Uwe Schindler
      9. LUCENE-4848.patch
        25 kB
        Michael Poindexter
      10. LUCENE-4848.patch.txt
        30 kB
        Dawid Weiss
      11. LUCENE-4848-MMapDirectory.patch
        3 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          Hi Michael,
          thank you very much for the directory implementations.

          I already thought about that a while before, but I was not aware, that the WindowsFileChannelFactory (http://www.docjar.com/html/api/sun/nio/fs/WindowsChannelFactory.java.html) automatically passes FILE_SHARE_DELETE by default to the windows open syscall. So this is indeed an improvement, because it allows to delete files that are opened like that. I have not yet verified this, but I trust you and will take it under consideration. But be warned: FILE_SHARE_DELETE does not completely emulate delete-after-last-close semantics from POSIX. It looks like the file keeps valid and disappears from the directory, but in contrast to POSIX, you cannot delete the directory the file is in, nor can you create a new file with the same name as the still open, but deleted one. This should not be an issue for Lucene, as it never recreates files with the same name which it keept open before.

          About your classes in general. The new directory implementation (the Async one), could go into Lucene 5.0, once we moved to Java 7, so this would be a start in the direction on doing the move. We had a discussion on the mailinglist and all committers +1.

          About the NIOFSDirectory and MMapDirectory: Those are 100% clones of the originals and we dont want to have 100% clones of same classes in Lucene core, so I would restructure them. The good thing here is: The really only change in those clones is the part that opens a FileChannel from a file name. And that involves only 3 calls to Java 7 APIs. My idea was to simply insert that code (using reflection for Lucene 4.x into the original MMapDirectory and NIOFSDirectory) into a helper function for opening (File in, FileChannel out) that creates a RAF on Java 6 and directly allocates FileChannel on Java 7 - Problem solved. Introducting new Java 7 symbols like "Path" into Lucene makes no sense at the moment, so its hidden as implementation detail.

          I will provide a patch for NIOFS and MMap later.

          Thanks in any case, this issue brought me to the right track of maybe solving the delete on windows problem - I am still not sure if this really helps and does not break the semantics Lucene expects from the underlying FS.

          Show
          Uwe Schindler added a comment - Hi Michael, thank you very much for the directory implementations. I already thought about that a while before, but I was not aware, that the WindowsFileChannelFactory ( http://www.docjar.com/html/api/sun/nio/fs/WindowsChannelFactory.java.html ) automatically passes FILE_SHARE_DELETE by default to the windows open syscall. So this is indeed an improvement, because it allows to delete files that are opened like that. I have not yet verified this, but I trust you and will take it under consideration. But be warned: FILE_SHARE_DELETE does not completely emulate delete-after-last-close semantics from POSIX. It looks like the file keeps valid and disappears from the directory, but in contrast to POSIX, you cannot delete the directory the file is in, nor can you create a new file with the same name as the still open, but deleted one. This should not be an issue for Lucene, as it never recreates files with the same name which it keept open before. About your classes in general. The new directory implementation (the Async one), could go into Lucene 5.0, once we moved to Java 7, so this would be a start in the direction on doing the move. We had a discussion on the mailinglist and all committers +1. About the NIOFSDirectory and MMapDirectory: Those are 100% clones of the originals and we dont want to have 100% clones of same classes in Lucene core, so I would restructure them. The good thing here is: The really only change in those clones is the part that opens a FileChannel from a file name. And that involves only 3 calls to Java 7 APIs. My idea was to simply insert that code (using reflection for Lucene 4.x into the original MMapDirectory and NIOFSDirectory) into a helper function for opening (File in, FileChannel out) that creates a RAF on Java 6 and directly allocates FileChannel on Java 7 - Problem solved. Introducting new Java 7 symbols like "Path" into Lucene makes no sense at the moment, so its hidden as implementation detail. I will provide a patch for NIOFS and MMap later. Thanks in any case, this issue brought me to the right track of maybe solving the delete on windows problem - I am still not sure if this really helps and does not break the semantics Lucene expects from the underlying FS.
          Hide
          Uwe Schindler added a comment -

          These may be suitable for inclusion in a Lucene contrib module.

          There are no more cntrib modules and we want all normal modules use the same JDK version, as we had major problems for the release manager to build and test the realease when multiple JDK versions are involved. So this can only go into Lucene 5.0 (which will use Java 7). Parts of if MMap and NIOFS changes may go into Lucene 4, too - as the changes are minimal and could be solved by reflection (see above).

          Show
          Uwe Schindler added a comment - These may be suitable for inclusion in a Lucene contrib module. There are no more cntrib modules and we want all normal modules use the same JDK version, as we had major problems for the release manager to build and test the realease when multiple JDK versions are involved. So this can only go into Lucene 5.0 (which will use Java 7). Parts of if MMap and NIOFS changes may go into Lucene 4, too - as the changes are minimal and could be solved by reflection (see above).
          Hide
          Uwe Schindler added a comment -

          Hi Michael, one request to you:

          Would it be possible to create a patch that does not add new classes prefixed with "JDK7" (which should be "Java7" btw), but instead directly modify the existing classes in Lucene 5 (aka trunk)? Of course the new Async variant would be a new class, but for includion in Lucene 5, a real patch without code duplication would be fine. And please try to keep the "logic" changes as minimal as possible in the existing classes.

          I am working on MMapDirectory at the moment to make a patch that also works with Lucene 4 using reflection.

          Show
          Uwe Schindler added a comment - Hi Michael, one request to you: Would it be possible to create a patch that does not add new classes prefixed with "JDK7" (which should be "Java7" btw), but instead directly modify the existing classes in Lucene 5 (aka trunk)? Of course the new Async variant would be a new class, but for includion in Lucene 5, a real patch without code duplication would be fine. And please try to keep the "logic" changes as minimal as possible in the existing classes. I am working on MMapDirectory at the moment to make a patch that also works with Lucene 4 using reflection.
          Hide
          Michael Poindexter added a comment -

          No problem, I'll produce a patch against trunk that just changes the existing directory implementations as little as possible.

          Two questions:
          1.) Since this changes the file writing behavior for NIOFSDirectory and MMapDirectory (writes can now throw a ClosedChannelException if the thread is interrupted, where I believe they couldn't before) should the changes be controllable via a flag? Or should I just not change how writes are done for these two classes (since it shouldn't be necessary to delete a file while it is open for write, only if it is open read)

          2.) I was using a Path instead of a File internally to represent the directory location. This is somewhat nice in that it works with the Java 7 pluggable filesystems implementation (i.e. to zip up an index one could just use the zip filesystem provider with a directory and then do a Directory.copyTo). I assume you want to not add a dependency on using a Path since that would change the return type of FSDirectory.getDirectory()?

          Show
          Michael Poindexter added a comment - No problem, I'll produce a patch against trunk that just changes the existing directory implementations as little as possible. Two questions: 1.) Since this changes the file writing behavior for NIOFSDirectory and MMapDirectory (writes can now throw a ClosedChannelException if the thread is interrupted, where I believe they couldn't before) should the changes be controllable via a flag? Or should I just not change how writes are done for these two classes (since it shouldn't be necessary to delete a file while it is open for write, only if it is open read) 2.) I was using a Path instead of a File internally to represent the directory location. This is somewhat nice in that it works with the Java 7 pluggable filesystems implementation (i.e. to zip up an index one could just use the zip filesystem provider with a directory and then do a Directory.copyTo). I assume you want to not add a dependency on using a Path since that would change the return type of FSDirectory.getDirectory()?
          Hide
          Uwe Schindler added a comment -

          About 2: I am fine with using Path. When we are on Java 7, Path is fine to hold the pointer to the directory. Of course FSDirectory could add another ctor, but not replace the File-based ones. I think you current code does the right thing.

          About writing: I know, you changed the whole FSDirectory base class to use Channel. Maybe keep the base class mostly as it is (with a generic "descriptor parameter", that can be a RAF or Channel). I would prefer to make writing for now use the RAF as before, but provide a channel-based impl, too?

          About 1: The problem with interruptions is a bigger one - we should avoid that any Directory implementation in Lucene is reacting to interruptions and produce failures. We had lots of bug reports (regards a bug in Sun's original implementation, that auto-closed a channel when interrupted). So in general, interrupted file io in lucene should be repeated.

          I will now setup Java 7 build for trunk and after short confirmation from the mailing lists, I will move Lucene trunk's build to require Java 7!

          Show
          Uwe Schindler added a comment - About 2: I am fine with using Path. When we are on Java 7, Path is fine to hold the pointer to the directory. Of course FSDirectory could add another ctor, but not replace the File-based ones. I think you current code does the right thing. About writing: I know, you changed the whole FSDirectory base class to use Channel. Maybe keep the base class mostly as it is (with a generic "descriptor parameter", that can be a RAF or Channel). I would prefer to make writing for now use the RAF as before, but provide a channel-based impl, too? About 1: The problem with interruptions is a bigger one - we should avoid that any Directory implementation in Lucene is reacting to interruptions and produce failures. We had lots of bug reports (regards a bug in Sun's original implementation, that auto-closed a channel when interrupted). So in general, interrupted file io in lucene should be repeated. I will now setup Java 7 build for trunk and after short confirmation from the mailing lists, I will move Lucene trunk's build to require Java 7!
          Hide
          Michael Poindexter added a comment -

          OK, I will use Path, and produce 2 IndexInputs, one that uses a RAF (used by default), and another that uses a Channel that share most of their code (I will just do one super class with hooks for every place that depends on the actual "descriptor parameter" and two subclasses).

          I'll see if there is existing code in 5.x that is calling FSDirectory.getDirectory() that depends on that being a File. If so, I will not change it to return a Path, but rather introduce a new method getDirectoryPath().

          I think it's not a bug that the channel is auto-closed when interrupted, but rather documented behavior (in InterruptibleChannel). Not trying to nitpick, but just point out that this behavior is unlikely to change in the future since it's how it's intended to work.

          We could pretty easily retry when we get a ClosedByInterruptException, but do you want that to happen as part of this patch or as another issue? I think maybe that should go in as a separate patch since I think to make it work properly you would have to make FSIndexInput.clone() open a new FileChannel (essentially duplicating the file descriptor per-thread...that way when the descriptor is closed due to an interrupt you only have to worry about reopening that thread's FD, not all threads sharing the same FD).

          Show
          Michael Poindexter added a comment - OK, I will use Path, and produce 2 IndexInputs, one that uses a RAF (used by default), and another that uses a Channel that share most of their code (I will just do one super class with hooks for every place that depends on the actual "descriptor parameter" and two subclasses). I'll see if there is existing code in 5.x that is calling FSDirectory.getDirectory() that depends on that being a File. If so, I will not change it to return a Path, but rather introduce a new method getDirectoryPath(). I think it's not a bug that the channel is auto-closed when interrupted, but rather documented behavior (in InterruptibleChannel). Not trying to nitpick, but just point out that this behavior is unlikely to change in the future since it's how it's intended to work. We could pretty easily retry when we get a ClosedByInterruptException, but do you want that to happen as part of this patch or as another issue? I think maybe that should go in as a separate patch since I think to make it work properly you would have to make FSIndexInput.clone() open a new FileChannel (essentially duplicating the file descriptor per-thread...that way when the descriptor is closed due to an interrupt you only have to worry about reopening that thread's FD, not all threads sharing the same FD).
          Hide
          Robert Muir added a comment -

          We could pretty easily retry when we get a ClosedByInterruptException, but do you want that to happen as part of this patch or as another issue?

          please no!

          Show
          Robert Muir added a comment - We could pretty easily retry when we get a ClosedByInterruptException, but do you want that to happen as part of this patch or as another issue? please no!
          Hide
          Uwe Schindler added a comment -

          Same here. In my opinion writing should use RAF as this is most compatible.

          Show
          Uwe Schindler added a comment - Same here. In my opinion writing should use RAF as this is most compatible.
          Hide
          Uwe Schindler added a comment -

          I think maybe that should go in as a separate patch since I think to make it work properly you would have to make FSIndexInput.clone() open a new FileChannel (essentially duplicating the file descriptor per-thread...that way when the descriptor is closed due to an interrupt you only have to worry about reopening that thread's FD, not all threads sharing the same FD).

          We cannot do this, as Lucene never closes clones. This would be the grave for file handles! Lucene would eat up all file handles in milliseconds

          Show
          Uwe Schindler added a comment - I think maybe that should go in as a separate patch since I think to make it work properly you would have to make FSIndexInput.clone() open a new FileChannel (essentially duplicating the file descriptor per-thread...that way when the descriptor is closed due to an interrupt you only have to worry about reopening that thread's FD, not all threads sharing the same FD). We cannot do this, as Lucene never closes clones. This would be the grave for file handles! Lucene would eat up all file handles in milliseconds
          Hide
          Michael Poindexter added a comment -

          That would be bad I was under the impression that clones were actually closed, but the close method just checked if it was a clone and if so didn't actually do anything. Thanks for pointing this out.

          In that case there's really not much that can be done to avoid ClosedByInterruptExceptions. We have one FD that's shared across threads, the JDK closed it, and it we were to open a new one there's no place to release the resource. IMO, this would indicate that perhaps clones should in fact be closed, but I don't know enough about why they are not to have a good opinion

          Show
          Michael Poindexter added a comment - That would be bad I was under the impression that clones were actually closed, but the close method just checked if it was a clone and if so didn't actually do anything. Thanks for pointing this out. In that case there's really not much that can be done to avoid ClosedByInterruptExceptions. We have one FD that's shared across threads, the JDK closed it, and it we were to open a new one there's no place to release the resource. IMO, this would indicate that perhaps clones should in fact be closed, but I don't know enough about why they are not to have a good opinion
          Hide
          Robert Muir added a comment -

          I would be against it myself. It seems users cannot even manage open/close on IndexReader and IndexWriter today. So its too much that they would have to close() scorers, docsenum, and so on.

          Personally i'm not worried about ClosedByInterruptExceptions: just dont interrupt threads doing searches. I'm also not willing to pay the cost of additional file handles if i'm not interrupt()'ing...why should i?

          But to me this is all unrelated to this issue, its been discussed over and over elsewhere and the problem already exists today.

          Show
          Robert Muir added a comment - I would be against it myself. It seems users cannot even manage open/close on IndexReader and IndexWriter today. So its too much that they would have to close() scorers, docsenum, and so on. Personally i'm not worried about ClosedByInterruptExceptions: just dont interrupt threads doing searches. I'm also not willing to pay the cost of additional file handles if i'm not interrupt()'ing...why should i? But to me this is all unrelated to this issue, its been discussed over and over elsewhere and the problem already exists today.
          Hide
          Uwe Schindler added a comment - - edited

          You see.

          So I wait for a minimal patch! I just want as a first step:

          • minimal changes (no changes at all to SimpleFSDir)
          • MMapDir changes are the simpliest
          • NIOFSDir need more changes, because it curretntly relies on FSDir's stupid RAF (Robert Muir already has a patch to not rely on RAF in NIOFSDir already), have to lookup the issue
          • Only use Path in the impl details for now -> more changes should be separate!
          • Add a separate new class for the fake-ASYNC one
          Show
          Uwe Schindler added a comment - - edited You see. So I wait for a minimal patch! I just want as a first step: minimal changes (no changes at all to SimpleFSDir) MMapDir changes are the simpliest NIOFSDir need more changes, because it curretntly relies on FSDir's stupid RAF (Robert Muir already has a patch to not rely on RAF in NIOFSDir already), have to lookup the issue Only use Path in the impl details for now -> more changes should be separate! Add a separate new class for the fake-ASYNC one
          Hide
          Uwe Schindler added a comment -

          For demonstartation puposes, I attached the simple patch for MMapDirectory that uses the new StandardOpenMode and FileChannel.open() provided by Java 7. I did not yet really test the deletion of open files on windows, but all tests pass (as they should).

          It would also be interesting if this patch maybe solves the ClosedChannelException problem on interrupt? The time window in MMap is very short that the bug can happen (only after opening the channel, while mmap is doing its work before the channel is closed).

          As you see, the Path API of Java 7 is not yet exposed to the public API. The whole code is still working with java.io.File, only when opening the channel it calls File.toPath().

          Michael Poindexter: We should do the same and no other changes in NIO. Just move away from RAF and use FileChannel.

          Show
          Uwe Schindler added a comment - For demonstartation puposes, I attached the simple patch for MMapDirectory that uses the new StandardOpenMode and FileChannel.open() provided by Java 7. I did not yet really test the deletion of open files on windows, but all tests pass (as they should). It would also be interesting if this patch maybe solves the ClosedChannelException problem on interrupt? The time window in MMap is very short that the bug can happen (only after opening the channel, while mmap is doing its work before the channel is closed). As you see, the Path API of Java 7 is not yet exposed to the public API. The whole code is still working with java.io.File, only when opening the channel it calls File.toPath(). Michael Poindexter: We should do the same and no other changes in NIO. Just move away from RAF and use FileChannel.
          Hide
          Robert Muir added a comment -

          For demonstartation puposes, I attached the simple patch for MMapDirectory that uses the new StandardOpenMode and FileChannel.open() provided by Java 7. I did not yet really test the deletion of open files on windows, but all tests pass (as they should).

          This patch looks great!

          Show
          Robert Muir added a comment - For demonstartation puposes, I attached the simple patch for MMapDirectory that uses the new StandardOpenMode and FileChannel.open() provided by Java 7. I did not yet really test the deletion of open files on windows, but all tests pass (as they should). This patch looks great!
          Hide
          Michael Poindexter added a comment -

          Thanks for the demonstration Uwe! It was very helpful as I misunderstood our earlier conversation and was attempting to change the internals of FSDirectory to use Path (instead of File) while keeping the public interface the same (actually, I was done, but waiting for the tests to run before attaching the patch, so your timing was perfect )

          I've attached a patch in the same spirit as your MMapDirectory patch that makes some minor changes to FSDirectory to allow different FSIndexInput and FSIndexOutput subclasses that use different methods of accessing the file (i.e. RandomAccessFile vs. FileChannel). It updates MMapDirectory, SimpleFSDirectory and NIOFSDirectory to use appropriate subclasses, and adds a new AsyncFSDirectory class.

          Show
          Michael Poindexter added a comment - Thanks for the demonstration Uwe! It was very helpful as I misunderstood our earlier conversation and was attempting to change the internals of FSDirectory to use Path (instead of File) while keeping the public interface the same (actually, I was done, but waiting for the tests to run before attaching the patch, so your timing was perfect ) I've attached a patch in the same spirit as your MMapDirectory patch that makes some minor changes to FSDirectory to allow different FSIndexInput and FSIndexOutput subclasses that use different methods of accessing the file (i.e. RandomAccessFile vs. FileChannel). It updates MMapDirectory, SimpleFSDirectory and NIOFSDirectory to use appropriate subclasses, and adds a new AsyncFSDirectory class.
          Hide
          Uwe Schindler added a comment -

          Thanks for the demonstration Uwe! It was very helpful as I misunderstood our earlier conversation and was attempting to change the internals of FSDirectory to use Path (instead of File) while keeping the public interface the same (actually, I was done, but waiting for the tests to run before attaching the patch, so your timing was perfect )

          We can move to Path later, but before doing that we should get this in as a first step. This issue is unrelated.

          I just skimmed your patch, this looks quite good. I have to look closer into it, will report back later. I have seen that you almost completely reused my patch - thanks! But you used try-with-resources to open,mmap,close the channel - nice!

          To run all Lucene+SOLR tests with a specific directory implementation use e.g.: "ant test -Dtests.directory=MMapDirectory", otheriwse Lucene uses in most cases RAMDirectory and only rarely other ones. By that you should also be able to test your new directory (it might be needed that you add a hook for instantiating it inside LuceneTestCase where -Dtests.directory is parsed).

          Show
          Uwe Schindler added a comment - Thanks for the demonstration Uwe! It was very helpful as I misunderstood our earlier conversation and was attempting to change the internals of FSDirectory to use Path (instead of File) while keeping the public interface the same (actually, I was done, but waiting for the tests to run before attaching the patch, so your timing was perfect ) We can move to Path later, but before doing that we should get this in as a first step. This issue is unrelated. I just skimmed your patch, this looks quite good. I have to look closer into it, will report back later. I have seen that you almost completely reused my patch - thanks! But you used try-with-resources to open,mmap,close the channel - nice! To run all Lucene+SOLR tests with a specific directory implementation use e.g.: "ant test -Dtests.directory=MMapDirectory", otheriwse Lucene uses in most cases RAMDirectory and only rarely other ones. By that you should also be able to test your new directory (it might be needed that you add a hook for instantiating it inside LuceneTestCase where -Dtests.directory is parsed).
          Hide
          Uwe Schindler added a comment -

          One small thing: The protected method FSIndexInput#length() does not need the generic FD, it should be parameterless? The FD is known to the subclass, isnt it?

          Show
          Uwe Schindler added a comment - One small thing: The protected method FSIndexInput#length() does not need the generic FD, it should be parameterless? The FD is known to the subclass, isnt it?
          Hide
          Michael Poindexter added a comment - - edited

          One small thing: The protected method FSIndexInput#length() does not need the generic FD, it should be parameterless? The FD is known to the subclass, isnt it?

          2 reasons not to:
          1.) I think there is already a parameterless length() method that behaves slightly differently. This length(T) is intended to extract the full length from the file accessor, while length() returns the configured length of the slice.
          2.) It is called from the constructor, so it might be considered bad practice to access member variables since that can be error prone.

          It might be good to rename this to fileLength(T) or something similar.

          Show
          Michael Poindexter added a comment - - edited One small thing: The protected method FSIndexInput#length() does not need the generic FD, it should be parameterless? The FD is known to the subclass, isnt it? 2 reasons not to: 1.) I think there is already a parameterless length() method that behaves slightly differently. This length(T) is intended to extract the full length from the file accessor, while length() returns the configured length of the slice. 2.) It is called from the constructor, so it might be considered bad practice to access member variables since that can be error prone. It might be good to rename this to fileLength(T) or something similar.
          Hide
          Michael Poindexter added a comment -

          It would also be interesting if this patch maybe solves the ClosedChannelException problem on interrupt? The time window in MMap is very short that the bug can happen (only after opening the channel, while mmap is doing its work before the channel is closed).

          I don't think this will change the behavior much at all. Before the patch the channel was only open briefly (just long enough to do the map()), and after the change it is the same.

          Show
          Michael Poindexter added a comment - It would also be interesting if this patch maybe solves the ClosedChannelException problem on interrupt? The time window in MMap is very short that the bug can happen (only after opening the channel, while mmap is doing its work before the channel is closed). I don't think this will change the behavior much at all. Before the patch the channel was only open briefly (just long enough to do the map()), and after the change it is the same.
          Hide
          Robert Muir added a comment -

          I like this patch, thanks Michael!

          Show
          Robert Muir added a comment - I like this patch, thanks Michael!
          Hide
          Uwe Schindler added a comment - - edited

          Hi Michael,

          your new Directory does not pass any test:

          ant test -Dtests.directory=AsyncFSDirectory
          

          This fails with crazy errors. One of them is that it complains about zombie threads (the Lucene test framework does not allow that threads not seen before are running after a test has finished). I think it might be caused by some thread created when the executor=null? But this is not the only error, there are more test failures. Just try it out.

          Show
          Uwe Schindler added a comment - - edited Hi Michael, your new Directory does not pass any test: ant test -Dtests.directory=AsyncFSDirectory This fails with crazy errors. One of them is that it complains about zombie threads (the Lucene test framework does not allow that threads not seen before are running after a test has finished). I think it might be caused by some thread created when the executor=null? But this is not the only error, there are more test failures. Just try it out.
          Hide
          Uwe Schindler added a comment - - edited

          Here a modified patch with some fixes (still does not pass tests), but:

          • renamed the length() method and made it package-protected like the other one that checks FD validity. This method is implementation dependent and should be hidden from public use
          • made the new AsyncFSDirectory be used by tests. To fully test it, you still have to pass -Dtests.directory

          The Channel-based FSIndexOutput is currently unused!

          Show
          Uwe Schindler added a comment - - edited Here a modified patch with some fixes (still does not pass tests), but: renamed the length() method and made it package-protected like the other one that checks FD validity. This method is implementation dependent and should be hidden from public use made the new AsyncFSDirectory be used by tests. To fully test it, you still have to pass -Dtests.directory The Channel-based FSIndexOutput is currently unused!
          Hide
          Uwe Schindler added a comment -

          There is one other problem, caused by the 2 new abstract methods in the base class: the check for FS validity is package proetcted and is only used by Lucene tests. If somebody from outside Lucene code wants to implement a custom FSIndexOutput, it is impossible because the package proetcted abstract methods. Previously this was not a problem, as the method had a method body and the compiler was happy.

          In my opinion, we should also remove the fileLength(FD) method and simply require the file size to be passed in ctor.

          Show
          Uwe Schindler added a comment - There is one other problem, caused by the 2 new abstract methods in the base class: the check for FS validity is package proetcted and is only used by Lucene tests. If somebody from outside Lucene code wants to implement a custom FSIndexOutput, it is impossible because the package proetcted abstract methods. Previously this was not a problem, as the method had a method body and the compiler was happy. In my opinion, we should also remove the fileLength(FD) method and simply require the file size to be passed in ctor.
          Hide
          Michael Poindexter added a comment -

          I'll run the tests tonight with -Dtests.directory for the new directory, I didn't know how to do it before. I think the threads thing is actually the JRE...if you pass null for the executor it means to use the default system thread pool the JRE creates for IO notifications. I'm guessing it is lazily created and the test framework detects and flags this. I'll simply create and use an executor explicitly for the test cases so that there is thread pool that can be cleaned up.

          I know the channel based FSIndexOutput is unused. I can remove it if you'd like, but I wanted to put it in based on our previous conversation simply in case it made sense at some point to use an IndexOuput that did not lock files for delete.

          Regarding the 2 new abstract methods in the base class: I'm fine with making the length be passed in the constructor, which leaves isFDValid(). This was why I had initially made the length method be protected instead of packaged private: so that subclasses could be created outside of the o.a.l.store package. I'd like to just change isFDValid() to be protected. If I do that it will be visible to subclasses and the package, but not publicly available, sound OK?

          Show
          Michael Poindexter added a comment - I'll run the tests tonight with -Dtests.directory for the new directory, I didn't know how to do it before. I think the threads thing is actually the JRE...if you pass null for the executor it means to use the default system thread pool the JRE creates for IO notifications. I'm guessing it is lazily created and the test framework detects and flags this. I'll simply create and use an executor explicitly for the test cases so that there is thread pool that can be cleaned up. I know the channel based FSIndexOutput is unused. I can remove it if you'd like, but I wanted to put it in based on our previous conversation simply in case it made sense at some point to use an IndexOuput that did not lock files for delete. Regarding the 2 new abstract methods in the base class: I'm fine with making the length be passed in the constructor, which leaves isFDValid(). This was why I had initially made the length method be protected instead of packaged private: so that subclasses could be created outside of the o.a.l.store package. I'd like to just change isFDValid() to be protected. If I do that it will be visible to subclasses and the package, but not publicly available, sound OK?
          Hide
          Michael Poindexter added a comment -

          New version of patch. Changes:

          • FSIndexInput length moved to constructor and fileLength() method removed
          • Test harness fixed to account for threads created by JRE causing unit tests to fail.
          Show
          Michael Poindexter added a comment - New version of patch. Changes: FSIndexInput length moved to constructor and fileLength() method removed Test harness fixed to account for threads created by JRE causing unit tests to fail.
          Hide
          Michael Poindexter added a comment -

          I tried to pass a fixed executor to AsyncFSDirectory to solve the test failures, but it seems that the JRE always ends up starting new threads anyway, causing the test suite to fail. I added some logic to LuceneTestCase to ignore these system created threads, and the tests seem to pass.

          Show
          Michael Poindexter added a comment - I tried to pass a fixed executor to AsyncFSDirectory to solve the test failures, but it seems that the JRE always ends up starting new threads anyway, causing the test suite to fail. I added some logic to LuceneTestCase to ignore these system created threads, and the tests seem to pass.
          Hide
          Dawid Weiss added a comment -

          Hi Michael, Uwe pinged me about the problem with async background threads. I'll take a look at your patch later today, ok?

          Show
          Dawid Weiss added a comment - Hi Michael, Uwe pinged me about the problem with async background threads. I'll take a look at your patch later today, ok?
          Hide
          Uwe Schindler added a comment - - edited

          Hi Michael,
          in my previous patch I also added the new Directory implementation into the list of FSDirectories to be used randomly while testing (when tests.directory is not given). This is missing in the latest patch. It should simply be added to the list in LuceneTestCase.java:

             private static final List<String> FS_DIRECTORIES = Arrays.asList(
               "SimpleFSDirectory",
               "NIOFSDirectory",
          -    "MMapDirectory"
          +    "MMapDirectory",
          +    "AsyncFSDirectory"
             );
          

          I was not able to validate the thread exclusion lists, but Dawid will do this. Maybe he has an easier solutions, to me it looks too complicated.

          Show
          Uwe Schindler added a comment - - edited Hi Michael, in my previous patch I also added the new Directory implementation into the list of FSDirectories to be used randomly while testing (when tests.directory is not given). This is missing in the latest patch. It should simply be added to the list in LuceneTestCase.java: private static final List<String> FS_DIRECTORIES = Arrays.asList( "SimpleFSDirectory", "NIOFSDirectory", - "MMapDirectory" + "MMapDirectory", + "AsyncFSDirectory" ); I was not able to validate the thread exclusion lists, but Dawid will do this. Maybe he has an easier solutions, to me it looks too complicated.
          Hide
          Uwe Schindler added a comment -

          Regarding the 2 new abstract methods in the base class: I'm fine with making the length be passed in the constructor, which leaves isFDValid(). This was why I had initially made the length method be protected instead of packaged private: so that subclasses could be created outside of the o.a.l.store package. I'd like to just change isFDValid() to be protected. If I do that it will be visible to subclasses and the package, but not publicly available, sound OK?

          protected is the wrong access flag. Protected should only be used for methods that should be overridded in subclasses, but never called from outside! I would make the isFDValid() method public to be consistent and mark it as @lucene.internal in javadocs.

          Show
          Uwe Schindler added a comment - Regarding the 2 new abstract methods in the base class: I'm fine with making the length be passed in the constructor, which leaves isFDValid(). This was why I had initially made the length method be protected instead of packaged private: so that subclasses could be created outside of the o.a.l.store package. I'd like to just change isFDValid() to be protected. If I do that it will be visible to subclasses and the package, but not publicly available, sound OK? protected is the wrong access flag. Protected should only be used for methods that should be overridded in subclasses, but never called from outside! I would make the isFDValid() method public to be consistent and mark it as @lucene.internal in javadocs.
          Hide
          Robert Muir added a comment -

          I don't agree with Uwe. Please dont make methods like isFDValid() public here! Protected is ok.

          Show
          Robert Muir added a comment - I don't agree with Uwe. Please dont make methods like isFDValid() public here! Protected is ok.
          Hide
          Uwe Schindler added a comment -

          Sorry I disagree. Protected is the wrong access flag, protected is only to be used for methods that are called from the class itsself and not from anywhere else. And it does not hide the method from the javadocs, so it stays "public" but with incorrect acess flag. It must be ideally package-private for tests, or @lucene.internal and public (or removed at all).

          Show
          Uwe Schindler added a comment - Sorry I disagree. Protected is the wrong access flag, protected is only to be used for methods that are called from the class itsself and not from anywhere else. And it does not hide the method from the javadocs, so it stays "public" but with incorrect acess flag. It must be ideally package-private for tests, or @lucene.internal and public (or removed at all).
          Hide
          Robert Muir added a comment -

          Sorry I disagree. Protected is the wrong access flag, protected is only to be used for methods that are called from the class itsself and not from anywhere else.

          Apparently the java language specification also disagrees with you

          Show
          Robert Muir added a comment - Sorry I disagree. Protected is the wrong access flag, protected is only to be used for methods that are called from the class itsself and not from anywhere else. Apparently the java language specification also disagrees with you
          Hide
          Robert Muir added a comment -

          Uwe here you can find a description of the access flags:
          http://docs.oracle.com/javase/tutorial/java/javaOO/accesscontrol.html

          Show
          Robert Muir added a comment - Uwe here you can find a description of the access flags: http://docs.oracle.com/javase/tutorial/java/javaOO/accesscontrol.html
          Hide
          Uwe Schindler added a comment -

          Robert I know those flags, my complaint was not that its not working or inconsistent. The problem with using protected is that its used in a "logically inconsistent way". In common Java usage, methods are made protected if they are implementations to be implemented in subclasses of a general contract that is only called from the (e.g. final) code in the superclass itsself. Protected methods should never-ever called from outside the class. Its just working from the same package, but thats a well-known design issue in Java (there are tons of blogs about that).

          The protected access flag works around the problem we have, thats true, but it makes the method "public" (in visibility) and thats my problem. So in any case it must get a @lucene.internal.

          Show
          Uwe Schindler added a comment - Robert I know those flags, my complaint was not that its not working or inconsistent. The problem with using protected is that its used in a "logically inconsistent way". In common Java usage, methods are made protected if they are implementations to be implemented in subclasses of a general contract that is only called from the (e.g. final) code in the superclass itsself. Protected methods should never-ever called from outside the class. Its just working from the same package, but thats a well-known design issue in Java (there are tons of blogs about that). The protected access flag works around the problem we have, thats true, but it makes the method "public" (in visibility) and thats my problem. So in any case it must get a @lucene.internal.
          Hide
          Robert Muir added a comment -

          Protected methods should never-ever called from outside the class. Its just working from the same package, but thats a well-known design issue in Java (there are tons of blogs about that).

          I don't agree with that (and could care less about such blogs).

          So in any case it must get a @lucene.internal.

          Yes it should. but it must not be public.

          Show
          Robert Muir added a comment - Protected methods should never-ever called from outside the class. Its just working from the same package, but thats a well-known design issue in Java (there are tons of blogs about that). I don't agree with that (and could care less about such blogs). So in any case it must get a @lucene.internal. Yes it should. but it must not be public.
          Hide
          Uwe Schindler added a comment -

          I digged around, because I dont want to make this method visible at all. I found out:
          This method is only used from TestCompoundFile and this test only uses SimpleFSDirectory. In my opinion, its completely bogus. To fix the visibility issue:

          • the isFDValid() method should be package-protected
          • it should only exist in SimpleFSDirIndexInput, not in the abstract base class
            If its in the base class every implementor has to implement it just for a stupid test that only works with SimpleFSDir. So this must be fixed. Just move it to SimpleFSDir or maybe we remove this method completely and fix the test.
          Show
          Uwe Schindler added a comment - I digged around, because I dont want to make this method visible at all. I found out: This method is only used from TestCompoundFile and this test only uses SimpleFSDirectory. In my opinion, its completely bogus. To fix the visibility issue: the isFDValid() method should be package-protected it should only exist in SimpleFSDirIndexInput, not in the abstract base class If its in the base class every implementor has to implement it just for a stupid test that only works with SimpleFSDir. So this must be fixed. Just move it to SimpleFSDir or maybe we remove this method completely and fix the test.
          Hide
          Uwe Schindler added a comment -

          Attached is a patch that fixes the visibility issue. The method is just on SimpleFSDir to make TestCompouundDirectory happy.

          The patch still does not pass the test suite because of AsyncFSDir daemon threads.

          Show
          Uwe Schindler added a comment - Attached is a patch that fixes the visibility issue. The method is just on SimpleFSDir to make TestCompouundDirectory happy. The patch still does not pass the test suite because of AsyncFSDir daemon threads.
          Hide
          Robert Muir added a comment -

          The patch still does not pass the test suite because of AsyncFSDir daemon threads.

          Maybe this new directory should be spun off into a separate issue? Do we have any idea how it performs on different operating systems? Should it be in core lucene? Can we nuke WindowsDirectory? I just want to understand the benefits, because it seems it does an async IO request but then blocks on the future to return back. so this seems no different than sync io to me. Besides, the rest of this patch is plenty for one issue since its the .store API (we need to proceed with caution).

          Show
          Robert Muir added a comment - The patch still does not pass the test suite because of AsyncFSDir daemon threads. Maybe this new directory should be spun off into a separate issue? Do we have any idea how it performs on different operating systems? Should it be in core lucene? Can we nuke WindowsDirectory? I just want to understand the benefits, because it seems it does an async IO request but then blocks on the future to return back. so this seems no different than sync io to me. Besides, the rest of this patch is plenty for one issue since its the .store API (we need to proceed with caution).
          Hide
          Dawid Weiss added a comment -

          Ok, so indeed background async io threads are a bit of a mess. The system threadpool is not marked with anything that would facilitate its filtering so a stack trace analysis is needed, which this patch does.

          I opted not to override the default thread factory because it's never known when this thread factory will be constructed (it may be initialized before LuceneTestCase is instantiated) and, more importantly, I think we shouldn't be messing with the VM we're running on unless we really don't have a choice.

          It should bounce a few jenkins machines to see if we have some leaks on other vendor VMs.

          Show
          Dawid Weiss added a comment - Ok, so indeed background async io threads are a bit of a mess. The system threadpool is not marked with anything that would facilitate its filtering so a stack trace analysis is needed, which this patch does. I opted not to override the default thread factory because it's never known when this thread factory will be constructed (it may be initialized before LuceneTestCase is instantiated) and, more importantly, I think we shouldn't be messing with the VM we're running on unless we really don't have a choice. It should bounce a few jenkins machines to see if we have some leaks on other vendor VMs.
          Hide
          Uwe Schindler added a comment -

          Robert: I agree. I will split the patch and provide one on this thread only with the refactoring done in the already-existing directories.

          The thing with windows was that the RAF-based file channel had the problem of some kind of synchronization internally in Sun's implementation. We should validate with the new and patched NIOFSDirectory if this is still the case with the completely different FileChannel (it is different class, different FileDescriptor-type,...) implementation returned by FileChannel.open(). If this new implementation fixes the old Sun Micrososystems synchronization issues in NIO v1, we can mark NIOFSDirectory working correctly in Windows with Lucene trunk / Java 7 and we are fine. So we need to benchmark this.

          I dont really like the AsyncFSDirectory, because it looks like a "workaround". If NIOFSDir is now working on windows, we don't need it.

          Show
          Uwe Schindler added a comment - Robert: I agree. I will split the patch and provide one on this thread only with the refactoring done in the already-existing directories. The thing with windows was that the RAF-based file channel had the problem of some kind of synchronization internally in Sun's implementation. We should validate with the new and patched NIOFSDirectory if this is still the case with the completely different FileChannel (it is different class, different FileDescriptor-type,...) implementation returned by FileChannel.open(). If this new implementation fixes the old Sun Micrososystems synchronization issues in NIO v1, we can mark NIOFSDirectory working correctly in Windows with Lucene trunk / Java 7 and we are fine. So we need to benchmark this. I dont really like the AsyncFSDirectory, because it looks like a "workaround". If NIOFSDir is now working on windows, we don't need it.
          Hide
          Michael Poindexter added a comment - - edited

          in my previous patch I also added the new Directory implementation into the list of FSDirectories to be used randomly while testing (when tests.directory is not given). This is missing in the latest patch. It should simply be added to the list in LuceneTestCase.java:

          I actually removed it on purpose for now. I wanted the changes to LuceneTestCase to be as uninvasive as possible, so I do some work to poke at the thread pool only if AsyncFSDirectory is selected as test.directory. Given that, I didn't want for it to be randomly selected.

          If its in the base class every implementor has to implement it just for a stupid test that only works with SimpleFSDir. So this must be fixed. Just move it to SimpleFSDir or maybe we remove this method completely and fix the test.

          Sounds good to me!

          Maybe this new directory should be spun off into a separate issue? Do we have any idea how it performs on different operating systems? Should it be in core lucene? Can we nuke WindowsDirectory? I just want to understand the benefits, because it seems it does an async IO request but then blocks on the future to return back. so this seems no different than sync io to me. Besides, the rest of this patch is plenty for one issue since its the .store API (we need to proceed with caution).

          You are right, it is no different than sync IO in the end. It's really only useful on Windows where it will use IO completion ports which means there is no need to synchronize on the file position (like java does for FileChannel, or Lucene does internally for SimpleFSDirectory). On at least Linux/BSD/Mac the Sun JDK will just do what basically FileChannel does anyway and incur the additional overhead of notifying a future, so it's unlikely to be useful there unless one is highly concerned about the Thread.interrupt thing.

          If you want to spin it off I'm fine with that, it's no problem for me to split the patch, let me know if everyone thinks that's a good idea.

          I opted not to override the default thread factory because it's never known when this thread factory will be constructed (it may be initialized before LuceneTestCase is instantiated) and, more importantly, I think we shouldn't be messing with the VM we're running on unless we really don't have a choice.

          I agree! If there is a better way I am all ears. I couldn't find any other way to mark the threads as ignorable (the Sun JRE at least doesn't name them in any useful way, and they have no property that could be looked up). We can't even figure out from looking at their stacks where they come from since they are just from a generic Executor.

          I think as gross as it is it should be fairly safe to set the default thread factory in LuceneTestCase. It should work across different vendor VM's since it is documented as part of the public Java API (see http://docs.oracle.com/javase/7/docs/api/java/nio/channels/AsynchronousChannelGroup.html). You are right that it could have been initialized before LuceneTestCase is run, but I think that's not a problem. The two cases here are:
          1.) Someone already did asynch io causing the pool to be created before LuceneTestCase. The tests should pass since the threads exist beforehand.
          2.) Nobody has done async io yet in our VM (the likely case). We'll create the threads using a special name that says to ignore them.

          Show
          Michael Poindexter added a comment - - edited in my previous patch I also added the new Directory implementation into the list of FSDirectories to be used randomly while testing (when tests.directory is not given). This is missing in the latest patch. It should simply be added to the list in LuceneTestCase.java: I actually removed it on purpose for now. I wanted the changes to LuceneTestCase to be as uninvasive as possible, so I do some work to poke at the thread pool only if AsyncFSDirectory is selected as test.directory. Given that, I didn't want for it to be randomly selected. If its in the base class every implementor has to implement it just for a stupid test that only works with SimpleFSDir. So this must be fixed. Just move it to SimpleFSDir or maybe we remove this method completely and fix the test. Sounds good to me! Maybe this new directory should be spun off into a separate issue? Do we have any idea how it performs on different operating systems? Should it be in core lucene? Can we nuke WindowsDirectory? I just want to understand the benefits, because it seems it does an async IO request but then blocks on the future to return back. so this seems no different than sync io to me. Besides, the rest of this patch is plenty for one issue since its the .store API (we need to proceed with caution). You are right, it is no different than sync IO in the end. It's really only useful on Windows where it will use IO completion ports which means there is no need to synchronize on the file position (like java does for FileChannel, or Lucene does internally for SimpleFSDirectory). On at least Linux/BSD/Mac the Sun JDK will just do what basically FileChannel does anyway and incur the additional overhead of notifying a future, so it's unlikely to be useful there unless one is highly concerned about the Thread.interrupt thing. If you want to spin it off I'm fine with that, it's no problem for me to split the patch, let me know if everyone thinks that's a good idea. I opted not to override the default thread factory because it's never known when this thread factory will be constructed (it may be initialized before LuceneTestCase is instantiated) and, more importantly, I think we shouldn't be messing with the VM we're running on unless we really don't have a choice. I agree! If there is a better way I am all ears. I couldn't find any other way to mark the threads as ignorable (the Sun JRE at least doesn't name them in any useful way, and they have no property that could be looked up). We can't even figure out from looking at their stacks where they come from since they are just from a generic Executor. I think as gross as it is it should be fairly safe to set the default thread factory in LuceneTestCase. It should work across different vendor VM's since it is documented as part of the public Java API (see http://docs.oracle.com/javase/7/docs/api/java/nio/channels/AsynchronousChannelGroup.html ). You are right that it could have been initialized before LuceneTestCase is run, but I think that's not a problem. The two cases here are: 1.) Someone already did asynch io causing the pool to be created before LuceneTestCase. The tests should pass since the threads exist beforehand. 2.) Nobody has done async io yet in our VM (the likely case). We'll create the threads using a special name that says to ignore them.
          Hide
          Uwe Schindler added a comment -

          Here is the simpliest patch, fixing the current directory implementations to use the new FileChannels. No other changes at all.

          We should benchmark the "fixed" NIOFSDirectory on windows to check, if the locking issues still exist.

          If we want to add AsyncFSDirectory again, we should do this only next to WindowsDirectory in the misc module.

          I will now test this patch and plan to commit it as a first step.

          Show
          Uwe Schindler added a comment - Here is the simpliest patch, fixing the current directory implementations to use the new FileChannels. No other changes at all. We should benchmark the "fixed" NIOFSDirectory on windows to check, if the locking issues still exist. If we want to add AsyncFSDirectory again, we should do this only next to WindowsDirectory in the misc module. I will now test this patch and plan to commit it as a first step.
          Hide
          Michael Poindexter added a comment -

          I opted not to override the default thread factory because it's never known when this thread factory will be constructed (it may be initialized before LuceneTestCase is instantiated) and, more importantly, I think we shouldn't be messing with the VM we're running on unless we really don't have a choice.

          I looked at the patch you uploaded Dawid, and I think it will only work on Windows. That IOCP class is only used on that platform. On *nix platforms there's really nothing in the stack that can distinguish these threads...hence the ThreadFactory

          Show
          Michael Poindexter added a comment - I opted not to override the default thread factory because it's never known when this thread factory will be constructed (it may be initialized before LuceneTestCase is instantiated) and, more importantly, I think we shouldn't be messing with the VM we're running on unless we really don't have a choice. I looked at the patch you uploaded Dawid, and I think it will only work on Windows. That IOCP class is only used on that platform. On *nix platforms there's really nothing in the stack that can distinguish these threads...hence the ThreadFactory
          Hide
          Dawid Weiss added a comment -

          We can't even figure out from looking at their stacks where they come from since they are just from a generic Executor.

          I think we can safely ignore those that spin inside the internal loop – see my patch. I know the sysprop is documented in the contract but if we don't have to I'd rather not substitute the defaults. This has an additional benefit that we'll see how other vendors implement these

          I'm not saying your patch was wrong or anything, I'm just opting for "ignoring" instead of "manipulating to work" strategy.

          Someone already did asynch io causing the pool to be created before LuceneTestCase. The tests should pass since the threads exist beforehand.

          This depends on which threadpool is actually used for the default. By default (from what I can see in JDK code) these threads are not eagerly allocated so it'd still result in thread leaks because there would be more threads upon leaving the test case than there were when it was started. Like I said, it's not the main reason I would like to keep it defensive and filter rather than substitute the default factory.

          Dawid

          Show
          Dawid Weiss added a comment - We can't even figure out from looking at their stacks where they come from since they are just from a generic Executor. I think we can safely ignore those that spin inside the internal loop – see my patch. I know the sysprop is documented in the contract but if we don't have to I'd rather not substitute the defaults. This has an additional benefit that we'll see how other vendors implement these I'm not saying your patch was wrong or anything, I'm just opting for "ignoring" instead of "manipulating to work" strategy. Someone already did asynch io causing the pool to be created before LuceneTestCase. The tests should pass since the threads exist beforehand. This depends on which threadpool is actually used for the default. By default (from what I can see in JDK code) these threads are not eagerly allocated so it'd still result in thread leaks because there would be more threads upon leaving the test case than there were when it was started. Like I said, it's not the main reason I would like to keep it defensive and filter rather than substitute the default factory. Dawid
          Hide
          Dawid Weiss added a comment -

          Can you post a stack trace from linux, for example? I'm on windows right now. If it's a thread pool those workers need to be idle somewhere, right?

          Show
          Dawid Weiss added a comment - Can you post a stack trace from linux, for example? I'm on windows right now. If it's a thread pool those workers need to be idle somewhere, right?
          Hide
          Uwe Schindler added a comment -

          We should move the AsyncFSDirectory issue to a separate issue. Robert and I only want to get the fixes for the already existing directories into the core. The other stuff should be a new issue. I hope that makes not too much work. I already uploaded a new patch, with only fixing the existing directories.

          Show
          Uwe Schindler added a comment - We should move the AsyncFSDirectory issue to a separate issue. Robert and I only want to get the fixes for the already existing directories into the core. The other stuff should be a new issue. I hope that makes not too much work. I already uploaded a new patch, with only fixing the existing directories.
          Hide
          Dawid Weiss added a comment -

          Oh, one more thing – I debated a bit about whether we could just make the "default" executor inside asyncdirectory not the system-default one. Then the problem of detecting these would pretty much go away because we could use whatever the hell we wanted.

          The downside is that we'd have to mimic what the std library does anyway.

          Show
          Dawid Weiss added a comment - Oh, one more thing – I debated a bit about whether we could just make the "default" executor inside asyncdirectory not the system-default one. Then the problem of detecting these would pretty much go away because we could use whatever the hell we wanted. The downside is that we'd have to mimic what the std library does anyway.
          Hide
          Michael Poindexter added a comment -

          We should benchmark the "fixed" NIOFSDirectory on windows to check, if the locking issues still exist.

          Im fairly certain they will since it's basically a Windows API issue. In the Windows API one there is no call to read at a given position without doing a seek unless you are doing overlapped IO (hence asynchronous). Since there's only one FD shared across threads with the NIOFSDirectory it will have to synchronize on it. The bug mentioned in the NIOFSDirectory javadoc actually suggests to use the AsynchronousFileChannel on Windows to work around the internal synchronization issue

          If we want to add AsyncFSDirectory again, we should do this only next to WindowsDirectory in the misc module.

          Doesn't matter to me where it lives.

          Show
          Michael Poindexter added a comment - We should benchmark the "fixed" NIOFSDirectory on windows to check, if the locking issues still exist. Im fairly certain they will since it's basically a Windows API issue. In the Windows API one there is no call to read at a given position without doing a seek unless you are doing overlapped IO (hence asynchronous). Since there's only one FD shared across threads with the NIOFSDirectory it will have to synchronize on it. The bug mentioned in the NIOFSDirectory javadoc actually suggests to use the AsynchronousFileChannel on Windows to work around the internal synchronization issue If we want to add AsyncFSDirectory again, we should do this only next to WindowsDirectory in the misc module. Doesn't matter to me where it lives.
          Hide
          Michael Poindexter added a comment -

          Can you post a stack trace from linux, for example? I'm on windows right now. If it's a thread pool those workers need to be idle somewhere, right?

          Here's a stack from a Mac. There's really nothing to distinguish it from any other thread pool.

          "LUCENETESTIGNORED" daemon prio=5 tid=0x00007f95bccfd000 nid=0x5503 waiting on condition [0x0000000113a10000]
          java.lang.Thread.State: TIMED_WAITING (parking)
          at sun.misc.Unsafe.park(Native Method)

          • parking to wait for <0x00000007e8da1468> (a java.util.concurrent.SynchronousQueue$TransferStack)
            at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
            at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
            at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
            at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
            at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
            at java.lang.Thread.run(Thread.java:722)

          Oh, one more thing – I debated a bit about whether we could just make the "default" executor inside asyncdirectory not the system-default one. Then the problem of detecting these would pretty much go away because we could use whatever the hell we wanted.

          Yes, this was actually my first approach. At least on Mac the JRE creates some threads for it's own use in spite of this.

          Show
          Michael Poindexter added a comment - Can you post a stack trace from linux, for example? I'm on windows right now. If it's a thread pool those workers need to be idle somewhere, right? Here's a stack from a Mac. There's really nothing to distinguish it from any other thread pool. "LUCENETESTIGNORED" daemon prio=5 tid=0x00007f95bccfd000 nid=0x5503 waiting on condition [0x0000000113a10000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) parking to wait for <0x00000007e8da1468> (a java.util.concurrent.SynchronousQueue$TransferStack) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460) at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359) at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Oh, one more thing – I debated a bit about whether we could just make the "default" executor inside asyncdirectory not the system-default one. Then the problem of detecting these would pretty much go away because we could use whatever the hell we wanted. Yes, this was actually my first approach. At least on Mac the JRE creates some threads for it's own use in spite of this.
          Hide
          Uwe Schindler added a comment -

          Have you looked at WindowsDirectory.cpp in the "misc" module? It does overlapped IO but still in sync.

          Show
          Uwe Schindler added a comment - Have you looked at WindowsDirectory.cpp in the "misc" module? It does overlapped IO but still in sync.
          Hide
          Dawid Weiss added a comment -

          Yep, ok I get you. They are all daemon threads, so we could just ignore these. But there were a good few cases when daemon threads were forked that led to discovering odd causes so I'd rather keep those.

          Seems like there's no other way indeed. I still have a suggestion of keeping the static initializer but moving the common code to that "quick patch" class to keep it in a single location. You can also detect those threads by returning a thread class that implements some internal interface, doesn't have to be by-name (and I'd actually name them legibly: "asyncio-%d" or something).

          Show
          Dawid Weiss added a comment - Yep, ok I get you. They are all daemon threads, so we could just ignore these. But there were a good few cases when daemon threads were forked that led to discovering odd causes so I'd rather keep those. Seems like there's no other way indeed. I still have a suggestion of keeping the static initializer but moving the common code to that "quick patch" class to keep it in a single location. You can also detect those threads by returning a thread class that implements some internal interface, doesn't have to be by-name (and I'd actually name them legibly: "asyncio-%d" or something).
          Hide
          Michael Poindexter added a comment -

          Have you looked at WindowsDirectory.cpp in the "misc" module? It does overlapped IO but still in sync.

          Yep, I looked at it. I don't know about anyone else, but I strongly prefer not to use native code if I can help it (makes deployment more of a pain since you need the right binary for your architecture, possibility of native resource leaks that are hard to debug). If there exists a Java API I'd much rather use that.

          Show
          Michael Poindexter added a comment - Have you looked at WindowsDirectory.cpp in the "misc" module? It does overlapped IO but still in sync. Yep, I looked at it. I don't know about anyone else, but I strongly prefer not to use native code if I can help it (makes deployment more of a pain since you need the right binary for your architecture, possibility of native resource leaks that are hard to debug). If there exists a Java API I'd much rather use that.
          Hide
          Michael Poindexter added a comment -

          Seems like there's no other way indeed. I still have a suggestion of keeping the static initializer but moving the common code to that "quick patch" class to keep it in a single location. You can also detect those threads by returning a thread class that implements some internal interface, doesn't have to be by-name (and I'd actually name them legibly: "asyncio-%d" or something).

          OK, will do it this way. I thought it was more clear to name it separately since the "quick patch" thing implied that it was some sort of temporary fix. Since Uwe and Robert want to split this issue I'll do it in a patch on that issue when it gets created. Thanks for looking at this!

          Show
          Michael Poindexter added a comment - Seems like there's no other way indeed. I still have a suggestion of keeping the static initializer but moving the common code to that "quick patch" class to keep it in a single location. You can also detect those threads by returning a thread class that implements some internal interface, doesn't have to be by-name (and I'd actually name them legibly: "asyncio-%d" or something). OK, will do it this way. I thought it was more clear to name it separately since the "quick patch" thing implied that it was some sort of temporary fix. Since Uwe and Robert want to split this issue I'll do it in a patch on that issue when it gets created. Thanks for looking at this!
          Hide
          Robert Muir added a comment -

          Yep, I looked at it. I don't know about anyone else, but I strongly prefer not to use native code if I can help it (makes deployment more of a pain since you need the right binary for your architecture, possibility of native resource leaks that are hard to debug). If there exists a Java API I'd much rather use that.

          Exactly: thats why if this AsyncDirectory that waits doesn't have too much overhead, it would be a nice replacement for that jni crap.

          But we can probably get all of this in lucene faster if we break the issue up a little bit

          Show
          Robert Muir added a comment - Yep, I looked at it. I don't know about anyone else, but I strongly prefer not to use native code if I can help it (makes deployment more of a pain since you need the right binary for your architecture, possibility of native resource leaks that are hard to debug). If there exists a Java API I'd much rather use that. Exactly: thats why if this AsyncDirectory that waits doesn't have too much overhead, it would be a nice replacement for that jni crap. But we can probably get all of this in lucene faster if we break the issue up a little bit
          Hide
          Uwe Schindler added a comment -

          As Robert said: I uploaded the minimal patch with everything except AsyncFSDir. We should just check on this issue, it it is ok and create a new issue named "AsyncFSDir" that only handles the daemon threads and discusses the directory implementation.

          Show
          Uwe Schindler added a comment - As Robert said: I uploaded the minimal patch with everything except AsyncFSDir. We should just check on this issue, it it is ok and create a new issue named "AsyncFSDir" that only handles the daemon threads and discusses the directory implementation.
          Hide
          Robert Muir added a comment -

          After reviewing the patch, my main concern is the generics on FSIndexInput.

          This thing really buys us very little... and adding a generic parameter makes it not worth the trouble IMO.
          So maybe we should just nuke this thing completely and let NIO and SimpleFS just be their own thing.

          (I caused the current situation, but i at least argue its an improvement over NIOFS extends SimpleFS, which is how it was before).

          Show
          Robert Muir added a comment - After reviewing the patch, my main concern is the generics on FSIndexInput. This thing really buys us very little... and adding a generic parameter makes it not worth the trouble IMO. So maybe we should just nuke this thing completely and let NIO and SimpleFS just be their own thing. (I caused the current situation, but i at least argue its an improvement over NIOFS extends SimpleFS, which is how it was before).
          Hide
          Uwe Schindler added a comment -

          So maybe we should just nuke this thing completely and let NIO and SimpleFS just be their own thing.

          I had the same idea, let's do this! Its easy to do. The abstract FSIndexInput just adds some close and filesize stuff that could leave in each class. The generics also have a backside: a CAST on every access. I know newer JVMs are optimizing that away, but its bad ptractice on inner loops (like synthetic accessor methods caused by private inner classes).

          Show
          Uwe Schindler added a comment - So maybe we should just nuke this thing completely and let NIO and SimpleFS just be their own thing. I had the same idea, let's do this! Its easy to do. The abstract FSIndexInput just adds some close and filesize stuff that could leave in each class. The generics also have a backside: a CAST on every access. I know newer JVMs are optimizing that away, but its bad ptractice on inner loops (like synthetic accessor methods caused by private inner classes).
          Hide
          Michael Poindexter added a comment -

          New version of patch:
          -Doesn't include AsyncFSDirectory
          -Removes FSIndexInput and instead NIOFSDirectory and SimpleFSDirectory both have direct subclasses of BufferedIndexInput
          -Doesn't use a generic for the FSIndexOutput. I still use a subclass to supply a concrete implementation.
          -Doesn't change anything in the test framework

          Show
          Michael Poindexter added a comment - New version of patch: -Doesn't include AsyncFSDirectory -Removes FSIndexInput and instead NIOFSDirectory and SimpleFSDirectory both have direct subclasses of BufferedIndexInput -Doesn't use a generic for the FSIndexOutput. I still use a subclass to supply a concrete implementation. -Doesn't change anything in the test framework
          Hide
          Uwe Schindler added a comment - - edited

          Your patch again introduces code I already removed (the channel based indexoutput was removed, too, also the useess abstract base class)! Have you looked at my patch and used it as basis? If we want a channel based-IndexOutput then it has nothing to do with the base class, should be in e.g. NIOFSIndexOutput. This is also a separate issue.

          I will now upload a patch again! When you make further changes, please always use the latest oatch with a clean checkout!

          Show
          Uwe Schindler added a comment - - edited Your patch again introduces code I already removed (the channel based indexoutput was removed, too, also the useess abstract base class)! Have you looked at my patch and used it as basis? If we want a channel based-IndexOutput then it has nothing to do with the base class, should be in e.g. NIOFSIndexOutput. This is also a separate issue. I will now upload a patch again! When you make further changes, please always use the latest oatch with a clean checkout!
          Hide
          Uwe Schindler added a comment -

          Patch restoring the state and splitting the IndexInputs of SimpleFS and NIOFS.

          Show
          Uwe Schindler added a comment - Patch restoring the state and splitting the IndexInputs of SimpleFS and NIOFS.
          Hide
          Michael Poindexter added a comment -

          I did use your patch as a basis. The latest version of my patch removes the channel based indexoutput, but I kept the base class split (but changed it to not use generics). As the code is (without a base class for FSIndexOutput) there is no possibility to use any alternate implementation of FSIndexOutput in a subclass that does not use a RandomAccessFile. The AsyncFSDirectory was using an alternate output so I wanted to preserve its ability to do that. Sorry if I should not have done that.

          Show
          Michael Poindexter added a comment - I did use your patch as a basis. The latest version of my patch removes the channel based indexoutput, but I kept the base class split (but changed it to not use generics). As the code is (without a base class for FSIndexOutput) there is no possibility to use any alternate implementation of FSIndexOutput in a subclass that does not use a RandomAccessFile. The AsyncFSDirectory was using an alternate output so I wanted to preserve its ability to do that. Sorry if I should not have done that.
          Hide
          Uwe Schindler added a comment -

          AsyncFSDirectory does not need an async output. In Lucene we are fine to use RandomAccessFile to write as this is never done from several thraeds. Writing an index is always a sequential operation, theoretically an OutputStream would be enough (but we still have to seek sometimes, so this is not completely true). So a channel based FSIndexOutput is not needed, we can stick with the one that already exists. And because of this I removed the split.

          I opened a new issue for AsyncFS: LUCENE-4864

          Show
          Uwe Schindler added a comment - AsyncFSDirectory does not need an async output. In Lucene we are fine to use RandomAccessFile to write as this is never done from several thraeds. Writing an index is always a sequential operation, theoretically an OutputStream would be enough (but we still have to seek sometimes, so this is not completely true). So a channel based FSIndexOutput is not needed, we can stick with the one that already exists. And because of this I removed the split. I opened a new issue for AsyncFS: LUCENE-4864
          Hide
          Uwe Schindler added a comment -

          As the code is (without a base class for FSIndexOutput) there is no possibility to use any alternate implementation of FSIndexOutput in a subclass that does not use a RandomAccessFile

          If you want one (but you don't, see above), just extend BufferedIndexOutput. This is the same problem like a not really needed base class for NIO/Simple.

          Show
          Uwe Schindler added a comment - As the code is (without a base class for FSIndexOutput) there is no possibility to use any alternate implementation of FSIndexOutput in a subclass that does not use a RandomAccessFile If you want one (but you don't, see above), just extend BufferedIndexOutput. This is the same problem like a not really needed base class for NIO/Simple.
          Hide
          Michael Poindexter added a comment -

          AsyncFSDirectory does not need an async output. In Lucene we are fine to use RandomAccessFile to write as this is never done from several thraeds. Writing an index is always a sequential operation, theoretically an OutputStream would be enough (but we still have to seek sometimes, so this is not completely true). So a channel based FSIndexOutput is not needed, we can stick with the one that already exists. And because of this I removed the split.

          Fair enough. I was trying to preserve the ability for AsyncFSOutput to write to a Path in the future (and for Path based access there is not necessarily a File on disk to open a RAF from...it could be a zip filesystem for example). Unfortunately you can't just extend BufferedIndexOutput if you wanted to do this. There's a hook in the close() method of FSIndexOutput that calls parent.onIndexOutputClosed(this). onIndexOutputClosed only accepts a FSIndexOutput, not a BufferedIndexOutput.

          It really doesn't matter I guess, but I think to allow for extenders the subclassable FSIndexOutut would be nice.

          Show
          Michael Poindexter added a comment - AsyncFSDirectory does not need an async output. In Lucene we are fine to use RandomAccessFile to write as this is never done from several thraeds. Writing an index is always a sequential operation, theoretically an OutputStream would be enough (but we still have to seek sometimes, so this is not completely true). So a channel based FSIndexOutput is not needed, we can stick with the one that already exists. And because of this I removed the split. Fair enough. I was trying to preserve the ability for AsyncFSOutput to write to a Path in the future (and for Path based access there is not necessarily a File on disk to open a RAF from...it could be a zip filesystem for example). Unfortunately you can't just extend BufferedIndexOutput if you wanted to do this. There's a hook in the close() method of FSIndexOutput that calls parent.onIndexOutputClosed(this). onIndexOutputClosed only accepts a FSIndexOutput, not a BufferedIndexOutput. It really doesn't matter I guess, but I think to allow for extenders the subclassable FSIndexOutut would be nice.
          Hide
          Uwe Schindler added a comment -

          There's a hook in the close() method of FSIndexOutput that calls parent.onIndexOutputClosed(this). onIndexOutputClosed only accepts a FSIndexOutput, not a BufferedIndexOutput.

          This is not optimal, I would like to change this. I have no idea for which reason this was added. FSDirectory should only expect a IndexOutput subclass and not add magic here. We can fix this in the future, but thats also a separate issue. So I would like to keep FSIndexOutput unchanged. I think the callback with onIndexOutputClosed is also for testing purposes, I will review the backgrounds.

          I would like to commt the latest patch I upped and then upload a new patch with your AsyncFS (again, only the IndexInput part) to the new issue.

          Show
          Uwe Schindler added a comment - There's a hook in the close() method of FSIndexOutput that calls parent.onIndexOutputClosed(this). onIndexOutputClosed only accepts a FSIndexOutput, not a BufferedIndexOutput. This is not optimal, I would like to change this. I have no idea for which reason this was added. FSDirectory should only expect a IndexOutput subclass and not add magic here. We can fix this in the future, but thats also a separate issue. So I would like to keep FSIndexOutput unchanged. I think the callback with onIndexOutputClosed is also for testing purposes, I will review the backgrounds. I would like to commt the latest patch I upped and then upload a new patch with your AsyncFS (again, only the IndexInput part) to the new issue.
          Hide
          Uwe Schindler added a comment -

          The attached patch fixes the problem with stale files. The protected method should not take FSIndexOutput (this especially adds a synthetic accessor method). Instead onClosedIndexOutput simply gets the file name. I also added documentation.

          You can now add your own BufferedIndexOutput subclass (e.g. in NIOFSDir based on channels) and just have to take care on calling the protected method (which I dont really like..., it may cause bugs if you dont take care of that - I would like to have some automatism).

          I think the new patch is ready.

          Show
          Uwe Schindler added a comment - The attached patch fixes the problem with stale files. The protected method should not take FSIndexOutput (this especially adds a synthetic accessor method). Instead onClosedIndexOutput simply gets the file name. I also added documentation. You can now add your own BufferedIndexOutput subclass (e.g. in NIOFSDir based on channels) and just have to take care on calling the protected method (which I dont really like..., it may cause bugs if you dont take care of that - I would like to have some automatism). I think the new patch is ready.
          Hide
          Uwe Schindler added a comment -

          The previous patch had wrong isClone in NIOFS.

          Show
          Uwe Schindler added a comment - The previous patch had wrong isClone in NIOFS.
          Hide
          Uwe Schindler added a comment -

          I finally did some tests before committing:

          • It is possible with NIOFSDirectory to open an IndexReader and after that remove all files in Windows. The IndexReader is still working as it should. This mimics POSIX behaviour, so Lucene can get rid of files earlier when NIOFS is used.
          • With MMapDirectory, the file channel also has delete allowed, but the windows documentation explicitly says: http://msdn.microsoft.com/en-us/library/aa363915%28v=VS.85%29.aspx that mmapped files cannot be deleted (the documentation is a bit unclear, but the "OR" states both possibilities):

          The DeleteFile function fails if an application attempts to delete a file that has other handles open for normal I/O or as a memory-mapped file (FILE_SHARE_DELETE must have been specified when other handles were opened).
          The DeleteFile function marks a file for deletion on close. Therefore, the file deletion does not occur until the last handle to the file is closed. Subsequent calls to CreateFile to open the file fail with ERROR_ACCESS_DENIED.

          I will commit this now.

          Show
          Uwe Schindler added a comment - I finally did some tests before committing: It is possible with NIOFSDirectory to open an IndexReader and after that remove all files in Windows. The IndexReader is still working as it should. This mimics POSIX behaviour, so Lucene can get rid of files earlier when NIOFS is used. With MMapDirectory, the file channel also has delete allowed, but the windows documentation explicitly says: http://msdn.microsoft.com/en-us/library/aa363915%28v=VS.85%29.aspx that mmapped files cannot be deleted (the documentation is a bit unclear, but the "OR" states both possibilities): The DeleteFile function fails if an application attempts to delete a file that has other handles open for normal I/O or as a memory-mapped file (FILE_SHARE_DELETE must have been specified when other handles were opened). The DeleteFile function marks a file for deletion on close. Therefore, the file deletion does not occur until the last handle to the file is closed. Subsequent calls to CreateFile to open the file fail with ERROR_ACCESS_DENIED. I will commit this now.
          Hide
          Uwe Schindler added a comment -

          Thanks Michael! Let's move on to AsyncFSDirectory

          Show
          Uwe Schindler added a comment - Thanks Michael! Let's move on to AsyncFSDirectory
          Hide
          Commit Tag Bot added a comment -

          [trunk commit] Uwe Schindler
          http://svn.apache.org/viewvc?view=revision&revision=1459437

          LUCENE-4848: Use Java 7 NIO2-FileChannel instead of RandomAccessFile for NIOFSDirectory and MMapDirectory

          Show
          Commit Tag Bot added a comment - [trunk commit] Uwe Schindler http://svn.apache.org/viewvc?view=revision&revision=1459437 LUCENE-4848 : Use Java 7 NIO2-FileChannel instead of RandomAccessFile for NIOFSDirectory and MMapDirectory

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Michael Poindexter
            • Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development