# WindowsDirectory

## Details

• Type: New Feature
• Status: Closed
• Priority: Major
• Resolution: Fixed
• Affects Version/s: None
• Fix Version/s:
• Component/s:
• Labels:
None
• Lucene Fields:
New, Patch Available

## Description

We can use Windows' overlapped IO to do pread() and avoid the performance problems of SimpleFS/NIOFSDir.

## Attachments

1. LUCENE-2791.patch
15 kB
Robert Muir
2. LUCENE-2791.patch
15 kB
Robert Muir
3. LUCENE-2791.patch
10 kB
Robert Muir
4. WindowsDirectory_amd64.dll
432 kB
Robert Muir
5. WindowsDirectory.dll
153 kB
Robert Muir

## Activity

Hide
Robert Muir added a comment -

here's an initial patch:
All tests pass using this directory with -Dtests.directory, on 32 and 64bit.

I'll attach pre-built DLL files and instructions in case someone else wants to test.

Show
Robert Muir added a comment - here's an initial patch: All tests pass using this directory with -Dtests.directory, on 32 and 64bit. I'll attach pre-built DLL files and instructions in case someone else wants to test.
Hide
Robert Muir added a comment -

heres a dll built for 32-bit windows.
to use it, just put it in a folder thats in your windows PATH, then the directory will work.

to then run all lucene tests with it, first compile contrib/misc, then run a command like this:

ant -lib \path\to\lusolr\lucene\build\contrib\misc\lucene-misc-4.0-SNAPSHOT.jar test-core -Dtests.directory=WindowsDirectory

Show
Robert Muir added a comment - heres a dll built for 32-bit windows. to use it, just put it in a folder thats in your windows PATH, then the directory will work. to then run all lucene tests with it, first compile contrib/misc, then run a command like this: ant -lib \path\to\lusolr\lucene\build\contrib\misc\lucene-misc-4.0-SNAPSHOT.jar test-core -Dtests.directory=WindowsDirectory
Hide
Robert Muir added a comment -

here's one i compiled for 64-bit. just name it WindowsDirectory.dll and follow the other instructions (obviously, you only use this if you use a 64-bit jvm).

the compilation instructions are in the javadocs for the WindowsDirectory.java by the way, its pretty easy to install mingw or ming64 to do it.

Show
Robert Muir added a comment - here's one i compiled for 64-bit. just name it WindowsDirectory.dll and follow the other instructions (obviously, you only use this if you use a 64-bit jvm). the compilation instructions are in the javadocs for the WindowsDirectory.java by the way, its pretty easy to install mingw or ming64 to do it.
Hide
Robert Muir added a comment -

attached is an updated patch (fixing some indentation, adding some paranoid NPE checks in the JNI code, various stuff like that)

I also included a crude benchmark made by mikemccand (I think we should commit this to our tests!)

It builds a 100k doc index and searches with 20 threads for 10 seconds:
I used -Dtests.seed=0:0 and -Dtests.codec=Standard, and recorded best out of 3.
I tested win32 -client, and win64 -server, only because these are the oracle defaults.

Directory QPS Win32 -client
NIOFSDirectory 360
SimpleFS 372
WindowsDirectory 616
RAMDirectory 760
MMapDirectory 772
Directory QPS Win64 -server
NIOFSDirectory 361
SimpleFS 376
WindowsDirectory 777
RAMDirectory 1105
MMapDirectory 1138

So, I think this gets you past the sync issue and can be a good choice e.g. for win32
But MMap still seems to be the best for this benchmark.

Show
Robert Muir added a comment - attached is an updated patch (fixing some indentation, adding some paranoid NPE checks in the JNI code, various stuff like that) I also included a crude benchmark made by mikemccand (I think we should commit this to our tests!) It builds a 100k doc index and searches with 20 threads for 10 seconds: I used -Dtests.seed=0:0 and -Dtests.codec=Standard, and recorded best out of 3. I tested win32 -client, and win64 -server, only because these are the oracle defaults. Directory QPS Win32 -client NIOFSDirectory 360 SimpleFS 372 WindowsDirectory 616 RAMDirectory 760 MMapDirectory 772 Directory QPS Win64 -server NIOFSDirectory 361 SimpleFS 376 WindowsDirectory 777 RAMDirectory 1105 MMapDirectory 1138 So, I think this gets you past the sync issue and can be a good choice e.g. for win32 But MMap still seems to be the best for this benchmark.
Hide
Michael McCandless added a comment -

This is awesome! It gives us a workaround to the JRE bug (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6265734) killing NIOFSDir's concurrency on Windows.

Does is also allow for deletion of still-open files?

You may want to experiment w/ different buffer sizes to see if that impacts performance.

Also: this allows you to do the equivalent of Linux's O_DIRECT, right? If so... I think we should augment the Dir API so that on opening an input it receives some sort of IOContext or something expressing why we are opening. So for merging we would want "direct iO" (bypass OS buffer cache or NOREUSE fadvise flag, etc.), and larger buffer sizes and perhaps other low level IO settings in the future.

Show
Michael McCandless added a comment - This is awesome! It gives us a workaround to the JRE bug ( http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6265734 ) killing NIOFSDir's concurrency on Windows. Does is also allow for deletion of still-open files? You may want to experiment w/ different buffer sizes to see if that impacts performance. Also: this allows you to do the equivalent of Linux's O_DIRECT, right? If so... I think we should augment the Dir API so that on opening an input it receives some sort of IOContext or something expressing why we are opening. So for merging we would want "direct iO" (bypass OS buffer cache or NOREUSE fadvise flag, etc.), and larger buffer sizes and perhaps other low level IO settings in the future.
Hide
Native Policeman added a comment - - edited

There are some violations:

We can use Windows' overlapped IO to do pread() and avoid the performance problems of SimpleFS/NIOFSDir.

Ahm you dont really use overlapped/async IO, ReadFile in your code only returns when read operation is finished, so not async. Because when you do you have to pass a flag: FILE_FLAG_OVERLAPPED. But I don't think we should really make it async for now, that is too much stuff as Lucene does not really support it at the moment.

• Currently you don't really need the OVERLAPPED structure at all, just pass NULL: http://msdn.microsoft.com/en-us/library/aa365467(VS.85).aspx, or is there a reason?
• Do we need FILE_SHARE_WRITE?
• You use env->GetStringUTFChars, which is not really a problem as index filenames are always ASCII only. The default methods in the Win32 API use the system default charset (ANSI) for the arguments to OpenFile, which links to "OpenFileA" in kernel32.dll. I would suggest to use the real UTF16 chars (env->GetStringChars) and use OpenFileW method, which takes wchar instead of char. The same applies to Linux Dir, but there I am not sure how encodings are handled.
• Also the Format functions for GetLastError() results use ANSI per default. We should use the *W method, too, and pass it to java as wchar.
• Exception handling: in native read, you call throwIOException, when ReadFile() returns false. But 2 lines later you still call the method to copy the local buffer to the java byte array. JNI docs say, that after an Exception is in the thread's local exception, you should not call any JNI methods anymore (expect some allowed ones). Also the copy operation is not really needed.
• As in contract to the native Linux directory we are using our own Handles and not Java's FileDescriptor class to store the handle. If something goes wrong, GC can never clean up the file and it stays open. So you should add a finalizer, Java's internal File methods do that also (RandomAccessFile has a finalizer that calls close on operating system file).
Show
Hide
Uwe Schindler added a comment -

Does is also allow for deletion of still-open files?

No, of course not. Thats native windos, files that are open cannot be closed. No difference how the file was opened. Also MMapped Files cannot be deleted. Windows does not support the semantics of unix in Win32 API (but it does in POSIX! LOL), so open file: no deleting or changing

Show
Uwe Schindler added a comment - Does is also allow for deletion of still-open files? No, of course not. Thats native windos, files that are open cannot be closed. No difference how the file was opened. Also MMapped Files cannot be deleted. Windows does not support the semantics of unix in Win32 API (but it does in POSIX! LOL), so open file: no deleting or changing
Hide
Uwe Schindler added a comment -

Also: this allows you to do the equivalent of Linux's O_DIRECT, right? If so... I think we should augment the Dir API so that on opening an input it receives some sort of IOContext or something expressing why we are opening. So for merging we would want "direct iO" (bypass OS buffer cache or NOREUSE fadvise flag, etc.), and larger buffer sizes and perhaps other low level IO settings in the future

Thats different to O_DIRECT. FS caches are still used by windows, but there are also options to disable this: FILE_FLAG_WRITE_THROUGH http://msdn.microsoft.com/en-us/library/aa363858(v=VS.85).aspx

Show
Uwe Schindler added a comment - Also: this allows you to do the equivalent of Linux's O_DIRECT, right? If so... I think we should augment the Dir API so that on opening an input it receives some sort of IOContext or something expressing why we are opening. So for merging we would want "direct iO" (bypass OS buffer cache or NOREUSE fadvise flag, etc.), and larger buffer sizes and perhaps other low level IO settings in the future Thats different to O_DIRECT. FS caches are still used by windows, but there are also options to disable this: FILE_FLAG_WRITE_THROUGH http://msdn.microsoft.com/en-us/library/aa363858(v=VS.85).aspx
Hide
Native Policeman added a comment -

To my above comment and OVERLAPPED in my opinion not realy used (as not async):
Reading http://msdn.microsoft.com/en-us/library/aa365467(VS.85).aspx#synchronization_and_file_position it seems to behave little different when OVERLAPPED is non-null for synchronous reads (as we do currently). But as you never read the contents og OVERLAPPED after the function call, why pass it in? Please explain

Show
Native Policeman added a comment - To my above comment and OVERLAPPED in my opinion not realy used (as not async): Reading http://msdn.microsoft.com/en-us/library/aa365467(VS.85).aspx#synchronization_and_file_position it seems to behave little different when OVERLAPPED is non-null for synchronous reads (as we do currently). But as you never read the contents og OVERLAPPED after the function call, why pass it in? Please explain
Hide
Robert Muir added a comment -

Ahm you dont really use overlapped/async IO, ReadFile in your code only returns when read operation is finished, so not async. Because when you do you have to pass a flag: FILE_FLAG_OVERLAPPED. But I don't think we should really make it async for now, that is too much stuff as Lucene does not really support it at the moment.

Yes I do, Overlapped IO is distinct from asynchronous IO! you just use the overlapped structure without OVERLAPPED, as i did here.

Currently you don't really need the OVERLAPPED structure at all, just pass NULL: http://msdn.microsoft.com/en-us/library/aa365467(VS.85).aspx, or is there a reason?

Yes I do use it, do you understand C ?

Do we need FILE_SHARE_WRITE?

Yes

You use env->GetStringUTFChars, which is not really a problem as index filenames are always ASCII only. The default methods in the Win32 API use the system default charset (ANSI) for the arguments to OpenFile, which links to "OpenFileA" in kernel32.dll. I would suggest to use the real UTF16 chars (env->GetStringChars) and use OpenFileW method, which takes wchar instead of char. The same applies to Linux Dir, but there I am not sure how encodings are handled.

We don't open unicode files in lucene, they are all ascii.

Also the Format functions for GetLastError() results use ANSI per default. We should use the *W method, too, and pass it to java as wchar.

Exception handling: in native read, you call throwIOException, when ReadFile() returns false. But 2 lines later you still call the method to copy the local buffer to the java byte array. JNI docs say, that after an Exception is in the thread's local exception, you should not call any JNI methods anymore (expect some allowed ones). Also the copy operation is not really needed.

I don't think you know what you are talking about!

The code uses 2 techniques, alloc on the stack+SetByteArrayRegion (for small reads), and GetByteArrayElements (for larger reads). The JDK (at least sun, etc) always makes a copy for the latter.

So bottom line: I disagree with you on everything you said.

Show
Hide
Robert Muir added a comment -

To my above comment and OVERLAPPED in my opinion not realy used (as not async):
Reading http://msdn.microsoft.com/en-us/library/aa365467(VS.85).aspx#synchronization_and_file_position it seems to behave little different when OVERLAPPED is non-null for synchronous reads (as we do currently). But as you never read the contents og OVERLAPPED after the function call, why pass it in? Please explain

To position the read versus seeking the file. This is explained in the documentation on that very same page!:

Considerations for working with synchronous file handles:

If lpOverlapped is not NULL, the read operation starts at the offset that is specified in the OVERLAPPED structure and ReadFile does not return until the read operation is complete. The system updates the OVERLAPPED offset before ReadFile returns.

Show
Hide
Native Policeman added a comment -

So bottom line: I disagree with you on everything you said.

This was not against you, but I am hacking native Windows since Win32 was started

Yes I do use it, do you understand C ?

I kill you the next time we meet. I am hacking C longer than Java. This was not about C it was about the MSDN docs telling something different. See my last comment above. Thats all. Why are you so aggresive?

Exception handling: in native read, you call throwIOException, when ReadFile() returns false. But 2 lines later you still call the method to copy the local buffer to the java byte array. JNI docs say, that after an Exception is in the thread's local exception, you should not call any JNI methods anymore (expect some allowed ones). Also the copy operation is not really needed.

I don't think you know what you are talking about!

Sorry we seem to have a misunderstanding. I was not talking about the internal copy the JVM does. I was talking about the fact, that you should not call JNI functions anymore when you set the thread's exception status and exit the function as soon as possible. You set this status after the failed ReadFile with throwIOEx(). After that its no longer needed to copy the buffer[] to the java byte array and its risky to do it because I am not sure if this env->SetByteArrayRegion() is safe to be called after exception status is set - that all I wanted to say. I would change tha code to:

if (ReadFile((HANDLE) fd, &buffer, length, &numRead, &io)) {
env->SetByteArrayRegion(bytes, offset, numRead, (const jbyte *) buffer);
} else {
throwIOException(env, GetLastError());
}


Please note env->throwNew() always returns, it simply sets the thread's exception status, so the JVM thorws the Java Exceüption passed in after the native method returns to Java bytecode again.

Thanks,
Native Policeman

Show
Hide
Robert Muir added a comment -

This was not against you, but I am hacking native Windows since Win32 was started

But you don't understand how its IO works?

Show
Robert Muir added a comment - This was not against you, but I am hacking native Windows since Win32 was started But you don't understand how its IO works?
Hide
Native Policeman added a comment -

If lpOverlapped is not NULL, the read operation starts at the offset that is specified in the OVERLAPPED structure and ReadFile does not return until the read operation is complete. The system updates the OVERLAPPED offset before ReadFile returns.

Thanks, I missed that. Sorry - the idea is simply to make positioned reads, I understand that.

Show
Native Policeman added a comment - If lpOverlapped is not NULL, the read operation starts at the offset that is specified in the OVERLAPPED structure and ReadFile does not return until the read operation is complete. The system updates the OVERLAPPED offset before ReadFile returns. Thanks, I missed that. Sorry - the idea is simply to make positioned reads, I understand that.
Hide
Native Policeman added a comment -

But you don't understand how its IO works?

You will laugh, but I did exactly the same thing a few weeks ago Sorry, the whole thing was a misunderstanding.

The other things are simply minor improvements. And Mike also said that we may create files by codecs with names that are non-ascii. And as you are the unicode policeman, I dont unterstand how you can live with default charsets... This is why I suggested to use *W functions in windows and use the UTF16 chars for that reason.

The exception status stuff is also just an optimization in the error case? So why are you aggressive?

Show
Native Policeman added a comment - But you don't understand how its IO works? You will laugh, but I did exactly the same thing a few weeks ago Sorry, the whole thing was a misunderstanding. The other things are simply minor improvements. And Mike also said that we may create files by codecs with names that are non-ascii. And as you are the unicode policeman, I dont unterstand how you can live with default charsets... This is why I suggested to use *W functions in windows and use the UTF16 chars for that reason. The exception status stuff is also just an optimization in the error case? So why are you aggressive?
Hide
Robert Muir added a comment -

And Mike also said that we may create files by codecs with names that are non-ascii. And as you are the unicode policeman, I dont unterstand how you can live with default charsets...

its my understanding that all indexfiles/codecs are ascii-only.

If this is going to change, then there is a lot of policework to do. Personally i would really prefer if we simply keep codecs and lucene filenames as ascii-only!

For non-ascii filenames, java.io.File is broken, its equals() is inconsistent with its hashCode(), even on windows, and definitely on things like macos (as i think it still uses unicode normalization to normalize filenames). We should seriously avoid the system-dependent problems that will arise by using non-ascii filenames in these parts of lucene: i don't see this bringing a lot of benefits either.

Show
Robert Muir added a comment - And Mike also said that we may create files by codecs with names that are non-ascii. And as you are the unicode policeman, I dont unterstand how you can live with default charsets... its my understanding that all indexfiles/codecs are ascii-only. If this is going to change, then there is a lot of policework to do. Personally i would really prefer if we simply keep codecs and lucene filenames as ascii-only! For non-ascii filenames, java.io.File is broken, its equals() is inconsistent with its hashCode(), even on windows, and definitely on things like macos (as i think it still uses unicode normalization to normalize filenames). We should seriously avoid the system-dependent problems that will arise by using non-ascii filenames in these parts of lucene: i don't see this bringing a lot of benefits either.
Hide
Robert Muir added a comment -

Does is also allow for deletion of still-open files?

No, but we could/should investigate trying to improve this, thoguh we might have to implement IndexOutput for it to all work.

You may want to experiment w/ different buffer sizes to see if that impacts performance.

maybe, mainly my point was to avoid the synchronized() here. A user could always tweak the buffer size here?

Also: this allows you to do the equivalent of Linux's O_DIRECT, right? If so... I think we should augment the Dir API so that on opening an input it receives some sort of IOContext or something expressing why we are opening. So for merging we would want "direct iO" (bypass OS buffer cache or NOREUSE fadvise flag, etc.), and larger buffer sizes and perhaps other low level IO settings in the future.

Yes, we can use FILE_FLAG_NO_BUFFERING, though it would be more complicated (things must be sector-aligned). In my opinion this should be a separate IndexInput, and as you stated the Dir API could inform us. But you are right, the general notion seems to be in all major platforms, and I think we should carefully use it (if it can help!)

Show
Robert Muir added a comment - Does is also allow for deletion of still-open files? No, but we could/should investigate trying to improve this, thoguh we might have to implement IndexOutput for it to all work. You may want to experiment w/ different buffer sizes to see if that impacts performance. maybe, mainly my point was to avoid the synchronized() here. A user could always tweak the buffer size here? Also: this allows you to do the equivalent of Linux's O_DIRECT, right? If so... I think we should augment the Dir API so that on opening an input it receives some sort of IOContext or something expressing why we are opening. So for merging we would want "direct iO" (bypass OS buffer cache or NOREUSE fadvise flag, etc.), and larger buffer sizes and perhaps other low level IO settings in the future. Yes, we can use FILE_FLAG_NO_BUFFERING, though it would be more complicated (things must be sector-aligned). In my opinion this should be a separate IndexInput, and as you stated the Dir API could inform us. But you are right, the general notion seems to be in all major platforms, and I think we should carefully use it (if it can help!)
Hide
Uwe Schindler added a comment -

Sorry, the whole thing was a misunderstanding.

Its ok, we understand.

Show
Uwe Schindler added a comment - Sorry, the whole thing was a misunderstanding. Its ok, we understand.
Hide
Uwe Schindler added a comment -

The Native Policeman now retires!

Show
Uwe Schindler added a comment - The Native Policeman now retires!
Hide
Native Policeman added a comment -

wink

Show
Native Policeman added a comment - wink
Hide
Uwe Schindler added a comment -

I was thinking maybe I retire as generics policeman too.

I decided we should convert all of our code back to java 1.4, generics are useless and should be abolished!

Show
Uwe Schindler added a comment - I was thinking maybe I retire as generics policeman too. I decided we should convert all of our code back to java 1.4, generics are useless and should be abolished!
Hide
Robert Muir added a comment -

I think, I will also retire as Unicode Policeman, because Unicode is a cruft fore all US americans. Its totally useless!

Show
Robert Muir added a comment - I think, I will also retire as Unicode Policeman, because Unicode is a cruft fore all US americans. Its totally useless!
Hide
DM Smith added a comment -

I've just back ported all the code to Java 1.1. Also, this port also deletes everything but 7-bit ASCII.

(Just couldn't resist....)

Show
DM Smith added a comment - I've just back ported all the code to Java 1.1. Also, this port also deletes everything but 7-bit ASCII. (Just couldn't resist....)
Hide
Robert Muir added a comment -

i turned mike's benchmark into a unit test, and toned it down a bit.

so i agree with uwe on only one thing, and thats optimizing the case where we throw an exception.

but i disagree on the other things: i think we should just have language neutral error messages and ascii-only files as i presented here... this isn't for general purpose use: its just for lucene. there is no need to bring wchar, etc into this.

so, since all tests pass and this fixes the bug in SimpleFS, i will commit to contrib/misc coon.

Show
Robert Muir added a comment - i turned mike's benchmark into a unit test, and toned it down a bit. so i agree with uwe on only one thing, and thats optimizing the case where we throw an exception. but i disagree on the other things: i think we should just have language neutral error messages and ascii-only files as i presented here... this isn't for general purpose use: its just for lucene. there is no need to bring wchar, etc into this. so, since all tests pass and this fixes the bug in SimpleFS, i will commit to contrib/misc coon.
Hide
Robert Muir added a comment -

committed revision 1041844 (trunk), 1041881 (3x).

Show
Robert Muir added a comment - committed revision 1041844 (trunk), 1041881 (3x).
Hide
Grant Ingersoll added a comment -

Bulk close for 3.1

Show
Grant Ingersoll added a comment - Bulk close for 3.1

## People

• Assignee:
Robert Muir
Reporter:
Robert Muir