[LUCENENET-629] Lucene & Memory Mapped Files - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Abandoned
Affects Version/s: Lucene.Net 4.8.0
Fix Version/s: None
Component/s: Lucene.Net Core
Labels:
- up-for-grabs

Description

This came in on the user mailing list on 15-July-2019 and was originally reported by Vincent Van Den Berghe (Vincent.VanDenBerghe@bvdinfo.com)

Hello everyone,

I've just had an interesting performance debugging session, and one of the things I've learned is probably applicable for Lucene.NET.

I'll give it here with no guarantees, hoping that it might be useful to someone.

Lucene uses memory mapped files for reading, most notably via MemoryMappedFileByteBuffer. Profiling indicated that there are 2 calls that have quite some overhead:

        public override ByteBuffer Get(byte[] dst, int offset, int length)

        public override byte Get()

These calls spend their time in 2 methods of MemoryMappedViewAccessor:

        public int ReadArray<T>(long position, T[] array, int offset, int count) where T : struct;

        public byte ReadByte(long position);

The implementation of both contains a lot of overhead, especially ReadArray<T>: apart from the parameter validation, this method makes sure that the generic parameter T is properly aligned. This is irrelevant in our use case, since T is byte. But because the method implementation doesn't make any assumptions about T (other than the fact that it must be a value type, which is the generic constraint), every call goes through the same motions, every time.

Microsoft should have provided specializations for common value types, and certainly for byte arrays. Sadly, this is not the case.

The other one, ReadByte, acquires and releases the (unsafe) pointer before dereferencing it to return one single byte.

A way to do this more efficiently (while avoiding unsafe code) is to acquire the pointer handle associated with the view accessor, and use that pointer to marshal information back to the caller.

To do this, MemoryMappedFileByteBuffer needs one extra member variable to hold the address:

       private long m_Ptr;

Then, the 2 MemoryMappedFileByteBuffer constructors need to be rewritten as follows (mainly to avoid code duplication):

        public MemoryMappedFileByteBuffer(MemoryMappedViewAccessor accessor, int capacity)
            : this(accessor, capacity, 0)
        {
        }

        public MemoryMappedFileByteBuffer(MemoryMappedViewAccessor accessor, int capacity, int offset)
            : base(capacity)
        {
            this.accessor = accessor;
            this.offset = offset;
            System.Runtime.CompilerServices.RuntimeHelpers.PrepareConstrainedRegions();
            try
            {
            }
            finally
            {
                bool success = false;
                accessor.SafeMemoryMappedViewHandle.DangerousAddRef(ref success);
                m_Ptr = accessor.SafeMemoryMappedViewHandle.DangerousGetHandle().ToInt64() + accessor.PointerOffset;
            }
        }

All this does is get the pointer handle. Yes, the method has the word "Dangerous" in it, but it's perfectly safe. Note that this needs .NET Framework 4.5.1 or later, because we want the starting position of the view from the beginning of the memory mapped file through the PointerOffset property, which is unavailable in earlier .NET releases.

What the constructor does is to get a 64-bit quantity representing the start of the memory mapped view. The special construct with an "empty try block" conforms to the documentation regarding constrained execution regions (although I think it's more of a cargo-cult thing, since constrained execution doesn't solve a lot of problems in this case).

Finally, the Dispose method needs to be extended to release the pointer handle using DangerousRelease:

        public void Dispose()
        {
            if (accessor != null)
            {
                accessor.SafeMemoryMappedViewHandle.DangerousRelease();
                accessor.Dispose();
                accessor = null;
            }
        }

At this point, we can replace the ReadArray call in ByteBuffer Get(byte[] dst, int offset, int length) with this:

Marshal.Copy(new IntPtr(m_Ptr + Ix(NextGetIndex(length))), dst, offset, length);

And the Get method that used ReadByte becomes:

        public override byte Get()
        {
            return Marshal.ReadByte(new IntPtr(m_Ptr + Ix(NextGetIndex())));
        }
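The pointer-based reads described above can be tried in isolation with a minimal sketch like the one below. It is not the Lucene.NET implementation: the class name MappedViewPointerDemo, the method ReadViaPointer, and the anonymous 4096-byte mapping are hypothetical and exist only to make the example self-contained. Unlike the constructor above, the sketch releases the handle only if DangerousAddRef reported success, which is a defensive assumption rather than something the original message requires.

```csharp
using System;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

public static class MappedViewPointerDemo
{
    // Writes 1..count into an anonymous mapping, then reads the bytes back
    // through the raw view pointer instead of ReadArray<T>/ReadByte.
    public static byte[] ReadViaPointer(int count)
    {
        // Hypothetical anonymous mapping, used only to keep the sketch self-contained.
        using (MemoryMappedFile file = MemoryMappedFile.CreateNew(null, 4096))
        using (MemoryMappedViewAccessor accessor = file.CreateViewAccessor(0, 4096))
        {
            for (int i = 0; i < count; i++)
            {
                accessor.Write(i, (byte)(i + 1));
            }

            var handle = accessor.SafeMemoryMappedViewHandle;
            bool addedRef = false;
            try
            {
                // Keep the view alive while we hold its raw address.
                handle.DangerousAddRef(ref addedRef);
                long basePtr = handle.DangerousGetHandle().ToInt64() + accessor.PointerOffset;

                // Bulk read: equivalent in spirit to Get(byte[] dst, int offset, int length).
                byte[] dst = new byte[count];
                Marshal.Copy(new IntPtr(basePtr), dst, 0, count);

                // Single-byte read: equivalent in spirit to Get().
                byte first = Marshal.ReadByte(new IntPtr(basePtr));
                if (count > 0 && first != dst[0])
                {
                    throw new InvalidOperationException("Bulk and single-byte reads disagree.");
                }

                return dst;
            }
            finally
            {
                // Release only the reference we actually acquired.
                if (addedRef)
                {
                    handle.DangerousRelease();
                }
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", ReadViaPointer(4)));
    }
}
```

In a long-lived buffer such as MemoryMappedFileByteBuffer, the reference would be held for the lifetime of the object and released in Dispose, as described above, rather than around each read.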

The Marshal class contains various read methods for other data types (ReadInt16, ReadInt32), and it would be possible to rewrite all other methods that currently assemble the types byte by byte. This is left as an exercise for the reader. In any case, these methods have a lot less overhead than the corresponding methods in the memory view accessor.
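As a rough illustration of that exercise, here is a sketch of a 32-bit read built on Marshal.ReadInt32. It assumes the buffer stores integers in big-endian order (an assumption not stated in the original message), so the value is converted from network order to host order after the read. The class name MarshalReadInt32Demo and the helper ReadInt32BigEndian are hypothetical, and the unmanaged block merely stands in for the mapped view address.

```csharp
using System;
using System.Net;
using System.Runtime.InteropServices;

public static class MarshalReadInt32Demo
{
    // Reads four bytes at the address and interprets them as a big-endian int.
    // Marshal.ReadInt32 returns the value in host byte order, so the result
    // is converted from network (big-endian) order to host order.
    public static int ReadInt32BigEndian(IntPtr address)
    {
        return IPAddress.NetworkToHostOrder(Marshal.ReadInt32(address));
    }

    public static void Main()
    {
        // Stand-in for m_Ptr plus an index: a small unmanaged block.
        IntPtr block = Marshal.AllocHGlobal(4);
        try
        {
            Marshal.Copy(new byte[] { 0x00, 0x00, 0x01, 0x02 }, 0, block, 4);
            Console.WriteLine(ReadInt32BigEndian(block)); // prints 258
        }
        finally
        {
            Marshal.FreeHGlobal(block);
        }
    }
}
```

Whether this beats assembling the value byte by byte would need to be measured; the sketch only shows that the byte order must be handled explicitly once multi-byte Marshal reads are used.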

In my measurements, even when files reside on slow devices, the performance improvements are noticeable: I'm seeing improvements of 5%, especially for large segments. If you have slow I/O, the slow I/O still dominates, of course: no such thing as a free lunch and all that.

As I said, no guarantees. Have fun with it! If you find something that is unacceptable, let me know.

Vincent

Attachments

Issue Links

Is contained by

LUCENENET-630 Identify/Fix Bottlenecks

Closed
