Lucene - Core
  LUCENE-2292

ByteBuffer Directory - allowing to store the index outside the heap

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      A ByteBuffer-based Directory that can allocate direct byte buffers, thus storing the index outside the JVM heap.
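
      As a rough illustration of the idea (not code from the patch), the JDK's ByteBuffer.allocateDirect returns a buffer whose memory lives outside the Java heap, while ByteBuffer.allocate stays on-heap. The class and method names below are made up for this example.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the patch: picks on-heap or off-heap storage.
final class BufferChoice {
    static ByteBuffer newBuffer(int size, boolean direct) {
        // allocateDirect: memory is reserved outside the Java heap.
        // allocate: the buffer is backed by a byte[] on the heap.
        return direct ? ByteBuffer.allocateDirect(size) : ByteBuffer.allocate(size);
    }

    public static void main(String[] args) {
        ByteBuffer offHeap = newBuffer(1024, true);
        ByteBuffer onHeap = newBuffer(1024, false);
        System.out.println("offHeap direct? " + offHeap.isDirect());
        System.out.println("onHeap direct?  " + onHeap.isDirect());
    }
}
```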

      1. LUCENE-2292.patch
        30 kB
        Shay Banon
      2. LUCENE-2292.patch
        30 kB
        Shay Banon
      3. LUCENE-2292.patch
        30 kB
        Shay Banon
      4. LUCENE-2292.patch
        30 kB
        Shay Banon
      5. LUCENE-2292.patch
        20 kB
        Shay Banon
      6. LUCENE-2292.patch
        18 kB
        Shay Banon

        Activity

        Uwe Schindler added a comment -

        Hi,
        looks interesting as a replacement for RAMDirectory.

        Your patch uses a "sun." internal package. If you want to do something similar to MMapDirectory to release the buffer without waiting for GC, do it in the same way using reflection like in MMapDirectory.

        Shay Banon added a comment -

        Hi,

        >> looks interesting as a replacement for RAMDirectory.

        This class uses ByteBuffer, which has its overhead over simple byte[], though the same logic (if you verify it) can be used to improve the concurrency in RAMDirectory (just use byte[]).

        >> Your patch uses a "sun." internal package. If you want to do something similar to MMapDirectory to release the buffer without waiting for GC, do it in the same way using reflection like in MMapDirectory.

        From what I know, it was there in all JDKs I worked with (it's like sun.misc.Unsafe). Have you seen otherwise? If so, it's a simple change (though I am not sure about the access control thingy in MMapDirectory; it's a performance killer, and caching the Method(s) makes sense).

        Uwe Schindler added a comment -

        There are also other non-Sun JREs on the market (IBM, Harmony, ...), and e.g. the forceful unmap of MMap dirs does not work on all of them. It is simply a hack. Performance is no problem for Directory, as close() is seldom called, so there is no method cache in MMapDirectory.

        Shay Banon added a comment -

        Attached a new patch that does not use the sun.* package. I still cache the Method, since cleaning a buffer is not only done on close of the directory.
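
        For readers following along, here is a minimal sketch of what a reflection-based release with cached Method objects might look like. It is not the patch's code: the class name BufferCleaner is invented, and the "cleaner" and "clean" method names are internals of Sun-derived JVMs that may be missing or inaccessible elsewhere (as noted above, this is a hack), so the sketch simply reports false when the lookup fails.

```java
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

// Hypothetical helper, not the patch's code: best-effort release of a direct
// buffer via reflection, so no sun.* class is referenced at compile time.
final class BufferCleaner {
    // Cached lookups; both stay null when the running JVM does not expose them.
    private static final Method CLEANER_METHOD;
    private static final Method CLEAN_METHOD;

    static {
        Method cleanerMethod = null;
        Method cleanMethod = null;
        try {
            // Probe once with a tiny direct buffer and remember the methods.
            ByteBuffer probe = ByteBuffer.allocateDirect(1);
            Method m = probe.getClass().getMethod("cleaner");
            m.setAccessible(true); // may be refused on newer JVMs
            Object cleaner = m.invoke(probe);
            Method c = cleaner.getClass().getMethod("clean");
            c.setAccessible(true);
            c.invoke(cleaner); // release the probe itself
            cleanerMethod = m;
            cleanMethod = c;
        } catch (Exception e) {
            // Assumption: any failure means "not supported here"; fall back to GC.
        }
        CLEANER_METHOD = cleanerMethod;
        CLEAN_METHOD = cleanMethod;
    }

    // Returns true only if the buffer's native memory was explicitly released.
    // The buffer must not be touched again after a successful call.
    static boolean clean(ByteBuffer buffer) {
        if (CLEANER_METHOD == null || buffer == null || !buffer.isDirect()) {
            return false;
        }
        try {
            Object cleaner = CLEANER_METHOD.invoke(buffer);
            if (cleaner == null) {
                return false; // e.g. slices and duplicates own no cleaner
            }
            CLEAN_METHOD.invoke(cleaner);
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

        Because the Method objects are looked up once in the static initializer, each later clean() call pays only for two reflective invocations rather than repeated lookups.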

        Shay Banon added a comment -

        By the way, an implementation note. I thought about preallocating a large direct buffer and then splicing it into chunks, but currently I think that the complexity (and overhead in maintaining splicing locations) is not really needed and the current caching should do the trick (with the ability to control both the buffer size and the cache size).

        Shay Banon added a comment -

        A fixed patch that now passes all tests using the byte buffer directory.

        Also, includes refactoring into a different package (store.bytebuffer), and includes a custom ByteBufferAllocator interface that can control how buffers are allocated, including plain and caching implementations.
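
        To make the allocator idea concrete, here is one possible shape for such an interface with a plain and a caching implementation. This is only a sketch under assumptions: the method signatures, the two class names, and the use of a bounded queue as the cache are all hypothetical and may differ from what the patch actually does.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical shape of the allocator abstraction; real signatures may differ.
interface ByteBufferAllocator {
    ByteBuffer allocate();
    void release(ByteBuffer buffer);
}

// Plain variant: always allocates a fresh buffer and leaves cleanup to the GC.
final class PlainByteBufferAllocator implements ByteBufferAllocator {
    private final int bufferSize;
    private final boolean direct;

    PlainByteBufferAllocator(int bufferSize, boolean direct) {
        this.bufferSize = bufferSize;
        this.direct = direct;
    }

    @Override
    public ByteBuffer allocate() {
        return direct ? ByteBuffer.allocateDirect(bufferSize)
                      : ByteBuffer.allocate(bufferSize);
    }

    @Override
    public void release(ByteBuffer buffer) {
        // Nothing to do in this sketch.
    }
}

// Caching variant: keeps up to cacheSize released buffers for reuse.
final class CachingByteBufferAllocator implements ByteBufferAllocator {
    private final ByteBufferAllocator delegate;
    private final BlockingQueue<ByteBuffer> cache;

    CachingByteBufferAllocator(ByteBufferAllocator delegate, int cacheSize) {
        this.delegate = delegate;
        this.cache = new ArrayBlockingQueue<>(cacheSize); // cacheSize must be >= 1
    }

    @Override
    public ByteBuffer allocate() {
        ByteBuffer cached = cache.poll();
        if (cached != null) {
            cached.clear(); // resets position and limit; old bytes are not zeroed
            return cached;
        }
        return delegate.allocate();
    }

    @Override
    public void release(ByteBuffer buffer) {
        // If the cache is full, hand the buffer back to the delegate instead.
        if (!cache.offer(buffer)) {
            delegate.release(buffer);
        }
    }
}
```

        With this layout, buffer size is fixed by the plain allocator and cache size by the caching wrapper, so both can be tuned independently.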

        Michael Busch added a comment -

        >> This class uses ByteBuffer, which has its overhead over simple byte[],

        In my experience ByteBuffer has basically no performance overhead over byte[] if you construct it by wrapping a byte[]. The JVM seems smart enough to figure out that there's a good old array behind the ByteBuffer.

        But if I allocated the BB in any other way, it was 2-4x slower in my simple tests on a Mac with a Sun JVM.

        So it might be the right thing to put these changes into RAMDirectory and have it by default wrap a byte[] and add an (expert) API to allow allocating the BB in other ways.
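
        A small sketch of that suggestion, using the JDK's IntFunction as a stand-in for whatever the expert API would be (the BufferFactories class and its field names are hypothetical): the default wraps a byte[], and other allocation strategies can be swapped in.

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

// Hypothetical factories, not Lucene API: a default that wraps a byte[] and
// an "expert" alternative that allocates off-heap.
final class BufferFactories {
    // Default: heap buffer wrapping a plain byte[]; hasArray() returns true.
    static final IntFunction<ByteBuffer> WRAPPED = size -> ByteBuffer.wrap(new byte[size]);

    // Expert alternative: direct (off-heap) buffer; no accessible backing array.
    static final IntFunction<ByteBuffer> DIRECT = ByteBuffer::allocateDirect;

    public static void main(String[] args) {
        ByteBuffer wrapped = WRAPPED.apply(4096);
        ByteBuffer direct = DIRECT.apply(4096);
        System.out.println("wrapped has array? " + wrapped.hasArray());
        System.out.println("direct is direct?  " + direct.isDirect());
    }
}
```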

        Shay Banon added a comment -

        Attaching another round of the patch, with improved ref counting on cloned index input and better EOF failures (similar to RAM one). All tests pass.
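
        One way to picture reference counting on cloned inputs together with a close() that tolerates repeated calls is sketched below. SharedFile and CountedInput are invented stand-ins, not the patch's classes, and the release step is left as a comment.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the file whose buffers are shared by all inputs.
final class SharedFile {
    private final AtomicInteger refCount = new AtomicInteger();

    void incRef() {
        refCount.incrementAndGet();
    }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            // Last reader is gone; buffers could be released here.
        }
    }

    int refCount() {
        return refCount.get();
    }
}

// Hypothetical stand-in for an index input; each instance (including clones)
// holds exactly one reference on the shared file.
final class CountedInput implements Closeable {
    private final SharedFile file;
    private final AtomicBoolean closed = new AtomicBoolean();

    CountedInput(SharedFile file) {
        this.file = file;
        file.incRef();
    }

    // Plays the role of clone(): the copy takes its own reference.
    CountedInput copy() {
        return new CountedInput(file);
    }

    @Override
    public void close() {
        // Only the first close() drops the reference; repeats are no-ops.
        if (closed.compareAndSet(false, true)) {
            file.decRef();
        }
    }
}
```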

        By the way, some tests close the same index input instance several times, so I had to guard against that.

        Shay Banon added a comment -

        Made files protected, since it might be needed in classes that extend the directory (I need it in my case).

        Shay Banon added a comment -

        Add sizeInBytes to directory


          People

          • Assignee:
            Unassigned
            Reporter:
            Shay Banon
          • Votes:
            3
            Watchers:
            5
