Lucene - Core
LUCENE-4995

Remove the strong reference of CompressingStoredFieldsReader on the decompression buffer

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      CompressingStoredFieldsReader has a strong reference on the buffer it uses for decompression. Although it makes the reader able to reuse this buffer, this can trigger high memory usage in case some documents are very large. Creating this buffer on demand would help give memory back to the JVM.
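
      For illustration, here is a minimal sketch of the pattern at issue (hypothetical class and method names, not the actual Lucene source). A reader that keeps a strong reference to its decompression buffer retains the largest buffer it has ever needed for as long as the reader stays open; allocating the buffer on demand lets the JVM reclaim that memory after each document.

          // Hypothetical sketch -- names are illustrative, not Lucene's actual code.
          class StrongRefReader {
            private byte[] buffer = new byte[0];        // grows to the largest doc ever read and
                                                        // is retained until the reader is closed
            byte[] decompress(byte[] compressed, int originalLength) {
              if (buffer.length < originalLength) {
                buffer = new byte[originalLength];      // one very large doc pins that memory for good
              }
              // ... decompress 'compressed' into 'buffer' ...
              return buffer;
            }
          }

          class OnDemandReader {
            byte[] decompress(byte[] compressed, int originalLength) {
              byte[] buffer = new byte[originalLength]; // short-lived, eligible for GC right away
              // ... decompress 'compressed' into 'buffer' ...
              return buffer;
            }
          }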

      Attachments

      1. cpuGraph_4995Patch.jpg
        40 kB
        Aditya
      2. heapGraph_4995Patch.jpg
        261 kB
        Aditya
      3. LUCENE-4995.patch
        1 kB
        Adrien Grand
      4. LUCENE-4995.patch
        1 kB
        Adrien Grand

        Activity

        Adrien Grand added a comment -

        Patch. CompressingTermVectorsReader doesn't need the fix since it already creates the decompression buffer on demand.

        Robert Muir added a comment -

        Are we sure this is the right thing to do? People have misconfigured threadpools (especially tomcat users) and complain all the time about e.g. the analyzer buffers and so on.

        I just want to make sure these misconfigured users aren't causing correctly-configured users massive amounts of garbage, etc.

        Adrien Grand added a comment -

        Are we sure this is the right thing to do?

        I have no idea at all. On the one hand, it looks reasonable to me to have a reusable per-thread buffer to handle decompression, but on the other hand, it makes me unhappy that its size is unbounded: if an index has a few ~1 MB documents spread over S segments and is searched by T threads, the JVM will have to keep roughly S*T*1 MB of heap reserved just to be able to handle decompression.
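
        To put illustrative numbers on that (assumed values, not from the issue): with S = 20 segments that each contain at least one ~1 MB document and T = 32 search threads, that is roughly 20 * 32 * 1 MB, about 640 MB of heap held in decompression buffers alone, even while no large document is being read.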

        Simon Willnauer added a comment -

        I think the biggest issue from my perspective is that some docs might be very large, and once they are loaded the thread-local will hold on to potentially way too large buffers. Maybe we can adjust this based on the average doc size or something, so that we reuse the buffer in the common case and re-allocate in the exceptional case? Just an idea though...

        Aditya added a comment -

        We started facing this issue when we upgraded Solr from 3.5 to 4.x. Our documents have large multiValued fields and we see huge heap usage which is not reclaimed after GC. 3.5 with the same set of documents works very well and heap usage stays around 50 to 65% (on a long run with no replication).

        Some specs for our test (no replication happening at any time during the test):
        CPU: 6 cores
        RAM: 16 GB
        Heap: 8 GB (tried 10 and 12 GB too)
        Index size: 4.6 GB
        Solr version: 4.2.1 (on a standalone machine; our production implementation is master/slave)
        GC policy: CMS (tried G1GC too)

        On analyzing the heap we see that the largest object in the heap is the BytesRef; tracing it back, it points to CompressingStoredFieldsReader. We are looking into the patch above and will test whether it helps.

        If you need more information, I can provide it.

        thanks
        Aditya

        Adrien Grand added a comment -

        Thanks Aditya. Your experiments with the patch would be very useful! Additionally, do you know how large (in bytes) your largest documents are?

        Aditya added a comment -

        Sure, will update. Our "killer doc" (that's what we call it these days) is 25.3 MB when saved from the Solr response for all fields with the javabin writer.
        This is our largest document in the index.

        Aditya added a comment - edited

        Quick update on our test. We ran a test for around 1 hour and it seems that this patch is working; GC does reclaim the memory back to an acceptable margin (test duration 17:20 to 18:35).
        I have attached CPU and heap graphs for our 1-hour test (8 GB heap, 200 qps, only the document cache enabled with max size 4096, hit ratio 89%).
        We are making some more configuration changes to the JVM and the Solr cache sizes and will run a full overnight test to see if the server keeps up.

        Two concerns that I see are:
        1. Frequent GCs
        2. Heap is used up very quickly.

        If you have any advice to improve this, please let me know.

        Adrien Grand added a comment -

        Unfortunately, a buffer is needed to decompress data, so I think I am going to stick with Simon's proposal and keep the per-CompressingStoredFieldsReader buffer around but only use it for reasonably small documents (I am currently thinking of a threshold of ~32kb, twice the block size). I will attach a new patch soon.

        Adrien Grand added a comment -

        Here is a patch which only reuses the buffer when there is less than 32kb of data to decompress.
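
        A minimal sketch of the reuse-below-a-threshold approach described above (hypothetical names, not the actual Lucene code): small documents reuse a long-lived buffer, while large ones get a dedicated buffer that the GC can reclaim immediately.

            // Hypothetical sketch of threshold-based buffer reuse; not Lucene's actual code.
            class ThresholdReader {
              private static final int REUSE_THRESHOLD = 32 * 1024; // ~32kb, twice the block size
              private byte[] reusableBuffer = new byte[0];

              byte[] decompress(byte[] compressed, int originalLength) {
                final byte[] buffer;
                if (originalLength <= REUSE_THRESHOLD) {
                  // Common case: reuse the long-lived buffer and avoid per-document garbage.
                  if (reusableBuffer.length < originalLength) {
                    reusableBuffer = new byte[originalLength];
                  }
                  buffer = reusableBuffer;
                } else {
                  // Exceptional case: a dedicated, short-lived buffer for oversized documents,
                  // reclaimable as soon as the document has been read.
                  buffer = new byte[originalLength];
                }
                // ... decompress 'compressed' into 'buffer' ...
                return buffer;
              }
            }

        This keeps steady-state allocation low for typical small documents while preventing a single oversized document from pinning heap for the lifetime of the reader.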

        Commit Tag Bot added a comment -

        [trunk commit] jpountz
        http://svn.apache.org/viewvc?view=revision&revision=1491374

        LUCENE-4995: Don't hold references on large decompression buffers in CompressingStoredFieldsReader.

        CompressingStoredFieldsReader now only reuses an internal buffer when there is
        no more than 32kb to decompress. For large blocks, a dedicated decompression
        buffer will be created on demand.

        Commit Tag Bot added a comment -

        [trunk commit] jpountz
        http://svn.apache.org/viewvc?view=revision&revision=1491376

        LUCENE-4995: Added CHANGES entry.

        Commit Tag Bot added a comment -

        [branch_4x commit] jpountz
        http://svn.apache.org/viewvc?view=revision&revision=1491378

        LUCENE-4995: Don't hold references on large decompression buffers in CompressingStoredFieldsReader (merged from r1491374).

        CompressingStoredFieldsReader now only reuses an internal buffer when there is
        no more than 32kb to decompress. For large blocks, a dedicated decompression
        buffer will be created on demand.

        Steve Rowe added a comment -

        Bulk close resolved 4.4 issues


          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            1
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved:
