  Lucene - Core / LUCENE-7647

CompressingStoredFieldsFormat should reclaim memory more aggressively

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5.4, 6.5, 6.4.1, 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      When stored fields are configured with BEST_COMPRESSION, we rely on garbage collection to reclaim Deflater/Inflater instances. However, these classes use little JVM heap memory but may use significant native memory, so it may happen that the OS runs out of native memory before the JVM collects these unreachable Deflater/Inflater instances. We should look into reclaiming native memory more aggressively.

        Activity

        jpountz Adrien Grand added a comment -

        Here is a patch. On the writing side, things are easy since there is a single instance that is used from a single thread for a short amount of time, so I just made the compressor implement Closeable. However, things are a bit more complicated on the reading side because of clones and the fact that we do not close all of them. So to keep things simple, I just changed the codec to create Inflater instances on demand.
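        A minimal sketch of the two strategies described above, written against plain `java.util.zip` rather than Lucene's codec classes. `ClosableCodecSketch`, `DeflateCompressor`, `OnDemandInflateDecompressor`, and `copy()` (standing in for `clone()`) are hypothetical names chosen for illustration. On the write side one `Deflater` is reused across blocks and released by `close()`; on the read side each call allocates its own `Inflater` and calls `end()` in a `finally` block, so clones that are never closed hold no native memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ClosableCodecSketch {

    // Writer side (hypothetical): one Deflater, used by a single thread for a short time, freed by close().
    static final class DeflateCompressor implements Closeable {
        private final Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        private final byte[] buffer = new byte[4096];

        byte[] compress(byte[] bytes, int off, int len) {
            deflater.reset(); // reuse the same native zlib state for every block
            deflater.setInput(bytes, off, len);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }

        @Override
        public void close() {
            deflater.end(); // release native memory now rather than whenever the GC runs
        }
    }

    // Reader side (hypothetical): no Inflater field, so unclosed clones cannot leak native memory.
    static final class OnDemandInflateDecompressor {

        byte[] decompress(byte[] compressed, int originalLength) {
            Inflater inflater = new Inflater(); // created on demand for this call only
            try {
                inflater.setInput(compressed);
                byte[] result = new byte[originalLength];
                int off = 0;
                while (off < originalLength) {
                    int n = inflater.inflate(result, off, originalLength - off);
                    if (n == 0 && (inflater.finished() || inflater.needsInput() || inflater.needsDictionary())) {
                        throw new IllegalArgumentException("corrupt or truncated input");
                    }
                    off += n;
                }
                return result;
            } catch (DataFormatException e) {
                throw new IllegalArgumentException("corrupt input", e);
            } finally {
                inflater.end(); // native memory is freed before the method returns
            }
        }

        // Stand-in for clone(): there is no native state to share or to close.
        OnDemandInflateDecompressor copy() {
            return new OnDemandInflateDecompressor();
        }
    }
}
```

        The trade-off is one `Inflater` allocation per `decompress` call, which is exactly the cost questioned in the next comment.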

        rcmuir Robert Muir added a comment -

        What does it do to performance to create a deflater instance every time? This seems very inefficient.

        rcmuir Robert Muir added a comment -

        s/deflater/inflater of course

        jpountz Adrien Grand added a comment -

        For fetching the top hits I think it is fine anyway; if there is an issue, I suspect it would be more with merging. I can try to run luceneutil with this change next week. Do you have ideas to make it more efficient?

        rcmuir Robert Muir added a comment -

        I think it's first important to understand how it impacts performance, including worst cases. That means merging with deletes and lots of results and so on too: not just best cases like top hits only.

        Alternative solutions are possible depending on the impact: e.g. a pool managed by the top Decompressor and passed via clone(), and decompress could simply release back to the pool. This is kind of a standard pattern, but of course it adds complexity. We should avoid it if it's really not necessary.
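        A sketch of the pooling alternative, under two stated assumptions: clones may be used from different threads (hence the synchronized pool), and only the top-level decompressor is reliably closed. `PooledDecompressorSketch`, `InflaterPool`, `PooledDecompressor`, `acquire`, `release`, and `copy()` (standing in for `clone()`) are hypothetical names, not Lucene API. Clones share the pool, each `decompress` borrows an `Inflater` and returns it, and closing the pool ends every idle instance in one place.

```java
import java.io.Closeable;
import java.util.ArrayDeque;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class PooledDecompressorSketch {

    // Hypothetical pool owned by the top-level decompressor and shared by all of its clones.
    static final class InflaterPool implements Closeable {
        private final ArrayDeque<Inflater> free = new ArrayDeque<>();
        private int created; // number of Inflaters ever allocated
        private boolean closed;

        synchronized Inflater acquire() {
            if (closed) {
                throw new IllegalStateException("pool already closed");
            }
            Inflater inflater = free.poll();
            if (inflater == null) {
                inflater = new Inflater();
                created++;
            }
            return inflater;
        }

        synchronized void release(Inflater inflater) {
            if (closed) {
                inflater.end(); // returned after close(): free native memory immediately
                return;
            }
            inflater.reset(); // clear stream state so the next borrower starts fresh
            free.push(inflater);
        }

        synchronized int created() {
            return created;
        }

        // Assumed to be called by the top-level owner only; ends every idle Inflater.
        @Override
        public synchronized void close() {
            closed = true;
            for (Inflater inflater : free) {
                inflater.end();
            }
            free.clear();
        }
    }

    static final class PooledDecompressor {
        private final InflaterPool pool;

        PooledDecompressor(InflaterPool pool) {
            this.pool = pool;
        }

        // Stand-in for clone(): the copy shares the pool and owns no native state itself.
        PooledDecompressor copy() {
            return new PooledDecompressor(pool);
        }

        byte[] decompress(byte[] compressed, int originalLength) {
            Inflater inflater = pool.acquire();
            try {
                inflater.setInput(compressed);
                byte[] result = new byte[originalLength];
                int off = 0;
                while (off < originalLength) {
                    int n = inflater.inflate(result, off, originalLength - off);
                    if (n == 0 && (inflater.finished() || inflater.needsInput() || inflater.needsDictionary())) {
                        throw new IllegalArgumentException("corrupt or truncated input");
                    }
                    off += n;
                }
                return result;
            } catch (DataFormatException e) {
                throw new IllegalArgumentException("corrupt input", e);
            } finally {
                pool.release(inflater); // back to the pool instead of waiting for GC
            }
        }
    }
}
```

        The pool only grows to the number of concurrent borrowers, but it introduces a lifecycle (who closes the pool, what happens to Inflaters returned after close) that the on-demand approach avoids entirely.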

        jpountz Adrien Grand added a comment -

        I ran a merge that contains 1M documents from the wikipedia benchmark, including deleted docs, in order to test the worst case. Here is what the info stream reports about stored fields before/after the change:

        Before:

        SM 0 [2017-01-23T15:03:34.956Z; Lucene Merge Thread #0]: 41827 msec to merge stored fields [996093 docs]
        SM 0 [2017-01-23T15:06:49.785Z; Lucene Merge Thread #0]: 41722 msec to merge stored fields [996093 docs]
        SM 0 [2017-01-23T15:14:09.943Z; Lucene Merge Thread #0]: 42138 msec to merge stored fields [996093 docs]
        

        After:

        SM 0 [2017-01-23T15:17:33.241Z; Lucene Merge Thread #0]: 42050 msec to merge stored fields [996093 docs]
        SM 0 [2017-01-23T15:20:00.656Z; Lucene Merge Thread #0]: 42320 msec to merge stored fields [996093 docs]
        SM 0 [2017-01-23T15:22:04.047Z; Lucene Merge Thread #0]: 42520 msec to merge stored fields [996093 docs]
        

        I think this is either noise or an acceptable slowdown. That makes sense since we always decompress about 16K of data. Initialization of the Inflater is likely much less costly than decompressing that amount of data.
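        A small worked check of the numbers above, using only the six "msec to merge stored fields" values from the info stream; the class name `MergeTimingSummary` is illustrative. It compares the change in mean merge time with the spread between runs within each group.

```java
public class MergeTimingSummary {

    // Arithmetic mean of the logged merge times, in milliseconds.
    static double mean(long[] millis) {
        long sum = 0;
        for (long m : millis) {
            sum += m;
        }
        return (double) sum / millis.length;
    }

    // Spread between the slowest and fastest run: a rough measure of run-to-run noise.
    static long range(long[] millis) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long m : millis) {
            min = Math.min(min, m);
            max = Math.max(max, m);
        }
        return max - min;
    }

    public static void main(String[] args) {
        long[] before = {41827, 41722, 42138}; // before the change
        long[] after = {42050, 42320, 42520};  // same merge, after the change
        double b = mean(before);
        double a = mean(after);
        System.out.printf("mean before: %.1f ms, mean after: %.1f ms%n", b, a);
        System.out.printf("delta: %.1f ms (%.2f%%)%n", a - b, 100 * (a - b) / b);
        System.out.printf("range before: %d ms, range after: %d ms%n", range(before), range(after));
    }
}
```

        With these values the mean moves from about 41896 ms to about 42297 ms, a difference of 401 ms (roughly 0.96%), which is smaller than the spread within each group of three runs (416 ms before, 470 ms after). Three runs per side are too few to separate a real slowdown from noise, which is consistent with reading the result as "either noise or an acceptable slowdown".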

        mikemccand Michael McCandless added a comment -

        +1

        jira-bot ASF subversion and git services added a comment -

        Commit 94530940e4de8b476a5886f284578c933a8f33ef in lucene-solr's branch refs/heads/master from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9453094 ]

        LUCENE-7647: CompressingStoredFieldsFormat should reclaim memory more aggressively.

        rcmuir Robert Muir added a comment -

        Thanks for running the benchmark!

        jira-bot ASF subversion and git services added a comment -

        Commit 03448807a1b14657bdb8eb568f84df3d6ef09e01 in lucene-solr's branch refs/heads/branch_6x from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0344880 ]

        LUCENE-7647: CompressingStoredFieldsFormat should reclaim memory more aggressively.

        jira-bot ASF subversion and git services added a comment -

        Commit ff9cc1d8090c8e8cbc7ec22b50c156fafad8e6f3 in lucene-solr's branch refs/heads/branch_6_4 from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ff9cc1d ]

        LUCENE-7647: CompressingStoredFieldsFormat should reclaim memory more aggressively.

        jira-bot ASF subversion and git services added a comment -

        Commit 94530940e4de8b476a5886f284578c933a8f33ef in lucene-solr's branch refs/heads/apiv2 from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9453094 ]

        LUCENE-7647: CompressingStoredFieldsFormat should reclaim memory more aggressively.

        jira-bot ASF subversion and git services added a comment -

        Commit 129624e5fbfdd6fa4a9904189caf416dbf6412ad in lucene-solr's branch refs/heads/branch_5_5 from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=129624e ]

        LUCENE-7647: CompressingStoredFieldsFormat should reclaim memory more aggressively.


          People

          • Assignee: Unassigned
          • Reporter: jpountz Adrien Grand
          • Votes: 0
          • Watchers: 5
