Solr / SOLR-7843

Importing deltas creates a memory leak

    Details

      Description

      The org.apache.solr.handler.dataimport.SolrWriter does not clean up after itself when a delta import finishes: its "Set<Object> deltaKeys" field is never cleared once the process has completed.

      In my case, using a custom importer or DataSource, I need to add additional parameters to the delta keys.

      When the data import finishes, deltaKeys is not set back to null, and the DataImporter, DocBuilder and SolrWriter are kept alive because they are referenced by the "infoRegistry" of the SolrCore, which seems to be used for JMX information.

      Starting a second delta import did not appear to free the memory, which may cause an OutOfMemory in the long run. I have not checked whether starting a full import would break the references and free the memory.

      An easy fix would be to add "deltaKeys = null;" to the close method of SolrWriter,
      or to nullify the writer in DocBuilder after it has been used in the execute() method.
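
The first fix proposed above can be illustrated with a minimal, self-contained sketch. This is not the actual Solr code: SketchWriter, REGISTRY and runDeltaImport are hypothetical stand-ins for SolrWriter, the SolrCore "infoRegistry" and a delta import run. It only shows the pattern: an object that stays reachable through a long-lived registry should drop its reference to the key set when it is closed.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DeltaKeyLeakSketch {

    /** Hypothetical stand-in for SolrWriter; not the real class. */
    static class SketchWriter implements AutoCloseable {
        private Set<Object> deltaKeys;

        void setDeltaKeys(Set<Object> keys) {
            this.deltaKeys = keys;
        }

        boolean holdsDeltaKeys() {
            return deltaKeys != null;
        }

        @Override
        public void close() {
            // The fix suggested in the report: release the key set on close,
            // so it can be collected even if the writer itself stays reachable.
            deltaKeys = null;
        }
    }

    /** Hypothetical stand-in for a long-lived registry such as infoRegistry. */
    static final Map<String, Object> REGISTRY = new ConcurrentHashMap<>();

    /** Simulates one delta import: register the writer, collect keys, close. */
    static SketchWriter runDeltaImport(int keyCount) {
        SketchWriter writer = new SketchWriter();
        REGISTRY.put("writer", writer); // the writer stays reachable from here
        Set<Object> keys = new HashSet<>();
        for (int i = 0; i < keyCount; i++) {
            keys.add("doc-" + i);
        }
        writer.setDeltaKeys(keys);
        writer.close(); // without the null assignment, keys would stay reachable
        return writer;
    }

    public static void main(String[] args) {
        SketchWriter writer = runDeltaImport(1000);
        System.out.println("still registered: " + (REGISTRY.get("writer") == writer));
        System.out.println("holds delta keys: " + writer.holdsDeltaKeys());
    }
}
```

In this sketch the writer remains in the registry after the import, but the key set no longer hangs off it, which is the effect the suggested one-line change aims for.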

        Activity

        Joseph Lawson added a comment -

        Why is this not a problem?

        ASF subversion and git services added a comment -

        Commit 1710078 from shalin@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1710078 ]

        SOLR-7843: DataImportHandler's delta imports leak memory because the delta keys are kept in memory and not cleared after the process is finished

        ASF subversion and git services added a comment -

        Commit 1710079 from shalin@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1710079 ]

        SOLR-7843: DataImportHandler's delta imports leak memory because the delta keys are kept in memory and not cleared after the process is finished

        Shalin Shekhar Mangar added a comment -

        Thanks for the nudge, Joseph and to Pablo for reporting. This fix will be released in 5.4.

        Joseph Lawson added a comment -

        Does this affect 5.3+ as well? If I'm using DIH I'm currently assuming 5.2.0 is the only safe version. Is that a correct assumption?

        Shalin Shekhar Mangar added a comment -

        Joseph Lawson - yes, I'd expect that 5.3.0 is affected as well. And, even though the bug is marked as affecting 5.2.1, I don't think 5.2 is any better either. You may need to patch and run a custom build of Solr until 5.4 is released.

        Pablo Lozano added a comment -

        Thanks for fixing it.
        I closed it before because I was starting to think it could have been my fault, as I was using the import-handler in a very unorthodox way. Later on I realized it was a real issue that should be fixed, but by that time I was using a different method of delta import that did not trigger this behavior, and I forgot to reopen it.
        This issue only happens when the delta import is huge or when the import-handler is used in a very unorthodox way, like I did, but it definitely needed to be fixed.

        Maybe this discussion is for another day, but even though the import-handler is very good, I think it is missing some flexibility that would avoid this type of issue. I think most of the time developers would want to use it as a base and not as a full-fledged component. Most of the time use cases are very specific to business cases, and the default implementation looks like something that has tried to adapt to every possible case. This has constrained the flexibility of the plugin by adding inflexible edge-case rules, introducing strange abstractions and forcing a very opinionated workflow.
        In my opinion this plugin should serve as a base for developers to implement their own import functionality, plus a set of tools to help them manage the state of the import. It is easier for a developer to implement an API than to try to work around a framework.

        That is just my two cents of an overall great plugin.
        Thanks


          People

          • Assignee: Shalin Shekhar Mangar
          • Reporter: Pablo Lozano
