Solr / SOLR-846

Out of memory doing delta-import with fetch size set to -1

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Labels:
      None
    • Environment:

      Linux 2.6.18-92.1.13.el5xen, mysql 5.0

      Description

      The database has about 3 million records. A full-import runs without any problem. However, when a large number of changes occurred (2558057), delta-import throws an OutOfMemoryError after 1288338 documents have been processed. The stack trace is below:

      Exception in thread "Thread-3" java.lang.OutOfMemoryError: Java heap space
      at org.tartarus.snowball.ext.EnglishStemmer.<init>(EnglishStemmer.java:49)
      at org.apache.solr.analysis.EnglishPorterFilter.<init>(EnglishPorterFilterFactory.java:83)
      at org.apache.solr.analysis.EnglishPorterFilterFactory.create(EnglishPorterFilterFactory.java:66)
      at org.apache.solr.analysis.EnglishPorterFilterFactory.create(EnglishPorterFilterFactory.java:35)
      at org.apache.solr.analysis.TokenizerChain.tokenStream(TokenizerChain.java:48)
      at org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.tokenStream(IndexSchema.java:348)
      at org.apache.lucene.analysis.Analyzer.reusableTokenStream(Analyzer.java:44)
      at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:117)
      at org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFieldConsumersPerField.java:36)
      at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:234)
      at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:765)
      at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:748)
      at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2118)
      at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2095)
      at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:232)
      at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
      at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
      at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:288)
      at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
      at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211)
      at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133)
      at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359)
      at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388)
      at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)

      The dataSource in data-config.xml is configured with a batchSize of "-1":
      <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://host/dbname"
      user="" password="" batchSize="-1"/>
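      For context: as far as we understand DIH's JdbcDataSource, batchSize="-1" is converted into a JDBC fetch size of Integer.MIN_VALUE, which MySQL Connector/J treats as a request to stream rows one at a time instead of buffering the whole result set on the client. The sketch below mirrors that mapping together with the forward-only, read-only statement settings Connector/J needs for streaming. It is not DIH's code; the class StreamingFetchConfig and its methods are hypothetical names used for illustration.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: shows how a DIH-style batchSize attribute could map to JDBC settings.
final class StreamingFetchConfig {

    private StreamingFetchConfig() {}

    /**
     * Maps a batchSize attribute to a JDBC fetch size.
     * -1 becomes Integer.MIN_VALUE, which MySQL Connector/J interprets as
     * "stream rows one by one"; any other value is passed through unchanged.
     */
    static int toFetchSize(int batchSize) {
        return batchSize == -1 ? Integer.MIN_VALUE : batchSize;
    }

    /** Creates a forward-only, read-only statement using the mapped fetch size. */
    static Statement createStatement(Connection conn, int batchSize) throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(toFetchSize(batchSize));
        return stmt;
    }
}
```

      The JDBC contract for Statement.setFetchSize requires rows >= 0, so other drivers may reject this value; the -1 convention is specific to MySQL. Streaming only bounds memory on the JDBC side: according to the comments on this issue, DIH still collected all changed row ids in memory before importing.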

        Activity

        Hoss Man added a comment -

        This issue was listed as "fixed" in Solr 1.4's CHANGES.txt.

        In light of that, I'm resolving it, and any future work to improve things can be done in a new issue (since a new issue id would be necessary to properly track it in CHANGES.txt anyway).

        Robert Muir added a comment -

        rmuir20120906-bulk-40-change

        Hoss Man added a comment -

        Bulk fixing the version info for 4.0-ALPHA and 4.0. All affected issues have "hoss20120711-bulk-40-change" in a comment.

        Hoss Man added a comment -

        Bulk changing fixVersion 3.6 to 4.0 for any open issues that are unassigned and have not been updated since March 19.

        Email spam suppressed for this bulk edit; search for hoss20120323nofix36 to identify all issues edited

        Robert Muir added a comment -

        3.4 -> 3.5

        Robert Muir added a comment -

        Bulk move 3.2 -> 3.3

        Hoss Man added a comment -

        Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

        http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

        The selection criteria were "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. Email notifications were suppressed.

        A unique token for finding these 240 issues in the future: hossversioncleanup20100527

        Shalin Shekhar Mangar added a comment -

        Marking for 1.5

        Shalin Shekhar Mangar added a comment -

        Committed revision 725627.

        I've committed Noble's patch; however, as he noted, it is only a partial solution. I'm in favor of streaming it, but that will be an invasive change. Let's keep this issue open until we can implement a better solution.

        Noble Paul added a comment -

        A partial solution.
        Eventually we must stream it or persist the data.
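        The "persist the data" option could look roughly like this: the changed keys are written to a temporary file while the delta query runs, then read back one at a time during the import, so the import holds a single key rather than the whole set. This is a hypothetical sketch, not a proposed patch; DeltaKeySpill and its methods are invented for illustration, and it assumes keys are strings that contain no line breaks.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical sketch: spill changed primary keys to disk, then replay them one by one.
final class DeltaKeySpill {

    private DeltaKeySpill() {}

    /** Writes each key on its own line to a new temp file and returns the file's path. */
    static Path spill(Iterator<String> keys) throws IOException {
        Path file = Files.createTempFile("delta-keys", ".txt");
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            while (keys.hasNext()) {
                out.write(keys.next());
                out.newLine();
            }
        }
        return file;
    }

    /** Reads keys back line by line, passes each to the handler, then deletes the file. */
    static long replay(Path file, Consumer<String> handler) throws IOException {
        long count = 0;
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String key;
            while ((key = in.readLine()) != null) {
                handler.accept(key); // only the current key is referenced here
                count++;
            }
        } finally {
            Files.deleteIfExists(file);
        }
        return count;
    }
}
```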

        Noble Paul added a comment -

        This was a known issue. DIH first collects the delta row ids (in memory) and then runs the import. We did not stream them because a very large number of modified rows is uncommon, and the delta query usually fetches only the pk field.

        Anyway, we need to fix that.
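        The behaviour described above can be modelled with two small methods: the two-phase approach, which holds every changed key in memory before importing, and the streaming alternative discussed on this issue, which imports each key as soon as the delta query yields it. This is a simplified sketch, not DIH's code; DeltaStrategies, the Iterator<String> standing in for the delta query, and the Consumer<String> standing in for the per-row import are all hypothetical.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical model of the two delta-import strategies discussed on this issue.
final class DeltaStrategies {

    private DeltaStrategies() {}

    /** Two-phase: collect all changed keys, then import them. Returns the number imported. */
    static int collectThenImport(Iterator<String> deltaQuery, Consumer<String> importRow) {
        // Phase 1: every changed key is kept in memory, so the set grows with the delta size.
        Set<String> changed = new LinkedHashSet<>();
        while (deltaQuery.hasNext()) {
            changed.add(deltaQuery.next());
        }
        // Phase 2: import each collected key (duplicates were dropped by the set).
        for (String key : changed) {
            importRow.accept(key);
        }
        return changed.size();
    }

    /** Streaming: import each key as it arrives. Returns the number imported. */
    static int streamImport(Iterator<String> deltaQuery, Consumer<String> importRow) {
        // No collection is built; retained memory does not grow with the delta size.
        int imported = 0;
        while (deltaQuery.hasNext()) {
            importRow.accept(deltaQuery.next());
            imported++;
        }
        return imported;
    }
}
```

        In this sketch the set also removes duplicate keys, which the streaming version gives up. For Solr that is usually tolerable, since re-adding a document with the same uniqueKey replaces the earlier one.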


          People

          • Assignee:
            Unassigned
          • Reporter:
            Ricky Leung
          • Votes:
            1
          • Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:
