SOLR-1623

Solr hangs (often throwing java.lang.OutOfMemoryError: PermGen space) when indexing many different field names

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.3, 1.4
    • Fix Version/s: None
    • Component/s: update
    • Labels:
      None
    • Environment:

      Description

      With the following fields in schema.xml:

      <fields>
      <field name="id" type="sint" indexed="true" stored="true" required="true" />
      <dynamicField name="weight_*" type="sint" indexed="true" stored="true"/>
      </fields>

      Run the following code:

      import java.util.ArrayList;
      import java.util.List;
      import org.apache.solr.client.solrj.SolrServer;
      import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
      import org.apache.solr.common.SolrInputDocument;

      // Wrapper class added so the snippet compiles; the class name is arbitrary.
      public class ManyFieldNamesRepro {
        public static void main(String[] args) throws Exception {
          SolrServer server;
          try {
            // args[0] is the Solr server URL
            server = new CommonsHttpSolrServer(args[0]);
          } catch (Exception e) {
            System.err.println("can't create server using: " + args[0] + " " + e.getMessage());
            throw e;
          }

          for (int i = 0; i < 1000; i++) {
            List<SolrInputDocument> batchedDocs = new ArrayList<SolrInputDocument>();
            for (int j = 0; j < 1000; j++) {
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", i * 1000 + j);
              // hangs after 30 to 50 batches
              doc.addField("weight_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" + Integer.toString(i) + "_" + Integer.toString(j), i * 1000 + j);
              // hangs after about 200 batches
              //doc.addField("weight_" + Integer.toString(i) + "_" + Integer.toString(j), i * 1000 + j);
              batchedDocs.add(doc);
            }

            try {
              server.add(batchedDocs, true);
              System.err.println("Done with batch=" + i);
              // server.commit(); //doesn't change anything
            } catch (Exception e) {
              System.err.println("batchId=" + i + " bad batch: " + e.getMessage());
              throw e;
            }
          }
        }
      }

      Soon the client (which sometimes throws an exception) and Solr will freeze. Sometimes java.lang.OutOfMemoryError: PermGen space appears in the server logs.

        Activity

        Yonik Seeley added a comment -

        This is most likely due to interning of field names. If you really need that many field names, the only option right now is to increase the size of the perm gen.

        Mark Miller added a comment -

        What's odd is that he has it marked as affecting 1.4 as well, but doesn't 1.4 no longer intern to perm gen? Did you really test with 1.4? Or am I missing something?

        You can also turn on GC for the perm gen space. It is not a complete solution, but it can help under the right circumstances (likely in combination with a larger perm gen space).
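
        Before resizing or tuning the perm gen it can help to see how full it actually is. The sketch below is an illustration only, not something posted in this issue: it lists the non-heap memory pools of the JVM it runs in using the standard java.lang.management API. The class and method names are made up, and to be meaningful for Solr it would have to run inside the Solr JVM (or the same MXBeans would have to be read over JMX). Pool names depend on the JVM version and vendor.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports usage of the non-heap pools (where the perm gen lives
// on JVMs that have one) for the JVM this code runs in.
public class NonHeapPools {

    // Returns one line per non-heap pool: name, bytes used, and max bytes (-1 if undefined).
    static List<String> describeNonHeapPools() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            lines.add(pool.getName() + ": used=" + usage.getUsed() + " max=" + usage.getMax());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeNonHeapPools()) {
            System.out.println(line);
        }
    }
}
```

        A pool whose used value keeps climbing toward max while new field names are indexed would be consistent with the interning explanation given earlier in this thread.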

        Yonik Seeley added a comment -

        "What's odd is that he has it marked as affects 1.4 as well - but that doesn't intern to perm gen anymore?"

        The default StringHelper.intern() from Lucene is just a cache - String.intern() is still called.
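
        A rough illustration of that point, as a sketch only and not Lucene's actual StringHelper code: a cache placed in front of String.intern() saves repeated lookups, but every distinct field name still reaches String.intern() once. The class InternCacheSketch and its call counter are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; NOT the Lucene StringHelper implementation.
public class InternCacheSketch {
    private final Map<String, String> cache = new HashMap<String, String>();
    private int jvmInternCalls = 0;

    // Returns the canonical instance for s, consulting the local cache first.
    public String intern(String s) {
        String cached = cache.get(s);
        if (cached == null) {
            // Cache miss: the string is still handed to the JVM-wide intern table.
            cached = s.intern();
            jvmInternCalls++;
            cache.put(cached, cached);
        }
        return cached;
    }

    public int getJvmInternCalls() {
        return jvmInternCalls;
    }

    public static void main(String[] args) {
        InternCacheSketch sketch = new InternCacheSketch();
        for (int i = 0; i < 1000; i++) {
            sketch.intern("weight_" + i); // miss: reaches String.intern()
            sketch.intern("weight_" + i); // hit: served from the cache
        }
        // 1000 distinct names lead to 1000 String.intern() calls, cache or not.
        System.out.println("String.intern() calls: " + sketch.getJvmInternCalls());
    }
}
```

        The cache only reduces how often the same name is looked up; it does nothing to bound the number of distinct names that get interned.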

        Laurent Chavet added a comment -

        Yes this definitely repros in 1.4.

        Unfortunately I think I need a lot of fields; here is what I am trying to do:

        I want to store news articles and extract many topics for each story with a score for each topic for each story.

        So, for example, a story might have a topic of Crime with a score of 20.

        So what I am doing now is storing:

        Field:Topic Value:Crime indexed="true" stored="true" (needs to be searched and retrieved)
        Field:Weight_Topic_Crime Value:20 indexed="true" stored="true" (needs to be sorted and retrieved)

        Because there can be a lot of different values for the field topic, with this schema we end up with a lot of fields starting with weight.

        Any suggestions on how to achieve the same result in a different way?

        Thanks,

        Laurent
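
        To make the growth concrete, here is a minimal sketch of the naming scheme described in the comment above, using plain Java collections rather than SolrJ. The class name, the helper method, and the sample topics and scores are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the "one weight field per topic" scheme.
public class TopicFieldNames {

    // Maps each topic of one story to a Weight_Topic_<topic> field holding its score.
    static Map<String, Integer> weightFields(Map<String, Integer> topicScores) {
        Map<String, Integer> fields = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, Integer> e : topicScores.entrySet()) {
            fields.put("Weight_Topic_" + e.getKey(), e.getValue());
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, Integer> story1 = new LinkedHashMap<String, Integer>();
        story1.put("Crime", 20);
        story1.put("Politics", 5);

        Map<String, Integer> story2 = new LinkedHashMap<String, Integer>();
        story2.put("Crime", 7);
        story2.put("Sports", 12);

        // The index ends up with one distinct field name per distinct topic.
        Set<String> distinctFieldNames = new TreeSet<String>();
        distinctFieldNames.addAll(weightFields(story1).keySet());
        distinctFieldNames.addAll(weightFields(story2).keySet());

        System.out.println("Distinct weight fields: " + distinctFieldNames.size()); // prints 3
    }
}
```

        The number of distinct field names equals the number of distinct topics across all stories, so an open-ended topic vocabulary means an open-ended set of field names to intern.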

        Erick Erickson added a comment -

        2013 Old JIRA cleanup


          People

          • Assignee:
            Unassigned
            Reporter:
            Laurent Chavet
          • Votes:
            0
            Watchers:
            2
