Solr / SOLR-3314

DIH with multi-threading throws exception


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.6
    • Fix Version/s: 3.6
    • Labels:
      None

      Description

      When loading with the DataImportHandler (DIH) in multi-threaded mode, exceptions are sometimes thrown:

      Apr 4, 2012 10:19:10 AM org.apache.solr.common.SolrException log
      SEVERE: Full Import failed:java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
      	at org.apache.solr.common.util.NamedList.getName(NamedList.java:131)
      	at org.apache.solr.common.util.NamedList.toString(NamedList.java:258)
      	at java.lang.String.valueOf(String.java:2826)
      	at java.lang.StringBuilder.append(StringBuilder.java:115)
      	at org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:188)
      	at org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:78)
      	at org.apache.solr.handler.dataimport.SolrWriter.close(SolrWriter.java:53)
      	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:268)
      	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
      	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
      	at org.apache.solr.handler.dataimport.DataImporter$3.run(DataImporter.java:426)
      
      Apr 4, 2012 10:19:10 AM org.apache.solr.update.DirectUpdateHandler2 rollback
      INFO: start rollback
      Apr 4, 2012 10:19:10 AM org.apache.solr.update.DirectUpdateHandler2 rollback
      INFO: end_rollback
      

      Analysis:
      After loading, the LogUpdateProcessor writes a log entry containing the contents of "toLog" and the elapsed time:

          log.info( "" + toLog + " 0 " + (elapsed) );
      

      "toLog" is an org.apache.solr.common.util.NamedList, which is formatted for printing through its methods "toString", "getName" and "getVal". A NamedList consists of name/value pairs in which the name must always be a String. As the exception shows, the name somehow ended up being an ArrayList.
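      To make the failure path concrete, here is a minimal sketch of a flat name/value list. PairList is a hypothetical, simplified stand-in for NamedList, not Solr's class; it assumes names sit at even indices and values at the following odd index, matching the `idx << 1` indexing used by getName. It also shows why the log statement reaches getName: string concatenation calls toString(), which calls getName() for every pair, the same chain seen in the stack trace (StringBuilder.append, String.valueOf, NamedList.toString, NamedList.getName).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for org.apache.solr.common.util.NamedList:
// names and values share one flat list, name at index 2*i, value at 2*i+1.
public class PairList {
    private final List<Object> nvPairs = new ArrayList<>();

    public void add(String name, Object val) {
        nvPairs.add(name); // name slot
        nvPairs.add(val);  // value slot
    }

    public int size() {
        return nvPairs.size() >> 1;
    }

    public String getName(int idx) {
        // Throws ClassCastException if a non-String has ended up in a name slot
        return (String) nvPairs.get(idx << 1);
    }

    public Object getVal(int idx) {
        return nvPairs.get((idx << 1) + 1);
    }

    @Override
    public String toString() {
        // Simplified formatting; calls getName() for every pair
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(getName(i)).append('=').append(getVal(i));
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        PairList toLog = new PairList();
        toLog.add("add", new ArrayList<>(List.of("rec1", "rec2")));
        // Concatenation invokes toString(), just like the log.info(...) line above
        System.out.println("" + toLog + " 0 " + 42); // prints {add=[rec1, rec2]} 0 42
    }
}
```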

      To trace this further, I modified the method "getName" of org.apache.solr.common.util.NamedList as follows:

        public String getName(int idx) {
          // names live at even indices of nvPairs (each value follows at the next odd index)
          Object name = nvPairs.get(idx << 1);
          // debug output: print any ArrayList that has ended up in a name slot
          if (name instanceof java.util.ArrayList) {
            System.out.println( "<Object>>" + name + "<" );
          }
          return (String) name;
        }
      

      After several attempts I was able to reproduce the exception, and the output was:

      <Object>>[testdir2_testfile2_record2, testdir2_testfile2_record3, testdir2_testfile2_record2, testdir2_testfile2_record1, testdir2_testfile2_record3, testdir2_testfile2_record1, testdir2_testfile2_record1, testdir2_testfile2_record2, ... (24 adds)]<
      

      What we see here:

      • there are 2 files in 2 directories, each with 3 records, yet the log reports "24 adds", while afterwards the index contains only the 6 records (the duplicates are collapsed by the unique IDs in the index)
      • the same record IDs appear multiple times in the ArrayList
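      One hypothesis consistent with these observations, not confirmed here: if several DIH threads append to the same flat name/value list without synchronization, a name and a value from different threads can interleave, so that an ArrayList of IDs lands in a name slot. The sketch below replays one such interleaving deterministically on a single thread; InterleavedPairs and its methods are hypothetical illustrations, not Solr code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Solr code: replays on one thread an interleaving
// that two unsynchronized writers could produce on a flat name/value list.
public class InterleavedPairs {

    // Same indexing as NamedList.getName: the name of pair idx is at position 2*idx
    static String getName(List<Object> nvPairs, int idx) {
        return (String) nvPairs.get(idx << 1);
    }

    // Each writer's add(name, value) is two separate appends; here thread B's name
    // slips in between thread A's name and value.
    static List<Object> simulateInterleavedAdds() {
        List<Object> nvPairs = new ArrayList<>();
        List<String> idsA = new ArrayList<>(List.of("testdir2_testfile2_record1"));
        List<String> idsB = new ArrayList<>(List.of("testdir2_testfile2_record2"));
        nvPairs.add("add"); // thread A appends its name
        nvPairs.add("add"); // thread B appends its name
        nvPairs.add(idsA);  // thread A appends its value
        nvPairs.add(idsB);  // thread B appends its value
        return nvPairs;
    }

    public static void main(String[] args) {
        List<Object> nvPairs = simulateInterleavedAdds();
        System.out.println(getName(nvPairs, 0)); // prints add
        try {
            getName(nvPairs, 1); // position 2 now holds thread A's ArrayList of IDs
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: name slot holds "
                    + nvPairs.get(2).getClass().getName()); // java.util.ArrayList
        }
    }
}
```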

      Evidently something is not thread-safe here. Possibly the "LogUpdateProcessorFactory"?
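      If shared state is indeed the culprit, one conventional remedy is to make each name/value append atomic, for example by guarding the list with a lock. The following is a sketch of that idea under that assumption; SyncPairList and its stress helper are hypothetical and not necessarily how Solr resolved the problem (this issue was closed as a duplicate). Giving each thread its own processor and list would be an alternative that avoids locking altogether.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical thread-safe pair list: each add() appends name and value under one
// lock, so pairs written by concurrent threads can never interleave.
public class SyncPairList {
    private final List<Object> nvPairs = new ArrayList<>();

    public synchronized void add(String name, Object val) {
        nvPairs.add(name);
        nvPairs.add(val);
    }

    public synchronized int size() {
        return nvPairs.size() >> 1;
    }

    public synchronized String getName(int idx) {
        return (String) nvPairs.get(idx << 1);
    }

    // Starts `threads` workers that each add `perThread` pairs, then waits for all of them.
    public static SyncPairList stress(int threads, int perThread) throws InterruptedException {
        SyncPairList list = new SyncPairList();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add("add", new ArrayList<>(List.of("thread" + id + "_record" + i)));
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join(); // join() also makes the workers' writes visible to this thread
        }
        return list;
    }

    public static void main(String[] args) throws InterruptedException {
        SyncPairList list = stress(4, 10_000);
        for (int i = 0; i < list.size(); i++) {
            list.getName(i); // would throw ClassCastException if pairs had interleaved
        }
        System.out.println(list.size() + " pairs, all names are Strings");
    }
}
```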

      I have no idea how to write a unit test for this, since it only occurs in DIH multi-threaded mode, and only intermittently.
      Nevertheless, it would be bad to get a rollback after loading several million records.


              People

              • Assignee:
                jdyer James Dyer
                Reporter:
                befehl Bernd Fehling
              • Votes:
                0
                Watchers:
                1
