SOLR-13471

exceptions during response writing can cause javabin to write a corrupt/misleading response that may cause exceptions or nonsense data when unmarshaled by javabin


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Diagnosis/hypothesis summarized/re-worded from the comment below...
      https://issues.apache.org/jira/browse/SOLR-13471?focusedCommentId=16840999&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16840999

      • Assume request execution happens successfully
      • then, when the QueryResponseWriter goes to marshal the response, assume there is some exception writing the results – perhaps due to the Resolver w/ ResultContext + Searcher (could be an IOException)
        • the Exception from attempting to write the response may propagate all the way up to the "try" in HttpSolrCall.call() ... where, once caught, it is passed to "sendError"
        • sendError creates a completely new SolrQueryResponse, sets the exception on it, and asks the QueryResponseWriter to write it out
          • BUT the OutputStream has already had a bunch of data written to it ... which may be resulting in a byte sequence that confuses the unmarshal code and may result in weird exceptions – or worse: validly structured, but corrupt data
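
The sequence above can be sketched with a toy tagged format. This is only an illustration under stated assumptions: the class, tags, and methods below are hypothetical and are not JavaBinCodec or HttpSolrCall code. It shows how an error response appended to a half-written response can surface on the reader side as a ClassCastException rather than as a clean error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy illustration only: a tiny tagged format, NOT the real javabin format. */
public class RestartedStreamDemo {
    static final byte TAG_MAP = 1, TAG_STRING = 2, TAG_BOOL = 3;

    // Writes the start of a "successful" response, then fails before finishing it.
    static void writePartialResponse(DataOutputStream out) throws IOException {
        out.writeByte(TAG_MAP);
        out.writeInt(2); // promises two key/value entries
        out.writeByte(TAG_STRING); out.writeUTF("status");
        out.writeByte(TAG_BOOL);   out.writeBoolean(true);
        throw new IOException("simulated failure while writing the second entry");
    }

    // Stand-in for the "sendError" step: a brand new, complete response.
    static void writeErrorResponse(DataOutputStream out, String msg) throws IOException {
        out.writeByte(TAG_MAP);
        out.writeInt(1);
        out.writeByte(TAG_STRING); out.writeUTF("error");
        out.writeByte(TAG_STRING); out.writeUTF(msg);
    }

    static Object readVal(DataInputStream in) throws IOException {
        byte tag = in.readByte();
        switch (tag) {
            case TAG_STRING: return in.readUTF();
            case TAG_BOOL:   return in.readBoolean();
            case TAG_MAP: {
                int size = in.readInt();
                Map<String, Object> map = new LinkedHashMap<>();
                for (int i = 0; i < size; i++) {
                    String key = (String) readVal(in); // reader assumes keys are Strings
                    map.put(key, readVal(in));
                }
                return map;
            }
            default: throw new IOException("unknown tag " + tag);
        }
    }

    /** Returns a short description of what the reader experienced. */
    static String demo() {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(wire);
            try {
                writePartialResponse(out);
            } catch (IOException e) {
                // Same stream: the bytes already written are still in front of this.
                writeErrorResponse(out, e.getMessage());
            }
            out.flush();
            try {
                readVal(new DataInputStream(new ByteArrayInputStream(wire.toByteArray())));
                return "decoded";
            } catch (ClassCastException e) {
                return "ClassCastException";
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this toy, the reader expects the second map key but instead finds the start of the error response (a map), so the String cast fails. With a different cut-off point the same bytes could just as easily decode into a plausible but wrong structure.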

      The fundamental problem is that HttpSolrCall tries to "start over" writing the response – but the JavaBinCodec doesn't have any way to recognize that ... it can read partial data and then be confused by the "new" data that comes after it

      Perhaps there should be a special "TAG" that means "Ignore everything you've already received and start over" that we should emit at the beginning of every marshal() call?
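
A minimal sketch of the "start over" marker idea, assuming a fully buffered response and a hypothetical marker byte sequence. None of these names exist in Solr, and the payload here is plain UTF-8 text rather than javabin. The 0xFF bytes in the marker never occur in valid UTF-8, which is what keeps the marker from colliding with the payload in this toy; a real binary format would need a reserved tag or an escaping scheme.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Toy sketch of a reset marker; hypothetical names, not a Solr API. */
public class ResetMarkerSketch {
    // Hypothetical marker emitted at the start of every marshal() call.
    static final byte[] RESET = {(byte) 0xFF, 'R', 'S', 'T', (byte) 0xFF};

    static void marshal(ByteArrayOutputStream out, String payload) {
        out.write(RESET, 0, RESET.length);
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        out.write(body, 0, body.length);
    }

    // Index of the last occurrence of marker in data, or -1 if absent.
    static int lastIndexOf(byte[] data, byte[] marker) {
        for (int i = data.length - marker.length; i >= 0; i--) {
            if (Arrays.equals(Arrays.copyOfRange(data, i, i + marker.length), marker)) {
                return i;
            }
        }
        return -1;
    }

    // Ignores everything before the most recent marker and decodes the rest.
    static String unmarshal(byte[] wire) {
        int start = lastIndexOf(wire, RESET);
        if (start < 0) {
            throw new IllegalArgumentException("no reset marker found");
        }
        int from = start + RESET.length;
        return new String(wire, from, wire.length - from, StandardCharsets.UTF_8);
    }

    static String demo() {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        marshal(wire, "numFound=3 docs=[doc1, do"); // stands in for a response cut off mid-write
        marshal(wire, "error=simulated IOException"); // the "start over" error response
        return unmarshal(wire.toByteArray());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The sketch sidesteps the hard part: a streaming reader that is already in the middle of a length-prefixed value would have to notice the marker without first consuming it as data.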


      Original bug report

      I haven't dug into this, or been able to reproduce it (let alone get any additional logging/debugging/breakpoint info) but in a few very sporadic, very rare instances of running TestReplicationHandlerDiskOverFlow, I triggered ClassCastExceptions in the JavaBinCodec unmarshal code – indicating that there is some disconnect in expectations between the marshal & unmarshal code paths...

         [junit4]   2> 13342 ERROR (Thread-19) [    ] o.a.s.h.TestReplicationHandlerDiskOverFlow Query Thread Failure
         [junit4]   2>           => org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:54505/solr/collection1: class java.lang.Boolean cannot be cast to class java.lang.String (java.lang.Boolean and java.lang.String are in module java.base of loader 'bootstrap')
         [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:622)
         [junit4]   2> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:54505/solr/collection1: class java.lang.Boolean cannot be cast to class java.lang.String (java.lang.Boolean and java.lang.String are in module java.base of loader 'bootstrap')
         [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:622) ~[java/:?]
         [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[java/:?]
         [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[java/:?]
         [junit4]   2> 	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:207) ~[java/:?]
         [junit4]   2> 	at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:987) ~[java/:?]
         [junit4]   2> 	at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1002) ~[java/:?]
         [junit4]   2> 	at org.apache.solr.handler.TestReplicationHandlerDiskOverFlow.lambda$testDiskOverFlow$1(TestReplicationHandlerDiskOverFlow.java:154) ~[test/:?]
         [junit4]   2> 	at java.lang.Thread.run(Thread.java:834) [?:?]
         [junit4]   2> Caused by: java.lang.ClassCastException: class java.lang.Boolean cannot be cast to class java.lang.String (java.lang.Boolean and java.lang.String are in module java.base of loader 'bootstrap')
         [junit4]   2> 	at org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:223) ~[java/:?]
         [junit4]   2> 	at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:298) ~[java/:?]
         [junit4]   2> 	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:280) ~[java/:?]
         [junit4]   2> 	at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:193) ~[java/:?]
         [junit4]   2> 	at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:50) ~[java/:?]
         [junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:620) ~[java/:?]
         [junit4]   2> 	... 7 more
       
      

      It's possible that something about the test code (or code being tested) is doing something it should not be doing, and adding something "unexpected" to the response object – but either the unmarshal code needs to be as forgiving as the marshal code, or the marshal code should fail fast – not produce a binary stream that the unmarshal code can't parse.
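
One way to read "fail fast" is to validate the response object before any bytes are written. The sketch below is an assumption-laden illustration: the class is hypothetical, and the set of accepted types is invented for the example, not the list JavaBinCodec actually supports.

```java
import java.util.Map;

/** Hypothetical pre-write check; the accepted types are an assumption for illustration. */
public class FailFastMarshalSketch {

    // Throws before anything is written if the reader could not handle the value.
    static void validate(Object value, String path) {
        if (value == null || value instanceof String
                || value instanceof Boolean || value instanceof Number) {
            return;
        }
        if (value instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                if (!(entry.getKey() instanceof String)) {
                    throw new IllegalArgumentException(
                        "non-String key at " + path + ": " + entry.getKey());
                }
                validate(entry.getValue(), path + "/" + entry.getKey());
            }
            return;
        }
        if (value instanceof Iterable) {
            int index = 0;
            for (Object element : (Iterable<?>) value) {
                validate(element, path + "[" + index++ + "]");
            }
            return;
        }
        throw new IllegalArgumentException(
            "unsupported type at " + path + ": " + value.getClass().getName());
    }

    // Convenience wrapper: true if validate() accepts the whole object graph.
    static boolean isWritable(Object value) {
        try {
            validate(value, "");
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

A check like this only covers unexpected content in the response object; it would not help with the case where an I/O error interrupts an otherwise valid response.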

      Attachments

        1. SOLR-13471.patch
          2 kB
          Chris M. Hostetter
        2. SOLR-13471.patch
          5 kB
          Chris M. Hostetter
        3. SOLR-13471.patch
          3 kB
          Chris M. Hostetter

            People

              Assignee: Unassigned
              Reporter: Chris M. Hostetter (hossman)
              Votes: 0
              Watchers: 2