Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-11136

XMLResponseParser.readDocument makes dangerous assumptions / fails when indent=true and [child] doc transformer


    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0, 7.1, 8.0
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:


      Some buggy code in XMLResponseParser.readDocument causes it to indirectly assume that once it encounters a nested START_ELEMENT 'doc' (which it can recursively parse) the only other XML stream events it will find will either be an END_ELEMENT, or more 'doc' START_ELEMENTs...

      protected SolrDocument readDocument( XMLStreamReader parser ) throws XMLStreamException
        if( XMLStreamConstants.START_ELEMENT != parser.getEventType() ) {
          throw new RuntimeException( "must be start element, not: "+parser.getEventType() );
        // ...
        while( true ) 
          switch (parser.next()) {
          case XMLStreamConstants.START_ELEMENT:
            builder.setLength( 0 ); // reset the text
            type = KnownType.get( parser.getLocalName() );
            // ...
            // NOTE: nothing in this loop modifies 'type' 
            // so the 'while' is totally inappropriate even if there was no bug
            while( type == KnownType.DOC) {
              int event = parser.next();                                // PROBLEMATIC
              if (event == XMLStreamConstants.END_ELEMENT) { //Doc ends
                return doc;
            // ...

      Because of how the server side XML Writer code works, it's currently true that child documents should always come "after" any other fields or transformers – but depending on that is sketchy. Where this code actually causes real problems is if the server/client uses indent=true because then the parser.next(); call (labeled PROBLEMATIC) can return XMLStreamConstants.CHARACTER (or XMLStreamConstants.WHITESPACE) because the blank space inbetween sibling child docs, or after the last child doc, causing the recursive call to readDocument(parser) to fail (because it expects to find the reader positioned at a START_ELEMENT)


          Issue Links



              • Assignee:
                hossman Hoss Man
                hossman Hoss Man
              • Votes:
                0 Vote for this issue
                2 Start watching this issue


                • Created: