Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-11136

XMLResponseParser.readDocument makes dangerous assumptions / fails when indent=true and [child] doc transformer

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 7.0, 7.1, 8.0
    • None
    • None

    Description

      Some buggy code in XMLResponseParser.readDocument causes it to indirectly assume that once it encounters a nested START_ELEMENT 'doc' (which it can recursively parse) the only other XML stream events it will find will either be an END_ELEMENT, or more 'doc' START_ELEMENTs...

      protected SolrDocument readDocument( XMLStreamReader parser ) throws XMLStreamException
      {
        if( XMLStreamConstants.START_ELEMENT != parser.getEventType() ) {
          throw new RuntimeException( "must be start element, not: "+parser.getEventType() );
        }
      
        // ...
      
        while( true ) 
        {
          switch (parser.next()) {
          case XMLStreamConstants.START_ELEMENT:
            depth++;
            builder.setLength( 0 ); // reset the text
            type = KnownType.get( parser.getLocalName() );
      
            // ...
            
            // NOTE: nothing in this loop modifies 'type' 
            // so the 'while' is totally inappropriate even if there was no bug
            while( type == KnownType.DOC) {
              doc.addChildDocument(readDocument(parser));
              int event = parser.next();                                // PROBLEMATIC
              if (event == XMLStreamConstants.END_ELEMENT) { //Doc ends
                return doc;
              }
            }
            
            // ...
      

      Because of how the server side XML Writer code works, it's currently true that child documents should always come "after" any other fields or transformers – but depending on that is sketchy. Where this code actually causes real problems is if the server/client uses indent=true because then the parser.next(); call (labeled PROBLEMATIC) can return XMLStreamConstants.CHARACTER (or XMLStreamConstants.WHITESPACE) because the blank space inbetween sibling child docs, or after the last child doc, causing the recursive call to readDocument(parser) to fail (because it expects to find the reader positioned at a START_ELEMENT)

      Attachments

        1. SOLR-11136.patch
          8 kB
          Chris M. Hostetter

        Issue Links

          Activity

            People

              hossman Chris M. Hostetter
              hossman Chris M. Hostetter
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: