Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-11136

XMLResponseParser.readDocument makes dangerous assumptions / fails when indent=true and [child] doc transformer

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0, 7.1, 8.0
    • Component/s: None
    • Labels:
      None

      Description

      Some buggy code in XMLResponseParser.readDocument causes it to indirectly assume that once it encounters a nested START_ELEMENT 'doc' (which it can recursively parse) the only other XML stream events it will find will either be an END_ELEMENT, or more 'doc' START_ELEMENTs...

      protected SolrDocument readDocument( XMLStreamReader parser ) throws XMLStreamException
      {
        if( XMLStreamConstants.START_ELEMENT != parser.getEventType() ) {
          throw new RuntimeException( "must be start element, not: "+parser.getEventType() );
        }
      
        // ...
      
        while( true ) 
        {
          switch (parser.next()) {
          case XMLStreamConstants.START_ELEMENT:
            depth++;
            builder.setLength( 0 ); // reset the text
            type = KnownType.get( parser.getLocalName() );
      
            // ...
            
            // NOTE: nothing in this loop modifies 'type' 
            // so the 'while' is totally inappropriate even if there was no bug
            while( type == KnownType.DOC) {
              doc.addChildDocument(readDocument(parser));
              int event = parser.next();                                // PROBLEMATIC
              if (event == XMLStreamConstants.END_ELEMENT) { //Doc ends
                return doc;
              }
            }
            
            // ...
      

      Because of how the server side XML Writer code works, it's currently true that child documents should always come "after" any other fields or transformers – but depending on that is sketchy. Where this code actually causes real problems is if the server/client uses indent=true because then the parser.next(); call (labeled PROBLEMATIC) can return XMLStreamConstants.CHARACTER (or XMLStreamConstants.WHITESPACE) because the blank space inbetween sibling child docs, or after the last child doc, causing the recursive call to readDocument(parser) to fail (because it expects to find the reader positioned at a START_ELEMENT)

        Attachments

        Issue Links

          Activity

            People

            • Assignee:
              hossman Chris M. Hostetter
              Reporter:
              hossman Chris M. Hostetter

              Dates

              • Created:
                Updated:
                Resolved:

                Issue deployment