Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-11136

XMLResponseParser.readDocument makes dangerous assumptions / fails when indent=true and [child] doc transformer

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0, 7.1, master (8.0)
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      Some buggy code in XMLResponseParser.readDocument causes it to indirectly assume that once it encounters a nested START_ELEMENT 'doc' (which it can recursively parse) the only other XML stream events it will find will either be an END_ELEMENT, or more 'doc' START_ELEMENTs...

      protected SolrDocument readDocument( XMLStreamReader parser ) throws XMLStreamException
      {
        if( XMLStreamConstants.START_ELEMENT != parser.getEventType() ) {
          throw new RuntimeException( "must be start element, not: "+parser.getEventType() );
        }
      
        // ...
      
        while( true ) 
        {
          switch (parser.next()) {
          case XMLStreamConstants.START_ELEMENT:
            depth++;
            builder.setLength( 0 ); // reset the text
            type = KnownType.get( parser.getLocalName() );
      
            // ...
            
            // NOTE: nothing in this loop modifies 'type' 
            // so the 'while' is totally inappropriate even if there was no bug
            while( type == KnownType.DOC) {
              doc.addChildDocument(readDocument(parser));
              int event = parser.next();                                // PROBLEMATIC
              if (event == XMLStreamConstants.END_ELEMENT) { //Doc ends
                return doc;
              }
            }
            
            // ...
      

      Because of how the server side XML Writer code works, it's currently true that child documents should always come "after" any other fields or transformers – but depending on that is sketchy. Where this code actually causes real problems is if the server/client uses indent=true because then the parser.next(); call (labeled PROBLEMATIC) can return XMLStreamConstants.CHARACTER (or XMLStreamConstants.WHITESPACE) because the blank space inbetween sibling child docs, or after the last child doc, causing the recursive call to readDocument(parser) to fail (because it expects to find the reader positioned at a START_ELEMENT)

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                hossman Hoss Man
                Reporter:
                hossman Hoss Man
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: