[SOLR-11136] XMLResponseParser.readDocument makes dangerous assumptions / fails when indent=true and [child] doc transformer - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 7.0, 7.1, 8.0
Component/s: None
Labels:
None

Description

Some buggy code in XMLResponseParser.readDocument causes it to indirectly assume that once it encounters a nested START_ELEMENT 'doc' (which it can recursively parse) the only other XML stream events it will find will either be an END_ELEMENT, or more 'doc' START_ELEMENTs...

protected SolrDocument readDocument( XMLStreamReader parser ) throws XMLStreamException
{
  if( XMLStreamConstants.START_ELEMENT != parser.getEventType() ) {
    throw new RuntimeException( "must be start element, not: "+parser.getEventType() );
  }

  // ...

  while( true ) 
  {
    switch (parser.next()) {
    case XMLStreamConstants.START_ELEMENT:
      depth++;
      builder.setLength( 0 ); // reset the text
      type = KnownType.get( parser.getLocalName() );

      // ...
      
      // NOTE: nothing in this loop modifies 'type' 
      // so the 'while' is totally inappropriate even if there was no bug
      while( type == KnownType.DOC) {
        doc.addChildDocument(readDocument(parser));
        int event = parser.next();                                // PROBLEMATIC
        if (event == XMLStreamConstants.END_ELEMENT) { //Doc ends
          return doc;
        }
      }
      
      // ...

Because of how the server side XML Writer code works, it's currently true that child documents should always come "after" any other fields or transformers – but depending on that is sketchy. Where this code actually causes real problems is if the server/client uses indent=true because then the parser.next(); call (labeled PROBLEMATIC) can return XMLStreamConstants.CHARACTER (or XMLStreamConstants.WHITESPACE) because the blank space inbetween sibling child docs, or after the last child doc, causing the recursive call to readDocument(parser) to fail (because it expects to find the reader positioned at a START_ELEMENT)

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

SOLR-11136.patch
21/Jul/17 23:56
8 kB
Chris M. Hostetter

Issue Links

blocks

SOLR-10494 Switch Solr's Default Response Type from XML to JSON

Closed

Activity

People

Assignee:: Chris M. Hostetter

Reporter:: Chris M. Hostetter

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 21/Jul/17 23:56

Updated:: 08/Jun/19 15:20

Resolved:: 22/Jul/17 00:33