Apache NiFi / NIFI-7790

XML record reader - failure on well-formed XML


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.11.4
    • Fix Version/s: None
    • Component/s: Extensions
    • Labels:

      Description

      I am using ConvertRecord to convert XML flowfiles to Avro, with the Infer Schema strategy. Some input flowfiles are routed to the failure queue even though they are well-formed:

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <root>
      	<authors>
      		<item>
      			<name>Neil Gaiman</name>
      		</item>
      	</authors>
      	<editors>
      		<item>
      			<commercialName>Hachette</commercialName>
      		</item>
      	</editors>
      </root>
      

      Note the use of authors/item/name on one side and editors/item/commercialName on the other.

      By contrast, this document is parsed correctly:

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <root>
      	<authors>
      		<item>
      			<name>Neil Gaiman</name>
      		</item>
      	</authors>
      	<editors>
      		<item>
      			<name>Hachette</name>
      		</item>
      	</editors>
      </root>
      

      See the attached template for a minimal reproducible example.

       

      My interpretation is that the first document fails because two independent XML element types share the same name (<item> in this case) while having different structures and occurring under parent elements of different types. In the second document, both <item> elements actually have the same structure. I did not configure any Schema Inference Cache, so the two item types should be inferred independently.
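      To make the hypothesis above concrete, here is a minimal Python sketch (standard library only) that walks both documents and records, for every element that has child elements, which set of child names it was seen with. It is not NiFi's inference code: the names record_shapes and conflicting_keys are hypothetical, and keying record types by tag name alone is the assumption being illustrated, not confirmed NiFi behaviour. Keyed by tag name, <item> shows up with two different field sets in the first document only; keyed by full path, neither document has a collision.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abridged copies of the two documents from this report.
DOC_FAILS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
  <authors><item><name>Neil Gaiman</name></item></authors>
  <editors><item><commercialName>Hachette</commercialName></item></editors>
</root>"""

DOC_OK = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
  <authors><item><name>Neil Gaiman</name></item></authors>
  <editors><item><name>Hachette</name></item></editors>
</root>"""


def record_shapes(xml_text, key_by_path):
    """Collect the child-element names seen for every element that has children.

    key_by_path=False keys each record type by its tag name alone (the
    hypothesised behaviour); key_by_path=True keys it by its path from the root.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    shapes = defaultdict(set)  # key -> set of frozensets of child tag names

    def walk(elem, path):
        children = list(elem)
        if not children:
            return  # leaf element: a scalar field, not a record
        key = path if key_by_path else elem.tag
        shapes[key].add(frozenset(child.tag for child in children))
        for child in children:
            walk(child, path + "/" + child.tag)

    walk(root, root.tag)
    return shapes


def conflicting_keys(shapes):
    """Keys that were seen with more than one distinct set of child fields."""
    return sorted(key for key, seen in shapes.items() if len(seen) > 1)


for label, doc in (("first document", DOC_FAILS), ("second document", DOC_OK)):
    for by_path in (False, True):
        print(label, "keyed by", "path" if by_path else "name",
              conflicting_keys(record_shapes(doc, by_path)))
```

      Running it prints `first document keyed by name ['item']` followed by empty lists for the other three combinations, which matches the observation that only the first document fails.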

      Since the first document is legal XML (an XSD could be written for it) and can also be represented in Avro, its conversion shouldn't fail.
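      As a sketch supporting the Avro claim, the schema below (written as a Python dict and printed as JSON) is one possible Avro representation of the first document. The record names AuthorItem and EditorItem are hypothetical choices for this sketch, not names NiFi would generate. They have to differ because the Avro specification does not allow the same full name to be defined twice in one schema, so two structurally different records cannot both be named item in the same namespace.

```python
import json

# One possible Avro schema for the first document. "AuthorItem" and "EditorItem"
# are hypothetical names chosen for this sketch; they only need to be distinct.
SCHEMA = {
    "type": "record",
    "name": "root",
    "fields": [
        {"name": "authors", "type": {
            "type": "record", "name": "authors",
            "fields": [{"name": "item", "type": {
                "type": "record", "name": "AuthorItem",
                "fields": [{"name": "name", "type": "string"}]}}]}},
        {"name": "editors", "type": {
            "type": "record", "name": "editors",
            "fields": [{"name": "item", "type": {
                "type": "record", "name": "EditorItem",
                "fields": [{"name": "commercialName", "type": "string"}]}}]}},
    ],
}


def record_names(schema):
    """Yield the name of every record type defined inline in an Avro schema."""
    if isinstance(schema, list):  # a union: inspect each branch
        for branch in schema:
            yield from record_names(branch)
    elif isinstance(schema, dict) and schema.get("type") == "record":
        yield schema["name"]
        for field in schema["fields"]:
            yield from record_names(field["type"])
    # primitive type names such as "string" define no records


names = list(record_names(SCHEMA))
# Avro forbids redefining a full name, so every record name must be unique here.
assert len(names) == len(set(names)), names
print(json.dumps(SCHEMA, indent=2))
```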

      I'll be happy to provide more details if needed.

        Attachments

        1. bug-parse-xml.xml (34 kB, Pierre Gramme)


            People

            • Assignee: Unassigned
            • Reporter: Pierre Gramme
            • Votes: 0
            • Watchers: 2
