Apache NiFi / NIFI-7790

XML record reader - failure on well-formed XML


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.11.4
    • Fix Version/s: None
    • Component/s: Extensions

    Description

      I am using ConvertRecord to convert XML flowfiles to Avro, with the XMLReader's schema access strategy set to Infer Schema. Some input flowfiles are routed to the failure relationship even though they are well-formed: 

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <root>
      	<authors>
      		<item>
      			<name>Neil Gaiman</name>
      		</item>
      	</authors>
      	<editors>
      		<item>
      			<commercialName>Hachette</commercialName>
      		</item>
      	</editors>
      </root>
      

      Note that the first branch uses authors/item/name, while the second uses editors/item/commercialName.

      By contrast, the following document is parsed correctly: 

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <root>
      	<authors>
      		<item>
      			<name>Neil Gaiman</name>
      		</item>
      	</authors>
      	<editors>
      		<item>
      			<name>Hachette</name>
      		</item>
      	</editors>
      </root>
      

      See the attached template for a minimal reproducible example.

       

      My interpretation is that the first case fails because two independent XML element types share the same name (<item> in this case) while having different structures and appearing under different parents, which themselves have different types. In the second case, both <item> elements actually have the same structure. I did not configure any Schema Inference Cache, so the two item types should be inferred independently. 
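      To make this hypothesis concrete, here is a minimal Python sketch using only the standard library. It is not NiFi's actual inference code (which is Java); the functions infer_by_name and infer_by_path are hypothetical and only contrast two ways of keying inferred record types. If record types are keyed by element name alone, <item> ends up with two incompatible shapes in the first document, whereas keying by full element path produces no conflict. The XML declaration is omitted from the samples for brevity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

FIRST_DOC = """<root>
  <authors><item><name>Neil Gaiman</name></item></authors>
  <editors><item><commercialName>Hachette</commercialName></item></editors>
</root>"""

SECOND_DOC = """<root>
  <authors><item><name>Neil Gaiman</name></item></authors>
  <editors><item><name>Hachette</name></item></editors>
</root>"""


def child_fields(elem):
    """Return the set of child element names, i.e. the 'record shape' of elem."""
    return frozenset(child.tag for child in elem)


def infer_by_name(root):
    """Hypothetical inference that keys record types by element name only."""
    types = defaultdict(set)
    for elem in root.iter():
        if len(elem):  # only elements with child elements are treated as records
            types[elem.tag].add(child_fields(elem))
    return types


def infer_by_path(root, path=""):
    """Hypothetical alternative that keys record types by full element path."""
    types = defaultdict(set)
    here = f"{path}/{root.tag}"
    if len(root):
        types[here].add(child_fields(root))
    for child in root:
        for key, shapes in infer_by_path(child, here).items():
            types[key] |= shapes
    return types


def conflicts(types):
    """Return the keys that were seen with more than one distinct shape."""
    return sorted(key for key, shapes in types.items() if len(shapes) > 1)


for label, doc in (("first", FIRST_DOC), ("second", SECOND_DOC)):
    root = ET.fromstring(doc)
    print(label, "by name:", conflicts(infer_by_name(root)),
          "by path:", conflicts(infer_by_path(root)))
```

      With the first document, name-keyed inference reports a conflict on item, while path-keyed inference reports none; with the second document, neither approach reports a conflict.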

      Since the first document is legal XML (an XSD could be written for it) and can also be represented in Avro, its conversion shouldn't fail.
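      As an illustration of the Avro side, the sketch below writes one possible Avro schema for the first document as a Python dict (Avro schemas are JSON). The record names AuthorItem and EditorItem are hypothetical, and modelling each <item> as a nested record rather than an array is an assumption. The point is only that Avro can hold both <item> shapes as long as each named record type gets a distinct name.

```python
import json

# Hypothetical Avro schema for the first document. Avro requires named types
# to have unique full names, so the two <item> shapes get different
# (made-up) record names instead of being merged into one type.
SCHEMA = {
    "type": "record",
    "name": "root",
    "fields": [
        {"name": "authors", "type": {
            "type": "record", "name": "authors",
            "fields": [{"name": "item", "type": {
                "type": "record", "name": "AuthorItem",
                "fields": [{"name": "name", "type": "string"}]}}]}},
        {"name": "editors", "type": {
            "type": "record", "name": "editors",
            "fields": [{"name": "item", "type": {
                "type": "record", "name": "EditorItem",
                "fields": [{"name": "commercialName", "type": "string"}]}}]}},
    ],
}


def record_names(schema):
    """Collect the names of all record types defined in a schema, depth-first."""
    names = []
    if isinstance(schema, dict):
        if schema.get("type") == "record":
            names.append(schema["name"])
        for field in schema.get("fields", []):
            names.extend(record_names(field["type"]))
    return names


names = record_names(SCHEMA)
assert len(names) == len(set(names)), "Avro record names must be unique"
print(json.dumps(SCHEMA, indent=2))
```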

      I'll be happy to provide more details if needed.

People

    • Assignee: Unassigned
    • Reporter: Pierre Gramme
