[NIFI-7790] XML record reader - failure on well-formed XML - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.11.4
Fix Version/s: None
Component/s: Extensions
Labels:
- records
- xml

Description

I am using ConvertRecord to convert XML flowfiles to Avro, with the Infer Schema strategy. Some input flowfiles are routed to the failure relationship even though they are well-formed, for example:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
	<authors>
		<item>
			<name>Neil Gaiman</name>
		</item>
	</authors>
	<editors>
		<item>
			<commercialName>Hachette</commercialName>
		</item>
	</editors>
</root>

Note the use of authors/item/name on one side, and editors/item/commercialName on the other side.

By contrast, this document is parsed correctly:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
	<authors>
		<item>
			<name>Neil Gaiman</name>
		</item>
	</authors>
	<editors>
		<item>
			<name>Hachette</name>
		</item>
	</editors>
</root>

See the attached template for a minimal reproducible example.

My interpretation is that the failure in the first case is caused by two independent XML element types sharing the same name (<item> in this case) while having different content and occurring under parents of different types. In the second case, both <item> elements actually have the same type. I didn't use any Schema Inference Cache, so the two item types should be inferred independently.
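
To illustrate this interpretation, here is a minimal Python sketch. It is not NiFi code and does not claim to reproduce how the XML reader infers schemas; the helper `child_names` is hypothetical. It only shows that grouping child elements by element name merges the two <item> shapes, whereas grouping by full path keeps them apart.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# First document from the report (XML declaration omitted for brevity).
DOC = """<root>
  <authors><item><name>Neil Gaiman</name></item></authors>
  <editors><item><commercialName>Hachette</commercialName></item></editors>
</root>"""


def child_names(xml_text):
    """Hypothetical helper: group child element names by tag and by path."""
    by_name = defaultdict(set)  # keyed by element name only
    by_path = defaultdict(set)  # keyed by full path from the root

    def walk(elem, path):
        for child in elem:
            by_name[elem.tag].add(child.tag)
            by_path[path].add(child.tag)
            walk(child, path + "/" + child.tag)

    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    walk(root, root.tag)
    return dict(by_name), dict(by_path)


by_name, by_path = child_names(DOC)
print(sorted(by_name["item"]))                # ['commercialName', 'name']
print(sorted(by_path["root/authors/item"]))   # ['name']
print(sorted(by_path["root/editors/item"]))   # ['commercialName']
```

Keyed by name alone, a single "item" type would have to cover both `name` and `commercialName`; keyed by path, each <item> keeps its own shape.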

Since the first document is legal XML (an XSD could be written for it) and can also be represented in Avro, its conversion shouldn't fail.
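
As a sketch of why an Avro representation is possible, the Python snippet below builds a hand-written Avro schema for the first document. The `example.*` namespaces and the helper functions are assumptions chosen for illustration, not what NiFi generates. Avro identifies a record by its full name (namespace plus name), so two records both named "item" can coexist when their namespaces differ.

```python
import json


def item_record(namespace, field):
    # One Avro record named "item"; the namespace makes its full name unique.
    return {
        "type": "record", "name": "item", "namespace": namespace,
        "fields": [{"name": field, "type": ["null", "string"], "default": None}],
    }


def wrapper(name, item_namespace, field):
    # Record for <authors> or <editors>, holding a single "item" field.
    return {
        "type": "record", "name": name, "namespace": "example",
        "fields": [{"name": "item", "type": item_record(item_namespace, field)}],
    }


schema = {
    "type": "record", "name": "root", "namespace": "example",
    "fields": [
        {"name": "authors", "type": wrapper("authors", "example.authors", "name")},
        {"name": "editors", "type": wrapper("editors", "example.editors", "commercialName")},
    ],
}


def full_names(node):
    """Collect the full name (namespace.name) of every record in the schema."""
    names = []
    if isinstance(node, dict) and node.get("type") == "record":
        names.append(node["namespace"] + "." + node["name"])
        for field in node["fields"]:
            names.extend(full_names(field["type"]))
    elif isinstance(node, list):  # union: inspect each branch
        for branch in node:
            names.extend(full_names(branch))
    return names


names = full_names(schema)
print(json.dumps(schema, indent=2))
print(names)
```

Both "item" records appear in `names` under distinct full names (`example.authors.item` and `example.editors.item`), so no record is defined twice.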

I'll be happy to provide more details if needed.

Attachments

bug-parse-xml.xml
04/Sep/20 16:51
34 kB
Pierre Gramme

People

Assignee: Unassigned

Reporter: Pierre Gramme

Votes: 0

Watchers: 2

Dates

Created: 04/Sep/20 17:08

Updated: 05/Nov/20 11:04