Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.11.4
Fix Version/s: None
Description
I am using ConvertRecord to convert XML flowfiles to Avro with the Infer Schema strategy. Some input flowfiles are routed to the failure queue even though they are well-formed:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
  <authors>
    <item>
      <name>Neil Gaiman</name>
    </item>
  </authors>
  <editors>
    <item>
      <commercialName>Hachette</commercialName>
    </item>
  </editors>
</root>
Note the use of authors/item/name on one side and editors/item/commercialName on the other.
By contrast, the following document is parsed correctly:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
  <authors>
    <item>
      <name>Neil Gaiman</name>
    </item>
  </authors>
  <editors>
    <item>
      <name>Hachette</name>
    </item>
  </editors>
</root>
See the attached template for a minimal reproducible example.
My interpretation is that the first case fails because two independent XML element types share the same name (<item> here) while having different structures and occurring under different parent elements. In the second case, both <item> elements actually have the same structure. I did not configure any Schema Inference Cache, so the two item types should be inferred independently.
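To make this hypothesis concrete, here is a minimal Python sketch (standard library only) that contrasts two toy inference strategies: keying inferred record types by element name versus by element path. Both strategies and the helper names (`shape`, `infer_by_name`, `infer_by_path`, `conflicts`) are hypothetical illustrations, not NiFi's actual inference code. A "record type" here is simply the set of child element names.

```python
import xml.etree.ElementTree as ET

# The two documents from this report.
MIXED = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
  <authors><item><name>Neil Gaiman</name></item></authors>
  <editors><item><commercialName>Hachette</commercialName></item></editors>
</root>"""

SAME = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
  <authors><item><name>Neil Gaiman</name></item></authors>
  <editors><item><name>Hachette</name></item></editors>
</root>"""


def shape(elem):
    """Naive 'record type' (hypothetical): the set of child element names."""
    return frozenset(child.tag for child in elem)


def infer_by_name(root):
    """Hypothetical strategy: one record type per element *name*."""
    types = {}
    for elem in root.iter():
        if len(elem):  # only elements with children become records
            types.setdefault(elem.tag, set()).add(shape(elem))
    return types


def infer_by_path(elem, parent_path="", types=None):
    """Alternative strategy: one record type per element *path*."""
    if types is None:
        types = {}
    path = f"{parent_path}/{elem.tag}"
    if len(elem):
        types.setdefault(path, set()).add(shape(elem))
        for child in elem:
            infer_by_path(child, path, types)
    return types


def conflicts(types):
    """Keys that were seen with more than one distinct shape."""
    return sorted(key for key, shapes in types.items() if len(shapes) > 1)


for label, doc in (("mixed", MIXED), ("same", SAME)):
    root = ET.fromstring(doc.encode("utf-8"))
    print(label, "by name:", conflicts(infer_by_name(root)),
          "by path:", conflicts(infer_by_path(root)))
```

Keyed by name, the first document produces two different shapes for `item` while the second produces only one. Keyed by path, neither document has a conflict. Whether NiFi's inference actually behaves like the by-name variant is my assumption, not something verified in its source.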
Since the first document is legal XML (an XSD could be written for it) and can also be represented in Avro, its conversion shouldn't fail.
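As a sketch of the Avro side of this claim, the following Python (standard library only) writes out one possible Avro schema for the first document as a plain dict. The record names and namespaces are my own illustrative choices, not what NiFi infers, and each <item> is modelled as a single record rather than an array for brevity. The helper `record_fullnames` is a simplified reading of the Avro specification's naming rules (a schema may not define the same fullname twice, and nested named types inherit the enclosing namespace). It shows that two different `item` records can coexist when their namespaces differ, whereas stripping the namespaces leaves two definitions of `item`. Whether NiFi's inferred schema runs into this duplicate-name situation is not established here.

```python
import json

# One possible hand-written Avro schema for the first document.
# Names/namespaces are illustrative; each <item> is a single record, not an array.
SCHEMA = {
    "type": "record", "name": "root",
    "fields": [
        {"name": "authors", "type": {
            "type": "record", "name": "authors",
            "fields": [{"name": "item", "type": {
                "type": "record", "name": "item", "namespace": "root.authors",
                "fields": [{"name": "name", "type": "string"}]}}]}},
        {"name": "editors", "type": {
            "type": "record", "name": "editors",
            "fields": [{"name": "item", "type": {
                "type": "record", "name": "item", "namespace": "root.editors",
                "fields": [{"name": "commercialName", "type": "string"}]}}]}},
    ],
}


def record_fullnames(schema, enclosing_ns=""):
    """Yield every record fullname, applying (simplified) Avro namespace rules."""
    if isinstance(schema, dict) and schema.get("type") == "record":
        name = schema["name"]
        ns = schema.get("namespace", enclosing_ns)
        fullname = name if "." in name or not ns else f"{ns}.{name}"
        yield fullname
        # Nested named types inherit the namespace of the enclosing record.
        child_ns = fullname.rpartition(".")[0]
        for field in schema["fields"]:
            yield from record_fullnames(field["type"], child_ns)


def duplicate_fullnames(schema):
    """Fullnames defined more than once (not allowed in a single Avro schema)."""
    names = list(record_fullnames(schema))
    return sorted({n for n in names if names.count(n) > 1})


# Same schema with both namespaces removed: two different records named "item".
NAIVE = json.loads(json.dumps(SCHEMA))
for top_field in NAIVE["fields"]:
    del top_field["type"]["fields"][0]["type"]["namespace"]

print(list(record_fullnames(SCHEMA)))
print(duplicate_fullnames(SCHEMA), duplicate_fullnames(NAIVE))
```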
I'll be happy to provide more details if needed.