Apache NiFi / NIFI-7697

NiFi XMLReader Record Component sometimes ignores empty XML Elements


Details

    Description

      I am currently developing a processor for Apache NiFi that depends on being configured with an implementation of RecordReaderFactory that produces well-formed NiFi Records from input data.

      The JsonTreeReader component produced accurate results for all of my test cases. However, at least with its default configuration, the XMLReader component sometimes appears to mishandle data: specifically, empty XML elements that are descendants of XML elements represented as arrays in NiFi Records.

      This occurs when I test with the standard ConvertRecord processor, with the Record Reader set to XMLReader and the Record Writer set to JsonRecordSetWriter.

      The first two test cases work as expected:

      Test Case 1:

      Input XML:

      <?xml version="1.0" encoding="UTF-8"?>
      <Root>
         <DataArr>SomeData</DataArr>
         <DataArr>
            <Field>
               <NonEmptyField>2</NonEmptyField>
            </Field>
         </DataArr>
      </Root>
      

      Output Json:

      [
         {
            "DataArr":[
               "SomeData",
               "MapRecord[{Field=MapRecord[{NonEmptyField=2}]}]"
            ]
         }
      ]
      

      Test Case 2:

      Input XML:

      <?xml version="1.0" encoding="UTF-8"?>
      <Root>
         <SomeData />
         <MoreData>2</MoreData>
      </Root>
      

      Output Json:

      [
         {
            "SomeData":null,
            "MoreData":2
         }
      ]
      

      However, the following does not work as expected:

      Test Case 3:

      Input XML:

      <Root>
         <DataArr>SomeData</DataArr>
         <DataArr>
            <Field>
               <EmptyField/>
            </Field>
         </DataArr>
      </Root>
      

      Output Json:

      [
         {
            "DataArr":[
               "SomeData"
            ]
         }
      ]
      

      My processor requires that Field and EmptyField appear in the JSON output for Test Case 3, and for any other input analogous to it.
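      To make the expected record shape for Test Case 3 concrete, here is a minimal sketch that uses only the JDK's built-in DOM parser (not NiFi) to convert the XML into nested Java maps and lists. It assumes the conventions visible in the test cases above: a text-only element becomes a string, an element with neither text nor child elements becomes null, nested elements become a map, and repeated sibling names become a list. The class and method names are hypothetical, the sketch does no type inference (so 2 stays the string "2"), and it illustrates only the desired output shape, not how XMLReader works internally.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of NiFi: shows the record shape one would
// expect for the test cases, using only the JDK's DOM parser.
class ExpectedRecordShape {

    // Converts one element: child elements -> Map, repeated sibling names -> List,
    // text-only -> trimmed String, no text and no child elements -> null.
    // No type inference is done, so "2" stays a String.
    static Object toValue(Element element) {
        Map<String, Object> fields = new LinkedHashMap<>();
        boolean hasChildElements = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace/text nodes between elements
            }
            hasChildElements = true;
            Element childElement = (Element) child;
            String name = childElement.getTagName();
            Object value = toValue(childElement);
            if (!fields.containsKey(name)) {
                fields.put(name, value);
            } else if (fields.get(name) instanceof List) {
                // Third or later occurrence: append to the existing array
                @SuppressWarnings("unchecked")
                List<Object> list = (List<Object>) fields.get(name);
                list.add(value);
            } else {
                // Second occurrence: promote the single value to an array
                List<Object> list = new ArrayList<>();
                list.add(fields.get(name));
                list.add(value);
                fields.put(name, list);
            }
        }
        if (hasChildElements) {
            return fields;
        }
        String text = element.getTextContent().trim();
        return text.isEmpty() ? null : text;
    }

    // Parses an XML string and converts its root element.
    static Object parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return toValue(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String testCase3 = "<Root><DataArr>SomeData</DataArr>"
                + "<DataArr><Field><EmptyField/></Field></DataArr></Root>";
        // prints {DataArr=[SomeData, {Field={EmptyField=null}}]}
        System.out.println(parse(testCase3));
    }
}
```

      Applied to Test Case 2, the same sketch yields {SomeData=null, MoreData=2}, consistent with the null that JsonRecordSetWriter emits for the empty top-level SomeData element.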

      I tried supplying a custom NiFi RecordSchema to the components and verified that it was being used, but the results were the same.

      Is there a way to configure these controller services so that the empty field is not ignored, or is this a bug in the XMLReader component?

      You can reproduce these results by running ConvertRecord in NiFi as described above, or by running the following JUnit test with testXml replaced by the XML for the relevant test case:

      import java.nio.charset.StandardCharsets;

      import org.apache.nifi.controller.ControllerService;
      import org.apache.nifi.json.JsonRecordSetWriter;
      import org.apache.nifi.processor.Relationship;
      import org.apache.nifi.processors.standard.ConvertRecord;
      import org.apache.nifi.reporting.InitializationException;
      import org.apache.nifi.util.MockFlowFile;
      import org.apache.nifi.util.TestRunner;
      import org.apache.nifi.util.TestRunners;
      import org.apache.nifi.xml.XMLReader;
      import org.junit.Test;

      import static org.junit.Assert.assertTrue;

      public class TestNiFiMinimal {
          @Test
          public void testEmptyXMLGetsProcessed() throws InitializationException {
              ConvertRecord convertRecord = new ConvertRecord();
              TestRunner testRunner = TestRunners.newTestRunner(convertRecord);

              // Register and enable XMLReader (default configuration) as the Record Reader
              ControllerService xmlReader = new XMLReader();
              testRunner.addControllerService("xmlReader", xmlReader);
              testRunner.enableControllerService(xmlReader);
              testRunner.setProperty("record-reader", "xmlReader");

              // Register and enable JsonRecordSetWriter (default configuration) as the Record Writer
              ControllerService jsonWriter = new JsonRecordSetWriter();
              testRunner.addControllerService("jsonWriter", jsonWriter);
              testRunner.enableControllerService(jsonWriter);
              testRunner.setProperty("record-writer", "jsonWriter");

              // Swap in the XML for the test case being reproduced (Test Case 3 shown here)
              String testXml = "<?xml version='1.0' encoding='UTF-8'?><Root><DataArr>SomeData</DataArr><DataArr><Field><EmptyField/></Field></DataArr></Root>";
              testRunner.enqueue(testXml);
              testRunner.run();

              Relationship success = convertRecord.getRelationships().stream()
                      .filter(relationship -> relationship.getName().equals("success"))
                      .findAny()
                      .get();
              testRunner.assertAllFlowFilesTransferred(success, 1);

              // Print the converted JSON so the actual output can be inspected
              final MockFlowFile converted = testRunner.getFlowFilesForRelationship(success).get(0);
              final String json = new String(converted.toByteArray(), StandardCharsets.UTF_8);
              System.out.println(json);

              // Expected behavior: the nested empty element survives the conversion
              assertTrue("EmptyField missing from output: " + json, json.contains("EmptyField"));
          }
      }
      


          People

            Assignee: Unassigned
            Reporter: Andrew Chafos (andrewjc2000)
            Votes: 0
            Watchers: 1
