  1. Apache NiFi
  2. NIFI-5960

Wrong sub-schema picked for CHOICE datatype

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.0
    • Fix Version/s: None
    • Component/s: Core Framework
    • Labels: None

    Description

      When a CHOICE data type contains multiple RECORD choices, any RECORD schema whose fields are all nullable is considered compatible with any input. Such a schema can therefore be picked as a valid choice instead of a more appropriate sub-schema. The following unit test demonstrates the issue:

      package org.apache.nifi.serialization.record;
      
      import org.apache.nifi.serialization.SimpleRecordSchema;
      import org.apache.nifi.serialization.record.type.ChoiceDataType;
      import org.apache.nifi.serialization.record.type.RecordDataType;
      import org.apache.nifi.serialization.record.util.DataTypeUtils;
      import org.junit.Test;
      
      import static java.util.Arrays.asList;
      import static java.util.Collections.*;
      import static org.apache.nifi.serialization.record.RecordFieldType.STRING;
      import static org.junit.Assert.*;
      
      public class DataTypeUtilsTest {
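      	@Test
      	public void testChoiceCompatibility() {
      		// choice1: a single non-nullable field "field1", matching the input record
      		DataType choice1 = new RecordDataType(new SimpleRecordSchema(singletonList(new RecordField("field1", STRING.getDataType(), false))));
      		// choice2: a single nullable field "field2", unrelated to the input record
      		DataType choice2 = new RecordDataType(new SimpleRecordSchema(singletonList(new RecordField("field2", STRING.getDataType()))));
      		// Input record carrying only "field1"
      		Record record = new MapRecord(new SimpleRecordSchema(emptyList()), singletonMap("field1", "value1"));
      		// choice2 is listed first, which triggers the problem
      		DataType dataType = DataTypeUtils.chooseDataType(record, new ChoiceDataType(asList(choice2, choice1)));
      		// Expected: choice1. Observed: choice2 is returned, so both assertions fail.
      		assertEquals(choice1, dataType);
      		assertNotEquals(choice2, dataType);
      	}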
      }
      

      Given an input containing a single field "field1" and a choice of two schemas, one whose only fields are nullable and unrelated ("field2") and another whose fields match the input ("field1"), the chooseDataType() call picks the unrelated schema if it is listed first.
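
      The behavior described above can be reproduced with a small standalone model. This is only a sketch and does not use NiFi classes: the Schema class, the isCompatible() rule (a schema accepts a record when every non-nullable field is present) and both selection methods are assumptions made for illustration. The bestOverlap() method is one hypothetical way to express "more appropriate sub-schema", not a fix taken from NiFi.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for the behavior described in this issue; NOT NiFi code.
public class ChoiceSelectionSketch {

    // Hypothetical schema model: field name -> whether the field is nullable.
    static final class Schema {
        final String name;
        final Map<String, Boolean> nullableByField;

        Schema(String name, Map<String, Boolean> nullableByField) {
            this.name = name;
            this.nullableByField = nullableByField;
        }
    }

    // Assumed compatibility rule: every non-nullable field must be present.
    // Record fields that the schema does not declare are ignored, so a schema
    // with only nullable fields accepts any record.
    static boolean isCompatible(Set<String> recordFields, Schema schema) {
        for (Map.Entry<String, Boolean> field : schema.nullableByField.entrySet()) {
            boolean nullable = field.getValue();
            if (!nullable && !recordFields.contains(field.getKey())) {
                return false;
            }
        }
        return true;
    }

    // First-match selection: returns the first compatible choice, or null.
    static Schema firstCompatible(Set<String> recordFields, List<Schema> choices) {
        for (Schema choice : choices) {
            if (isCompatible(recordFields, choice)) {
                return choice;
            }
        }
        return null;
    }

    // Hypothetical alternative: among compatible choices, prefer the one that
    // shares the most field names with the record.
    static Schema bestOverlap(Set<String> recordFields, List<Schema> choices) {
        Schema best = null;
        int bestShared = -1;
        for (Schema choice : choices) {
            if (!isCompatible(recordFields, choice)) {
                continue;
            }
            int shared = 0;
            for (String fieldName : choice.nullableByField.keySet()) {
                if (recordFields.contains(fieldName)) {
                    shared++;
                }
            }
            if (shared > bestShared) {
                best = choice;
                bestShared = shared;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Schema choice1 = new Schema("choice1", Collections.singletonMap("field1", false));
        Schema choice2 = new Schema("choice2", Collections.singletonMap("field2", true));
        Set<String> record = Collections.singleton("field1");
        List<Schema> choices = Arrays.asList(choice2, choice1);

        System.out.println(firstCompatible(record, choices).name); // prints "choice2"
        System.out.println(bestOverlap(record, choices).name);     // prints "choice1"
    }
}
```

      In this model the all-nullable schema wins under first-match selection only because it is listed first, which mirrors the ordering dependence reported here.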

            People

              Assignee: Unassigned
              Reporter: Alex Savitsky (alex_savitsky)
              Votes: 0
              Watchers: 1

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1.5h