Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 1.27.0, 2.0.0-M4
- None
Description
Consider the following use case:
- GFF Processor, generating a JSON with 3 fields: a, b, and c
- ConvertRecord with JSON Reader / JSON Writer
- Both reader and writer are configured with a schema only specifying fields a and b
The expected result is a JSON that only contains fields a and b.
We're following the below path in the code:
- AbstractRecordProcessor (L131)
Record firstRecord = reader.nextRecord();
In this case, the default method for nextRecord() is defined in RecordReader (L50)
default Record nextRecord() throws IOException, MalformedRecordException { return nextRecord(true, false); }
where we are NOT dropping the unknown fields (the Javadoc needs fixing here, as it says the opposite)
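To make the effect of that default concrete, here is a minimal sketch. It assumes a simplified stand-in for the reader contract: SketchRecordReader and SingleRowReader are hypothetical names, and records are modelled as plain maps rather than NiFi Record objects. Only the delegation to nextRecord(true, false) is taken from the code quoted above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the reader contract; NOT the NiFi RecordReader interface.
interface SketchRecordReader {
    // Mirrors the two-argument form: the second flag controls unknown-field handling.
    Map<String, Object> nextRecord(boolean coerceTypes, boolean dropUnknownFields);

    // Mirrors the default quoted above: unknown fields are NOT dropped.
    default Map<String, Object> nextRecord() {
        return nextRecord(true, false);
    }
}

// Hypothetical reader serving one in-memory row against a list of schema field names.
final class SingleRowReader implements SketchRecordReader {
    private final List<String> schemaFields;
    private final Map<String, Object> row;

    SingleRowReader(List<String> schemaFields, Map<String, Object> row) {
        this.schemaFields = schemaFields;
        this.row = row;
    }

    @Override
    public Map<String, Object> nextRecord(boolean coerceTypes, boolean dropUnknownFields) {
        // coerceTypes is ignored in this sketch; only the field filtering matters here.
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : row.entrySet()) {
            if (!dropUnknownFields || schemaFields.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```

With a schema of a and b and a row containing a, b, and c, nextRecord() in this sketch returns all three keys, while nextRecord(true, true) returns only a and b.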
We get to
writer.write(firstRecord);
which gets us to
- WriteJsonResult (L206)
Here, we do a check
isUseSerializeForm(record, writeSchema)
which currently returns true when it should not. Because of this, we write the serialized form, which ignores the writer schema.
In this method isUseSerializeForm(), we do check
record.getSchema().equals(writeSchema)
But at this point record.getSchema() returns the schema defined in the reader which is equal to the one defined in the writer - even though the record has additional fields compared to the defined schema.
The suggested fix is to also add a check on
record.isDropUnknownFields()
If dropUnknownFields is false, then we do not use the serialized form.
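The difference between the current check and the suggested one can be sketched as follows. This is an illustration under assumptions, not the WriteJsonResult code: SketchRecord and SerializedFormCheck are hypothetical, the schema is reduced to a list of field names, and the serialized form is a plain string. Only getSchema(), the equals comparison, and isDropUnknownFields() come from the description above.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified record holding only what the check needs.
final class SketchRecord {
    private final List<String> schema;             // schema attached by the reader
    private final Optional<String> serializedForm; // original text, if the reader kept it
    private final boolean dropUnknownFields;       // whether unknown fields were dropped on read

    SketchRecord(List<String> schema, Optional<String> serializedForm, boolean dropUnknownFields) {
        this.schema = schema;
        this.serializedForm = serializedForm;
        this.dropUnknownFields = dropUnknownFields;
    }

    List<String> getSchema() { return schema; }
    Optional<String> getSerializedForm() { return serializedForm; }
    boolean isDropUnknownFields() { return dropUnknownFields; }
}

final class SerializedFormCheck {
    // Behaviour described in the ticket: schema equality alone decides.
    static boolean currentCheck(SketchRecord record, List<String> writeSchema) {
        return record.getSerializedForm().isPresent()
                && record.getSchema().equals(writeSchema);
    }

    // Suggested fix: additionally require that unknown fields were dropped on read.
    static boolean proposedCheck(SketchRecord record, List<String> writeSchema) {
        return currentCheck(record, writeSchema) && record.isDropUnknownFields();
    }
}
```

For a record read with dropUnknownFields set to false whose serialized form still contains c, currentCheck returns true against a writer schema of a and b, while proposedCheck returns false, so the writer would go through the schema-aware path.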
While this does solve the issue, I'm a bit conflicted about the current approach. Not only could this have a performance impact (we are likely to use the serialized form less often), but it also feels like the default should be to ignore unknown fields when reading the record.
If we consider the below scenario:
- GFF Processor, generating a JSON with 3 fields: a, b, and c
- ConvertRecord with JSON Reader / JSON Writer
- JSON reader with a schema only specifying fields a and b
- JSON writer with a schema specifying fields a, b, and c (c defaulting to null)
It feels like the expected result should be a JSON with the field c and a null value, because the reader would drop the field when converting the JSON into a record, before passing it to the writer.
If we agree on the above, then it may be easier to just override nextRecord() in AbstractJsonRowRecordReader and default to nextRecord(true, true).
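As a rough illustration of the second scenario under that alternative, the sketch below uses two hypothetical classes: DroppingReader stands in for a JSON reader whose no-argument nextRecord() drops unknown fields, and SchemaWriter stands in for a writer that emits exactly the fields of its own schema. Neither is the NiFi implementation, and records are again modelled as plain maps.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader whose no-argument nextRecord() drops unknown fields.
final class DroppingReader {
    private final List<String> readSchema;
    private final Map<String, Object> row;

    DroppingReader(List<String> readSchema, Map<String, Object> row) {
        this.readSchema = readSchema;
        this.row = row;
    }

    Map<String, Object> nextRecord(boolean coerceTypes, boolean dropUnknownFields) {
        // coerceTypes is ignored in this sketch.
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : row.entrySet()) {
            if (!dropUnknownFields || readSchema.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    // Stands in for the proposed override: default to dropping unknown fields.
    Map<String, Object> nextRecord() {
        return nextRecord(true, true);
    }
}

// Hypothetical writer that emits exactly the fields of its own schema.
final class SchemaWriter {
    static Map<String, Object> write(Map<String, Object> record, List<String> writeSchema) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String field : writeSchema) {
            // Fields absent from the record are written as null.
            out.put(field, record.get(field));
        }
        return out;
    }
}
```

With a reader schema of a and b, a writer schema of a, b, and c, and an input row containing all three fields, the value of c is dropped on read and the written result contains c with a null value.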
Issue Links
- duplicates: NIFI-13362 JSONRecordSetWriter does not account for schema changes when writing serialized form (Resolved)
- relates to: NIFI-13963 Unknown fields not dropped by JSON Writer as expected by specified schema (Resolved)
- links to