Affects Version/s: 1.16.0
Fix Version/s: 1.19.0
DRILL-7324. The following are problems found because some operators fail to set the record count for their containers.
TestComplexTypeReader, on cluster setup, using the PojoRecordReader:
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from ScanBatch
ScanBatch: Container record count not set
Reason: ScanBatch never sets the record count of its container (this is a generic issue, not specific to the PojoRecordReader).
Occurs on the first batch in which the hash join returns OK_NEW_SCHEMA with no records.
TestCsvWithHeaders.testEmptyFile() (when the text reader returned empty, schema-only batches):
Occurs in ProjectRecordBatch.handleNullInput(): it sets up the schema but does not set the value count to 0.
The problem is that RecordBatchLoader.load() does not set the container record count.
The problem is that StreamingAggBatch.buildSchema() does not set the container record count to 0.
None of the paths in LimitRecordBatch.innerNext() set the container record count.
When UnionAllRecordBatch calls VectorAccessibleUtilities.setValueCount(), it does not also set the container count.
The problem is that HashAggBatch.buildSchema() does not set the container record count to 0 for the first, empty, batch sent for OK_NEW_SCHEMA.
It turns out that most operators fail to set one of the many row count variables somewhere in their code path: maybe in the schema setup path, maybe when building a batch along one of the many paths that operators follow. Further, we have multiple row counts that must be set:
- Values in each vector (setValueCount()).
- Row count in the container (setRecordCount()), which must be the same as the vector value count.
- Row count in the operator (batch), which is the (possibly filtered) count of records presented to downstream operators. It must be less than or equal to the container row count (except for an SV4.)
- The SV2 record count, which is the number of entries in the SV2 and must be the same as the batch row count (and less than or equal to the container row count.)
- The SV2 actual batch record count, which must be the same as the container row count.
- The SV4 record count, which must be the same as the batch record count. With an SV4, the batch consists of multiple containers, each of which must have an accurate container record count.
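
The relationships in the list above can be sketched as a small consistency check. This is only an illustration in plain Java, not Drill's BatchValidator: the class RowCountCheck, the UNSET sentinel, and the validate() signature are hypothetical names made up for this example, and the SV4 case (multiple containers) is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the row counts listed above.
// None of these names are Drill classes or methods.
final class RowCountCheck {
  // Assumed sentinel meaning "nobody set the container record count".
  static final int UNSET = -1;

  private RowCountCheck() {}

  // Returns the list of violated rules; an empty list means the counts agree.
  static List<String> validate(int[] vectorValueCounts, int containerCount,
      int batchCount, boolean hasSv2, int sv2Count, int sv2ActualBatchCount) {
    List<String> errors = new ArrayList<>();
    if (containerCount == UNSET) {
      // Nothing else can be compared until the container count is known.
      errors.add("Container record count not set");
      return errors;
    }
    // Each vector's value count must equal the container row count.
    for (int i = 0; i < vectorValueCounts.length; i++) {
      if (vectorValueCounts[i] != containerCount) {
        errors.add("Vector " + i + " value count " + vectorValueCounts[i]
            + " != container count " + containerCount);
      }
    }
    // Without an SV4, the batch row count cannot exceed the container count.
    if (batchCount > containerCount) {
      errors.add("Batch row count " + batchCount
          + " > container count " + containerCount);
    }
    if (hasSv2) {
      // SV2 entry count must match the batch row count.
      if (sv2Count != batchCount) {
        errors.add("SV2 record count " + sv2Count
            + " != batch row count " + batchCount);
      }
      // SV2 actual batch record count must match the container count.
      if (sv2ActualBatchCount != containerCount) {
        errors.add("SV2 actual batch record count " + sv2ActualBatchCount
            + " != container count " + containerCount);
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    // Empty, schema-only batch with every count set to 0: consistent.
    System.out.println(validate(new int[] {0, 0}, 0, 0, false, 0, 0));
    // Same batch, but the container count was never set.
    System.out.println(validate(new int[] {0, 0}, UNSET, 0, false, 0, 0));
  }
}
```

In this sketch, the empty OK_NEW_SCHEMA batches described earlier pass only when every count is explicitly set to 0; leaving the container count unset is reported as an error on its own, since none of the other comparisons are meaningful without it.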