Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 1.16.0
- None
- None
Description
See DRILL-7324. The following are problems found because some operators fail to set the record count for their containers.
Scan
TestComplexTypeReader, on cluster setup, using the PojoRecordReader:
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from ScanBatch
ScanBatch: Container record count not set
Reason: ScanBatch never sets the record count of its container (this is a generic issue, not specific to the PojoRecordReader).
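The failure mode can be illustrated with a minimal, self-contained sketch. The classes below (ToyContainer, ToyValidator, ScanCountDemo) are hypothetical stand-ins, not Drill's actual container or BatchValidator. The sketch assumes the container marks an unset count with a sentinel value of -1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a vector container; not Drill's actual class.
class ToyContainer {
  private static final int NOT_SET = -1;   // assumed sentinel for "never set"
  private int recordCount = NOT_SET;

  void setRecordCount(int count) { recordCount = count; }

  boolean hasRecordCount() { return recordCount != NOT_SET; }

  int getRecordCount() {
    if (!hasRecordCount()) {
      throw new IllegalStateException("Container record count not set");
    }
    return recordCount;
  }
}

// Hypothetical validator that reports the same kind of error as the log above.
class ToyValidator {
  static List<String> validate(String operatorName, ToyContainer container) {
    List<String> errors = new ArrayList<>();
    if (!container.hasRecordCount()) {
      errors.add(operatorName + ": Container record count not set");
    }
    return errors;
  }
}

class ScanCountDemo {
  public static void main(String[] args) {
    ToyContainer container = new ToyContainer();
    // The reader has filled its vectors, but the scan never set the container count.
    System.out.println(ToyValidator.validate("ScanBatch", container));
    // The fix: set the container count once the batch is complete.
    container.setRecordCount(3);
    System.out.println(ToyValidator.validate("ScanBatch", container));
  }
}
```

Because the check sits in the operator rather than in any one reader, the same error would appear regardless of which reader produced the data.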
Filter
TestComplexTypeReader.testNonExistentFieldConverting():
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
FilterRecordBatch: Container record count not set
Hash Join
TestComplexTypeReader.test_array():
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from HashJoinBatch
HashJoinBatch: Container record count not set
Occurs on the first batch in which the hash join returns OK_NEW_SCHEMA with no records.
Project
TestCsvWithHeaders.testEmptyFile() (when the text reader returned empty, schema-only batches):
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from ProjectRecordBatch
ProjectRecordBatch: Container record count not set
Occurs in ProjectRecordBatch.handleNullInput(): it sets up the schema but does not set the value count to 0.
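The empty, schema-only case can be sketched as follows. This is a toy model under stated assumptions: EmptyBatchVector and EmptyBatchContainer are hypothetical names, and -1 is assumed to mean "count not set". It shows only that building the schema and marking the batch as empty are two separate steps, and that the second one is easy to forget.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical vector: tracks only its value count (-1 means "not set").
class EmptyBatchVector {
  int valueCount = -1;
}

// Hypothetical container holding vectors plus its own record count.
class EmptyBatchContainer {
  final List<EmptyBatchVector> vectors = new ArrayList<>();
  int recordCount = -1;

  // Schema setup only: creates the vectors but sets no counts.
  void buildSchema(int columnCount) {
    for (int i = 0; i < columnCount; i++) {
      vectors.add(new EmptyBatchVector());
    }
  }

  // Marks the batch as empty: every vector and the container report 0 rows.
  void markEmpty() {
    for (EmptyBatchVector v : vectors) {
      v.valueCount = 0;
    }
    recordCount = 0;
  }

  // True only if the container count is set and every vector agrees with it.
  boolean countsAreSet() {
    if (recordCount < 0) {
      return false;
    }
    for (EmptyBatchVector v : vectors) {
      if (v.valueCount != recordCount) {
        return false;
      }
    }
    return true;
  }
}
```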
Unordered Receiver
TestCsvWithSchema.testMultiFileSchema():
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from UnorderedReceiverBatch
UnorderedReceiverBatch: Container record count not set
The problem is that RecordBatchLoader.load() does not set the container record count.
Streaming Aggregate
TestJsonReader.testSumWithTypeCase():
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from StreamingAggBatch
StreamingAggBatch: Container record count not set
The problem is that StreamingAggBatch.buildSchema() does not set the container record count to 0.
Limit
TestJsonReader.testDrill_1419():
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
LimitRecordBatch: Container record count not set
None of the paths in LimitRecordBatch.innerNext() set the container record count.
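One way to avoid missing a path is to set the counts at a single exit point. The sketch below is a hypothetical model (ToyLimitBatch is not Drill code). It assumes a limit that passes the incoming vectors through unchanged and selects rows separately, so the container count equals the incoming row count while the selected count can be smaller.

```java
// Hypothetical model of a limit operator; not Drill's LimitRecordBatch.
class ToyLimitBatch {
  private int toSkip;   // rows still to be skipped (OFFSET)
  private int toTake;   // rows still to be returned (FETCH)

  int containerRecordCount = -1;   // -1 is an assumed "not set" sentinel
  int selectedCount = -1;

  ToyLimitBatch(int offset, int fetch) {
    this.toSkip = offset;
    this.toTake = fetch;
  }

  void next(int incomingRows) {
    int selected;
    if (toTake == 0) {
      // Path 1: the limit is already satisfied; nothing is selected.
      selected = 0;
    } else if (incomingRows <= toSkip) {
      // Path 2: the whole batch falls inside the offset.
      toSkip -= incomingRows;
      selected = 0;
    } else {
      // Path 3: some or all rows after the offset are selected.
      int available = incomingRows - toSkip;
      toSkip = 0;
      selected = Math.min(available, toTake);
      toTake -= selected;
    }
    // Single exit point: both counts are set no matter which path ran.
    containerRecordCount = incomingRows;
    selectedCount = selected;
  }
}
```

With an offset of 2 and a fetch of 3, batches of 2, 4, and 5 incoming rows exercise paths 2, 3, and 1 in that order, and both counts are set after each call.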
Union All
TestJsonReader.testKvgenWithUnionAll():
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from UnionAllRecordBatch
UnionAllRecordBatch: Container record count not set
When UnionAllRecordBatch calls VectorAccessibleUtilities.setValueCount(), it does not also set the container count.
Hash Aggregate
TestJsonReader.drill_4479():
ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from HashAggBatch
HashAggBatch: Container record count not set
The problem is that HashAggBatch.buildSchema() does not set the container record count to 0 for the first, empty batch sent for OK_NEW_SCHEMA.
And Many More
It turns out that most operators fail to set one of the many row count variables somewhere in their code path: maybe in the schema setup path, maybe when building a batch along one of the many paths that operators follow. Further, we have multiple row counts that must be set:
- Values in each vector (setValueCount()).
- Row count in the container (setRecordCount()), which must be the same as the vector value count.
- Row count in the operator (batch), which is the (possibly filtered) count of records presented to downstream operators. It must be less than or equal to the container row count (except for an SV4).
- The SV2 record count, which is the number of entries in the SV2 and must be the same as the batch row count (and less than or equal to the container row count).
- The SV2 actual batch record count, which must be the same as the container row count.
- The SV4 record count, which must be the same as the batch record count. With an SV4, the batch consists of multiple containers, each of which must have an accurate container record count.
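The relationships in the list above can be written down as explicit checks. The following is a sketch only: BatchCounts and CountInvariants are hypothetical names, the counts are passed in as plain integers rather than read from real vectors, and the SV4 case (multiple containers) is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical snapshot of the row counts of one batch (no SV4 case).
class BatchCounts {
  final int[] vectorValueCounts;   // value count of each vector
  final int containerCount;        // container record count
  final int batchCount;            // rows presented to downstream operators
  final boolean hasSv2;            // whether the batch carries an SV2
  final int sv2Count;              // entries in the SV2 (ignored if !hasSv2)
  final int sv2BatchActualCount;   // SV2's view of the container row count

  BatchCounts(int[] vectorValueCounts, int containerCount, int batchCount,
              boolean hasSv2, int sv2Count, int sv2BatchActualCount) {
    this.vectorValueCounts = vectorValueCounts;
    this.containerCount = containerCount;
    this.batchCount = batchCount;
    this.hasSv2 = hasSv2;
    this.sv2Count = sv2Count;
    this.sv2BatchActualCount = sv2BatchActualCount;
  }
}

class CountInvariants {
  // Returns one message per violated rule; an empty list means all rules hold.
  static List<String> check(BatchCounts b) {
    List<String> errors = new ArrayList<>();
    for (int i = 0; i < b.vectorValueCounts.length; i++) {
      if (b.vectorValueCounts[i] != b.containerCount) {
        errors.add("vector " + i + ": value count != container count");
      }
    }
    if (b.batchCount > b.containerCount) {
      errors.add("batch count > container count");
    }
    if (b.hasSv2) {
      if (b.sv2Count != b.batchCount) {
        errors.add("SV2 count != batch count");
      }
      if (b.sv2BatchActualCount != b.containerCount) {
        errors.add("SV2 actual batch count != container count");
      }
    }
    return errors;
  }
}
```

For example, a batch with two vectors of 10 values, a container count of 10, and an SV2 selecting 4 rows passes every check, provided the batch count is also 4 and the SV2 actual batch count is 10.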
Issue Links
- contains: DRILL-7324 Many vector-validity errors from unit tests (Resolved)
- incorporates: DRILL-7424 Project operator fails to set the container row count (Resolved)
- is part of: DRILL-7324 Many vector-validity errors from unit tests (Resolved)