  1. Apache Drill
  2. DRILL-7325

Many operators do not set container record count

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.16.0
    • Fix Version/s: 1.19.0
    • Component/s: None
    • Labels:
      None

      Description

      See DRILL-7324. The following are problems found because some operators fail to set the record count for their containers.

      Scan

      TestComplexTypeReader, on cluster setup, using the PojoRecordReader:

      ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from ScanBatch
      ScanBatch: Container record count not set

      Reason: ScanBatch never sets the record count of its container (this is a generic issue, not specific to the PojoRecordReader).

      Filter

      TestComplexTypeReader.testNonExistentFieldConverting():

      ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
      FilterRecordBatch: Container record count not set
      

      Hash Join

      TestComplexTypeReader.test_array():

      ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from HashJoinBatch
      HashJoinBatch: Container record count not set
      

      Occurs on the first batch in which the hash join returns OK_NEW_SCHEMA with no records.

      Project

      TestCsvWithHeaders.testEmptyFile() (when the text reader returns empty, schema-only batches):

      ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from ProjectRecordBatch
      ProjectRecordBatch: Container record count not set
      

      Occurs in ProjectRecordBatch.handleNullInput(): it sets up the schema but does not set the value count to 0.

      Unordered Receiver

      TestCsvWithSchema.testMultiFileSchema():

      ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from UnorderedReceiverBatch
      UnorderedReceiverBatch: Container record count not set
      

      The problem is that RecordBatchLoader.load() does not set the container record count.

      Streaming Aggregate

      TestJsonReader.testSumWithTypeCase():

      ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from StreamingAggBatch
      StreamingAggBatch: Container record count not set
      

      The problem is that StreamingAggBatch.buildSchema() does not set the container record count to 0.

      Limit

      TestJsonReader.testDrill_1419():

      ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
      LimitRecordBatch: Container record count not set
      

      None of the paths in LimitRecordBatch.innerNext() set the container record count.

      Union All

      TestJsonReader.testKvgenWithUnionAll():

      ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from UnionAllRecordBatch
      UnionAllRecordBatch: Container record count not set
      

      When UnionAllRecordBatch calls VectorAccessibleUtilities.setValueCount(), it does not also set the container record count.
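
      One way to avoid this class of omission is to set the vector value counts and the container record count in a single call. The sketch below illustrates the idea with simplified stand-in classes. SketchVector and SketchContainer are hypothetical names invented for this example, not Drill classes, and the -1 "not set" sentinel is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a value vector; not a Drill class. */
final class SketchVector {
  private int valueCount;

  void setValueCount(int n) { valueCount = n; }
  int getValueCount() { return valueCount; }
}

/** Hypothetical stand-in for a vector container; not a Drill class. */
final class SketchContainer {
  private final List<SketchVector> vectors = new ArrayList<>();
  private int recordCount = -1;  // assumed sentinel: -1 means "never set"

  SketchVector addVector() {
    SketchVector v = new SketchVector();
    vectors.add(v);
    return v;
  }

  boolean hasRecordCount() { return recordCount >= 0; }

  int getRecordCount() {
    if (!hasRecordCount()) {
      throw new IllegalStateException("Container record count not set");
    }
    return recordCount;
  }

  /** Sets every vector's value count and the container count together. */
  void setValueCount(int n) {
    for (SketchVector v : vectors) {
      v.setValueCount(n);
    }
    recordCount = n;
  }
}
```

      With a helper of this shape, a caller cannot update the vectors while leaving the container count unset, which is the situation described for UnionAllRecordBatch.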

      Hash Aggregate

      TestJsonReader.drill_4479():

      ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from HashAggBatch
      HashAggBatch: Container record count not set
      

      The problem is that HashAggBatch.buildSchema() does not set the container record count to 0 for the first, empty batch sent for OK_NEW_SCHEMA.
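
      Several of the cases above (Hash Join, Project, Streaming Aggregate, Hash Aggregate) share one pattern: a schema-only batch holds zero rows, but "zero" still has to be stated explicitly. The following minimal sketch shows the difference between an unset count and a count of 0. SchemaOnlyBatch and its methods are hypothetical names for illustration, and the -1 sentinel is an assumption, not Drill's actual representation.

```java
/** Hypothetical model of an operator's first, schema-only batch. */
final class SchemaOnlyBatch {
  static final int NOT_SET = -1;  // assumed sentinel for "never set"

  private int containerRecordCount = NOT_SET;
  private boolean schemaBuilt;

  /** Buggy variant: builds the schema but leaves the count unset. */
  void buildSchemaWithoutCount() {
    schemaBuilt = true;
  }

  /** Fixed variant: an empty batch still declares that it holds 0 rows. */
  void buildSchema() {
    schemaBuilt = true;
    containerRecordCount = 0;
  }

  boolean isSchemaBuilt() { return schemaBuilt; }

  /** Returns an error message, or null if the count has been set. */
  String validate(String operator) {
    return containerRecordCount == NOT_SET
        ? operator + ": Container record count not set"
        : null;
  }
}
```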

      And Many More

      It turns out that most operators fail to set one of the many row count variables somewhere in their code path: maybe in the schema setup path, maybe when building a batch along one of the many paths that operators follow. Further, we have multiple row counts that must be set:

      • Value count in each vector (setValueCount()).
      • Row count in the container (setRecordCount()), which must be the same as the vector value count.
      • Row count in the operator (batch), which is the (possibly filtered) count of records presented to downstream operators. It must be less than or equal to the container row count (except for an SV4.)
      • The SV2 record count, which is the number of entries in the SV2 and must be the same as the batch row count (and less than or equal to the container row count.)
      • The SV2 actual batch record count, which must be the same as the container row count.
      • The SV4 record count, which must be the same as the batch record count. With an SV4, the batch consists of multiple containers, each of which must have an accurate container record count.
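
      The relationships in the list above can be written down as a single consistency check. The sketch below is a simplified illustration, not Drill's BatchValidator: BatchCounts and CountChecker are hypothetical classes, the -1 sentinel is an assumption, and the SV4 case (multiple containers per batch) is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical snapshot of one batch's row counts; not a Drill class. */
final class BatchCounts {
  static final int NOT_SET = -1;          // assumed sentinel for "never set"

  int[] vectorValueCounts = new int[0];   // value count of each vector
  int containerRecordCount = NOT_SET;     // container row count
  int batchRecordCount;                   // rows presented downstream
  boolean hasSv2;                         // true if the batch carries an SV2
  int sv2Count;                           // number of entries in the SV2
  int sv2BatchActualRecordCount;          // rows the SV2 indexes into
}

final class CountChecker {
  private CountChecker() {}

  /** Returns one message per violated invariant (empty if consistent). */
  static List<String> check(String operator, BatchCounts b) {
    List<String> errors = new ArrayList<>();
    if (b.containerRecordCount == BatchCounts.NOT_SET) {
      errors.add(operator + ": Container record count not set");
      return errors;  // the remaining checks need a container count
    }
    // Every vector must hold exactly as many values as the container has rows.
    for (int i = 0; i < b.vectorValueCounts.length; i++) {
      if (b.vectorValueCounts[i] != b.containerRecordCount) {
        errors.add(operator + ": vector " + i + " value count "
            + b.vectorValueCounts[i] + " != container record count "
            + b.containerRecordCount);
      }
    }
    // Without an SV4, the batch cannot present more rows than the container.
    if (b.batchRecordCount > b.containerRecordCount) {
      errors.add(operator + ": batch record count exceeds container record count");
    }
    if (b.hasSv2) {
      if (b.sv2Count != b.batchRecordCount) {
        errors.add(operator + ": SV2 count != batch record count");
      }
      if (b.sv2BatchActualRecordCount != b.containerRecordCount) {
        errors.add(operator + ": SV2 actual batch count != container record count");
      }
    }
    return errors;
  }
}
```

      For example, a batch whose container holds 5 rows and whose SV2 selects 2 of them passes the check when the batch record count and SV2 count are both 2 and the SV2 actual batch record count is 5.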

              People

              • Assignee: Paul Rogers
                Reporter: Paul Rogers