Apache Drill: DRILL-7324

Many vector-validity errors from unit tests

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.16.0
    • Fix Version/s: 1.17.0
    • Component/s: None

      Description

      Drill's value vectors contain many counts that must be kept in sync. Drill provides a utility, BatchValidator, to check (a subset of) these values for consistency.
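
      One invariant of this kind, the one behind every message quoted in this report, is that each vector's value count should equal the batch's row count. The following is a minimal sketch of such a check, not Drill's BatchValidator itself: `RowCountCheck` and `VectorInfo` are hypothetical stand-ins that carry only a column name, a vector type name, and a value count, and the message format is copied from the log output in this issue.

```java
import java.util.ArrayList;
import java.util.List;

class RowCountCheck {
  // Hypothetical stand-in for a value vector: only the fields this check needs.
  static final class VectorInfo {
    final String name;
    final String type;
    final int valueCount;

    VectorInfo(String name, String type, int valueCount) {
      this.name = name;
      this.type = type;
      this.valueCount = valueCount;
    }
  }

  // Returns one message per vector whose value count differs from the batch row count.
  static List<String> validate(int rowCount, List<VectorInfo> vectors) {
    List<String> errors = new ArrayList<>();
    for (VectorInfo v : vectors) {
      if (v.valueCount != rowCount) {
        errors.add(String.format("%s - %s: Row count = %d, but value count = %d",
            v.name, v.type, rowCount, v.valueCount));
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    List<VectorInfo> batch = List.of(
        new VectorInfo("n_nationkey", "IntVector", 25),  // out of sync with the row count
        new VectorInfo("n_name", "VarCharVector", 5));   // consistent
    // Prints: n_nationkey - IntVector: Row count = 5, but value count = 25
    validate(5, batch).forEach(System.out::println);
  }
}
```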

      The IteratorValidatorBatchIterator class is used in tests to validate the state of each operator (AKA "record batch") as Drill runs the Volcano iterator. This class also validates vectors when its VALIDATE_VECTORS constant is set to `true`.

      With VALIDATE_VECTORS set to `true`, the unit tests were run and many failed. Examples:

      [INFO] Running org.apache.drill.TestUnionDistinct
      18:44:26.742 [22d42585-74c2-d418-6f59-9b1870d04770:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
      key - NullableBitVector: Row count = 0, but value count = 2
      18:44:26.745 [22d42585-74c2-d418-6f59-9b1870d04770:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
      key - NullableBitVector: Row count = 0, but value count = 2
      
      [INFO] Running org.apache.drill.TestUnionDistinct
      18:44:48.302 [22d4256e-c90b-847c-5104-02d6cdf5223e:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
      key - NullableBitVector: Row count = 0, but value count = 2
      18:44:48.703 [22d4256e-ccf3-2af6-f56a-140e9c3e55bb:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
      n_nationkey - IntVector: Row count = 2, but value count = 25
      n_regionkey - IntVector: Row count = 2, but value count = 25
      18:44:48.731 [22d4256e-ccf3-2af6-f56a-140e9c3e55bb:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
      n_nationkey - IntVector: Row count = 4, but value count = 25
      n_regionkey - IntVector: Row count = 4, but value count = 25
      18:44:49.039 [22d4256f-6b39-d2ab-d145-4f2b0db315a3:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
      n_nationkey - IntVector: Row count = 2, but value count = 25
      18:44:49.363 [22d4256e-3d91-850f-9ab4-5939219ac0d0:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
      c_custkey - IntVector: Row count = 4, but value count = 1500
      18:44:49.597 [22d4256d-c113-ae5c-6f31-4dd1ec091365:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
      n_nationkey - IntVector: Row count = 5, but value count = 25
      n_regionkey - IntVector: Row count = 5, but value count = 25
      18:44:49.610 [22d4256d-c113-ae5c-6f31-4dd1ec091365:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
      r_regionkey - IntVector: Row count = 1, but value count = 5
      18:44:53.029 [22d4256a-8b70-5f3b-f79b-806e194c5ed2:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
      n_nationkey - IntVector: Row count = 0, but value count = 25
      n_name - VarCharVector: Row count = 0, but value count = 25
      n_regionkey - IntVector: Row count = 0, but value count = 25
      18:44:53.033 [22d4256a-8b70-5f3b-f79b-806e194c5ed2:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
      n_regionkey - IntVector: Row count = 5, but value count = 25
      18:44:53.331 [22d4256a-526c-7815-c216-8e45752a4a6c:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
      n_nationkey - IntVector: Row count = 5, but value count = 25
      n_name - VarCharVector: Row count = 5, but value count = 25
      n_regionkey - IntVector: Row count = 5, but value count = 25
      18:44:53.337 [22d4256a-526c-7815-c216-8e45752a4a6c:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
      n_regionkey - IntVector: Row count = 0, but value count = 25
      18:44:53.646 [22d42569-c293-ced0-c3d0-e9153cc4a70a:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch
      key - NullableBitVector: Row count = 0, but value count = 2
      
      Running org.apache.drill.TestTpchSingleMode
      18:45:01.299 [22d42563-0ed6-1501-86a1-4cb375a9cad4:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
      
      Running org.apache.drill.TestMergeFilterPlan
      18:45:03.738 [22d4255f-b322-fd56-2f93-34b7f5c709c1:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
      o_orderkey - IntVector: Row count = 561, but value count = 15000
      o_orderdate - DateVector: Row count = 561, but value count = 15000
      o_orderpriority - VarCharVector: Row count = 561, but value count = 15000
      18:45:03.828 [22d4255f-b322-fd56-2f93-34b7f5c709c1:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
      l_orderkey - IntVector: Row count = 20580, but value count = 32767
      l_commitdate - DateVector: Row count = 20580, but value count = 32767
      l_receiptdate - DateVector: Row count = 20580, but value count = 32767
      18:45:03.990 [22d4255f-b322-fd56-2f93-34b7f5c709c1:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
      l_orderkey - IntVector: Row count = 17317, but value count = 27408
      l_commitdate - DateVector: Row count = 17317, but value count = 27408
      l_receiptdate - DateVector: Row count = 17317, but value count = 27408
      [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.041 s - in org.apache.drill.TestMergeFilterPlan
      18:45:04.929 [22d4255f-040c-f4c9-7d23-b90702db4a1e:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
      o_orderkey - IntVector: Row count = 2287, but value count = 15000
      o_custkey - IntVector: Row count = 2287, but value count = 15000
      o_orderdate - DateVector: Row count = 2287, but value count = 15000
      18:45:04.944 [22d4255f-040c-f4c9-7d23-b90702db4a1e:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
      r_regionkey - IntVector: Row count = 1, but value count = 5
      r_name - VarCharVector: Row count = 1, but value count = 5
      [INFO] Running org.apache.drill.TestSelectWithOption
      18:45:06.120 [22d4255e-5f13-aabb-40bb-bd09dc3d35e1:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
      l_quantity - Float8Vector: Row count = 594, but value count = 32767
      l_extendedprice - Float8Vector: Row count = 594, but value count = 32767
      l_discount - Float8Vector: Row count = 594, but value count = 32767
      l_shipdate - DateVector: Row count = 594, but value count = 32767
      18:45:06.156 [22d4255e-5f13-aabb-40bb-bd09dc3d35e1:frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch
      l_quantity - Float8Vector: Row count = 543, but value count = 27408
      l_extendedprice - Float8Vector: Row count = 543, but value count = 27408
      l_discount - Float8Vector: Row count = 543, but value count = 27408
      l_shipdate - DateVector: Row count = 543, but value count = 27408
      

      And many, many more. (Note that the test names might not be accurate: Maven runs multiple tests in parallel and it is hard to correlate log messages with tests in this output format.)
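
      Because the test names are unreliable, it may be more useful to group the messages by the operator named in each header line. The sketch below is one way to do that, assuming the log keeps the two line shapes shown above (a "Found one or more vector errors from ..." header followed by one detail line per vector). The class `VectorErrorTally` and its regular expressions are illustrative and not part of Drill.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VectorErrorTally {
  // Header line written by the validator; it ends with the operator class name.
  private static final Pattern HEADER =
      Pattern.compile("Found one or more vector errors from (\\w+)\\s*$");
  // Detail line: "<column> - <VectorType>: Row count = N, but value count = M".
  private static final Pattern DETAIL = Pattern.compile(
      "^\\s*(\\S+) - (\\w+): Row count = (\\d+), but value count = (\\d+)\\s*$");

  // Counts mismatched vectors per operator. Each detail line is attributed
  // to the header immediately preceding its run of detail lines.
  static Map<String, Integer> tally(List<String> lines) {
    Map<String, Integer> counts = new TreeMap<>();
    String operator = null;
    for (String line : lines) {
      Matcher header = HEADER.matcher(line);
      if (header.find()) {
        operator = header.group(1);
        counts.putIfAbsent(operator, 0);
      } else if (operator != null && DETAIL.matcher(line).matches()) {
        counts.merge(operator, 1, Integer::sum);
      } else {
        operator = null; // unrelated line: stop attributing details
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String> sample = List.of(
        "18:44:26.742 [frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from LimitRecordBatch",
        "key - NullableBitVector: Row count = 0, but value count = 2",
        "18:44:48.703 [frag:0:0] ERROR o.a.d.e.p.i.validate.BatchValidator - Found one or more vector errors from FilterRecordBatch",
        "n_nationkey - IntVector: Row count = 2, but value count = 25",
        "n_regionkey - IntVector: Row count = 2, but value count = 25");
    // Prints: {FilterRecordBatch=2, LimitRecordBatch=1}
    System.out.println(tally(sample));
  }
}
```

      A header with no detail lines, such as the TestTpchSingleMode entry above, shows up with a count of 0.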

      The problem with these errors is that they make operators very fragile: once we accept invalid vectors, it is very hard to detect when an operator makes vectors even more invalid. It is also hard to reason about the code if the inputs (or outputs) can be corrupt in normal operation.

      Suggestions:

      1. Extend BatchValidator to cover the vectors not yet checked (maps and repeated maps).
      2. Work step-by-step through tests.
      3. Identify operators that corrupt vectors.
      4. Fix the source of corruption and retest.
      5. Continue until no vector corruption errors occur.
      6. Change the IteratorValidatorBatchIterator to check vectors by default, and to throw a fatal error if corruption is found.
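
      The last suggestion amounts to turning the logged report into a hard failure. Below is a rough sketch of that behavior under stated assumptions: `StrictBatchCheck` and its `validateVectors` flag are hypothetical and stand in for IteratorValidatorBatchIterator and its VALIDATE_VECTORS constant, and the vectors are reduced to a map from column name to value count.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StrictBatchCheck {
  // Hypothetical switch: validation is enabled by default, as suggestion 6 proposes.
  static boolean validateVectors = true;

  // valueCounts maps each column name to its vector's value count.
  // Throws IllegalStateException if any value count differs from rowCount.
  static void check(String operator, int rowCount, Map<String, Integer> valueCounts) {
    if (!validateVectors) {
      return;
    }
    StringBuilder errors = new StringBuilder();
    for (Map.Entry<String, Integer> e : valueCounts.entrySet()) {
      if (e.getValue() != rowCount) {
        errors.append(String.format("%n%s: Row count = %d, but value count = %d",
            e.getKey(), rowCount, e.getValue()));
      }
    }
    if (errors.length() > 0) {
      // Fail the query instead of only logging, so corruption cannot go unnoticed.
      throw new IllegalStateException(
          "Found one or more vector errors from " + operator + errors);
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    counts.put("n_nationkey", 25);
    counts.put("n_regionkey", 25);
    try {
      check("FilterRecordBatch", 2, counts);
    } catch (IllegalStateException ex) {
      System.out.println(ex.getMessage());
    }
  }
}
```

      Failing fast only becomes practical once steps 1 through 5 have removed the existing mismatches; otherwise nearly every test listed above would abort.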

              People

              • Assignee: Paul Rogers
              • Reporter: Paul Rogers
              • Reviewer: Arina Ielchiieva