Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
When UnionAllRecordBatch handles the IterOutcome values returned by the next() method of its upstream batches, it appears to misinterpret them, drawing incorrect inferences about what they mean.
In particular, some switch statements seem to check for NONE vs. OK_NEW_SCHEMA in order to determine whether there are any rows (instead of explicitly checking the number of rows). However, OK_NEW_SCHEMA can be returned even when there are zero rows.
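The difference between the two checks can be sketched as follows. This is only an illustration, not the actual UnionAllRecordBatch code: `Outcome`, `Step`, `FakeUpstream`, and `UnionInputDemo` are hypothetical, simplified stand-ins (the enum models only three outcome values), and the row-count accessor is assumed to behave like a batch's record count.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified stand-in for RecordBatch.IterOutcome (three values only).
enum Outcome { NONE, OK, OK_NEW_SCHEMA }

// One scripted result of next(): the outcome plus the row count of the batch it exposes.
final class Step {
    final Outcome outcome;
    final int rowCount;

    Step(Outcome outcome, int rowCount) {
        this.outcome = outcome;
        this.rowCount = rowCount;
    }
}

// Hypothetical upstream batch that replays a fixed script, then returns NONE.
final class FakeUpstream {
    private final Deque<Step> steps = new ArrayDeque<>();
    private int recordCount;

    FakeUpstream(Step... script) {
        for (Step s : script) {
            steps.add(s);
        }
    }

    Outcome next() {
        Step s = steps.poll();
        if (s == null) {
            recordCount = 0;
            return Outcome.NONE;
        }
        recordCount = s.rowCount;
        return s.outcome;
    }

    int getRecordCount() {
        return recordCount;
    }
}

final class UnionInputDemo {
    // Flawed inference: treats every outcome other than NONE as "this input has rows".
    static boolean hasRowsByOutcome(Outcome outcome) {
        switch (outcome) {
            case NONE:
                return false;
            default:
                return true;
        }
    }

    // Explicit check: consults the row count instead of inferring it from the outcome.
    static boolean hasRowsByCount(Outcome outcome, int recordCount) {
        return outcome != Outcome.NONE && recordCount > 0;
    }

    public static void main(String[] args) {
        // A zero-row source that still reports its schema before finishing.
        FakeUpstream empty = new FakeUpstream(new Step(Outcome.OK_NEW_SCHEMA, 0));
        Outcome first = empty.next();
        System.out.println("by outcome: " + hasRowsByOutcome(first));
        System.out.println("by count:   " + hasRowsByCount(first, empty.getRecordCount()));
    }
}
```

For the zero-row source in `main`, the outcome-based check reports rows while the count-based check does not, which is the mismatch described above.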
This apparent latent bug in the union code blocks the fix for DRILL-2288, which makes ScanBatch return OK_NEW_SCHEMA for a zero-row case in which it was wrongly (per the IterOutcome protocol) returning NONE without first returning OK_NEW_SCHEMA.
For details of IterOutcome values, see the Javadoc documentation of RecordBatch.IterOutcome (after DRILL-3641 is merged; until then, see https://github.com/apache/drill/pull/113).
For an environment/code state that exposes the UnionAllRecordBatch problems, see https://github.com/dsbos/incubator-drill/tree/bugs/WORK_2288_etc, which includes:
- a test that exposes the DRILL-2288 problem;
- an enhanced IteratorValidatorBatchIterator, which now detects IterOutcome value sequence violations; and
- a fixed (though not-yet-cleaned) version of ScanBatch that fixes the DRILL-2288 problem and thereby exposes the UnionAllRecordBatch problem (several test methods in each of TestUnionAll and TestUnionDistinct fail).
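To illustrate what detecting an IterOutcome sequence violation might look like, here is a minimal sketch. It is not the code of IteratorValidatorBatchIterator: `SeqOutcome` and `OutcomeSequenceChecker` are hypothetical names, and only one rule comes from this report (NONE must not be returned before OK_NEW_SCHEMA). The other two rules, that OK must follow an OK_NEW_SCHEMA and that nothing may follow NONE, are simplifying assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for RecordBatch.IterOutcome (three values only).
enum SeqOutcome { NONE, OK, OK_NEW_SCHEMA }

// Tracks the outcomes seen so far from one upstream and rejects invalid sequences.
final class OutcomeSequenceChecker {
    private boolean sawSchema = false;
    private boolean finished = false;

    void observe(SeqOutcome outcome) {
        if (finished) {
            // Assumed rule: no further outcomes after NONE.
            throw new IllegalStateException(outcome + " returned after NONE");
        }
        switch (outcome) {
            case OK_NEW_SCHEMA:
                sawSchema = true;
                break;
            case OK:
                // Assumed rule: data batches need a previously announced schema.
                if (!sawSchema) {
                    throw new IllegalStateException("OK returned before OK_NEW_SCHEMA");
                }
                break;
            case NONE:
                // Rule from the report: NONE without a prior OK_NEW_SCHEMA is a violation.
                if (!sawSchema) {
                    throw new IllegalStateException("NONE returned before OK_NEW_SCHEMA");
                }
                finished = true;
                break;
        }
    }

    // Convenience wrapper: true if the whole sequence passes the checks above.
    static boolean isValid(List<SeqOutcome> sequence) {
        OutcomeSequenceChecker checker = new OutcomeSequenceChecker();
        try {
            for (SeqOutcome outcome : sequence) {
                checker.observe(outcome);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Zero-row source that follows the protocol: schema first, then NONE.
        System.out.println(isValid(Arrays.asList(SeqOutcome.OK_NEW_SCHEMA, SeqOutcome.NONE)));
        // Zero-row source that skips the schema, as in the DRILL-2288 case.
        System.out.println(isValid(Arrays.asList(SeqOutcome.NONE)));
    }
}
```

Under these assumptions, the first sequence in `main` is accepted and the second is rejected.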
Issue Links
- blocks: DRILL-3737 CTAS from empty text file fails with NPE (Closed)
- is part of: DRILL-2288 ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...] (Closed)
- relates to: DRILL-3805 Empty JSON on LHS UNION non empty JSON on RHS must return results (Open)
- requires: DRILL-3641 Document RecordBatch.IterOutcome (enumerators and possible sequences) (Resolved)