Consider the concept of a "record batch" in Drill. On the one hand, one can envision a record batch as a stack of records.
But Drill is columnar, so a record batch is really a "bundle" of vectors, one per column.
There are times when it is handy to build up a record batch as a merge of two different vector bundles.
For example, consider a reader. The reader implementation might read columns (a, b) from a file. The "ScanBatch" might then add (c) as an implicit vector (the file name, say). The merged set of vectors forms the final schema: (a, b, c).
This ticket asks for the code to do the merge:
- Merge two schemas A = (a, b), B = (c) to create schema C = (a, b, c).
- Merge two vector containers C1 and C2 to create a new container, C3, that holds the merger of the vectors from the first two.
Clearly, the merge only makes sense if:
- The two input containers have the same row count, and
- The columns of the two input containers are distinct (no column appears in both).
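The merge and its two preconditions can be sketched as follows. This is a minimal illustration only, not Drill's implementation: `BatchMergeSketch`, `Container`, `mergeSchemas` and `mergeContainers` are hypothetical names, and plain Java lists stand in for value vectors.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: plain Java lists stand in for Drill's value vectors. */
public class BatchMergeSketch {

  /** Hypothetical stand-in for a vector container: ordered, named columns plus a row count. */
  public static final class Container {
    private final Map<String, List<?>> columns = new LinkedHashMap<>();
    private final int rowCount;

    public Container(int rowCount) {
      this.rowCount = rowCount;
    }

    /** Adds a column, rejecting duplicate names and mismatched lengths. */
    public Container add(String name, List<?> values) {
      if (values.size() != rowCount) {
        throw new IllegalArgumentException("Column " + name + " has wrong row count");
      }
      if (columns.containsKey(name)) {
        throw new IllegalArgumentException("Duplicate column: " + name);
      }
      columns.put(name, values);
      return this;
    }

    public int rowCount() {
      return rowCount;
    }

    /** Column names in insertion order. */
    public List<String> schema() {
      return new ArrayList<>(columns.keySet());
    }
  }

  /** Merges schemas A = (a, b) and B = (c) into C = (a, b, c). */
  public static List<String> mergeSchemas(List<String> a, List<String> b) {
    List<String> merged = new ArrayList<>(a);
    for (String name : b) {
      if (merged.contains(name)) {
        throw new IllegalArgumentException("Duplicate column: " + name);
      }
      merged.add(name);
    }
    return merged;
  }

  /** Builds C3 from the columns of C1 followed by those of C2. */
  public static Container mergeContainers(Container c1, Container c2) {
    if (c1.rowCount() != c2.rowCount()) {
      throw new IllegalArgumentException("Row counts differ");
    }
    Container c3 = new Container(c1.rowCount());
    for (Map.Entry<String, List<?>> e : c1.columns.entrySet()) {
      c3.add(e.getKey(), e.getValue());
    }
    // add() rejects any column name that C2 shares with C1.
    for (Map.Entry<String, List<?>> e : c2.columns.entrySet()) {
      c3.add(e.getKey(), e.getValue());
    }
    return c3;
  }
}
```

In the reader example, C1 would hold (a, b) as read from the file and C2 would hold the implicit column (c); merging them gives a container with schema (a, b, c). The sketch shares the column lists between the inputs and the result rather than copying them, which is a choice made for this sketch.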
Because this feature is also useful for tests, add the merge to the "row set" tools as well.
Linked issue: Review changes for DRILL 5514 (Resolved)