Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
1.10.0
None
Description
Consider the concept of a "record batch" in Drill. On the one hand, one can envision a record batch as a stack of records:
| a1 | b1 | c1 |
----------------
| a2 | b2 | c2 |
But Drill is columnar, so a record batch is really a "bundle" of vectors:
| a1 |  | b1 |  | c1 |
| a2 |  | b2 |  | c2 |
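To make the row-versus-column distinction concrete, here is a minimal sketch that takes the same two records and regroups them into one list per column. Plain Java lists stand in for Drill's value vectors, and the class and method names (`RowVsColumn`, `toColumns`) are hypothetical illustrations, not Drill code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java lists stand in for Drill's value vectors,
// and the class and method names here are hypothetical.
class RowVsColumn {

  // Transposes row-oriented records into one list ("vector") per column.
  static List<List<String>> toColumns(List<List<String>> rows) {
    List<List<String>> columns = new ArrayList<>();
    if (rows.isEmpty()) {
      return columns;
    }
    int width = rows.get(0).size();
    for (int col = 0; col < width; col++) {
      List<String> vector = new ArrayList<>();
      for (List<String> row : rows) {
        vector.add(row.get(col));
      }
      columns.add(vector);
    }
    return columns;
  }

  public static void main(String[] args) {
    // Row view: a "stack" of records.
    List<List<String>> rows = List.of(
        List.of("a1", "b1", "c1"),
        List.of("a2", "b2", "c2"));
    // Column view: one vector per column.
    System.out.println(toColumns(rows)); // [[a1, a2], [b1, b2], [c1, c2]]
  }
}
```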
There are times when it is handy to build up a record batch as a merge of two different vector bundles:
-- bundle 1 --   -- bundle 2 --
| a1 |  | b1 |   | c1 |
| a2 |  | b2 |   | c2 |
For example, consider a reader that reads columns (a, b) from a file. The "ScanBatch" might then add (c) as an implicit vector (holding the file name, say). The merged set of vectors forms the final schema: (a, b, c).
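The implicit column in this example only lines up with the reader's columns if it has one entry per row the reader produced. The sketch below, assuming plain Java collections in place of Drill vectors, builds such a vector by repeating the file name; `ImplicitColumnDemo`, `fileNameVector`, and the file name `data.csv` are all hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the reader/ScanBatch split. Plain Java collections
// stand in for Drill vectors; "data.csv" is a made-up file name.
class ImplicitColumnDemo {

  // Builds an implicit vector that repeats the file name once per row, so it
  // has exactly as many entries as the reader's own vectors.
  static List<String> fileNameVector(String fileName, int rowCount) {
    return Collections.nCopies(rowCount, fileName);
  }

  public static void main(String[] args) {
    // Bundle 1: columns (a, b) as produced by the reader.
    Map<String, List<String>> readerVectors = new LinkedHashMap<>();
    readerVectors.put("a", List.of("a1", "a2"));
    readerVectors.put("b", List.of("b1", "b2"));
    int rowCount = readerVectors.get("a").size();

    // Bundle 2: implicit column (c) added on top of the reader's output.
    List<String> c = fileNameVector("data.csv", rowCount);
    System.out.println(c); // [data.csv, data.csv]
  }
}
```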
This ticket asks for the code to do the merge:
- Merge two schemas A = (a, b), B = (c) to create schema C = (a, b, c).
- Merge two vector containers C1 and C2 to create a new container, C3, that holds the vectors from both.
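The schema half of the request can be sketched as an ordered concatenation that leaves both inputs untouched. This is a minimal sketch, assuming a schema is just an ordered list of column names; Drill's own schema classes also carry types and modes, and the `SchemaMerge` class here is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a schema is modeled as an ordered list of column
// names. Drill's own schema classes also carry types and modes; this is not
// their API.
class SchemaMerge {

  // Returns a new schema holding the columns of a followed by those of b.
  // Neither input is modified.
  static List<String> merge(List<String> a, List<String> b) {
    List<String> merged = new ArrayList<>(a);
    merged.addAll(b);
    // Every column name must be unique, so a name in both inputs is rejected.
    Set<String> seen = new HashSet<>();
    for (String name : merged) {
      if (!seen.add(name)) {
        throw new IllegalArgumentException("Duplicate column: " + name);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    System.out.println(merge(List.of("a", "b"), List.of("c"))); // [a, b, c]
  }
}
```

Returning a new list, rather than appending to `a`, mirrors the ticket's "create schema C" wording.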
Clearly, the merge only makes sense if:
- The two input containers have the same row count, and
- The columns in each input container are distinct.
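Both conditions can be enforced up front before any vectors are combined. The following is a sketch under stated assumptions, not Drill's `VectorContainer` code: plain Java collections stand in for containers and vectors, and the names `ContainerMerge`, `Container`, and `merge` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain Java collections stand in for Drill's
// VectorContainer and value vectors; none of these names are Drill APIs.
class ContainerMerge {

  // A "container": an ordered map of column name to vector, plus a row count.
  static final class Container {
    final int rowCount;
    final Map<String, List<String>> vectors;

    Container(int rowCount, Map<String, List<String>> vectors) {
      this.rowCount = rowCount;
      this.vectors = vectors;
    }
  }

  // Returns a new container C3 holding the vectors of c1 followed by those of
  // c2. The vectors are shared with the inputs, not copied.
  static Container merge(Container c1, Container c2) {
    // Condition 1: the inputs must have the same row count.
    if (c1.rowCount != c2.rowCount) {
      throw new IllegalArgumentException(
          "Row counts differ: " + c1.rowCount + " vs " + c2.rowCount);
    }
    Map<String, List<String>> merged = new LinkedHashMap<>(c1.vectors);
    for (Map.Entry<String, List<String>> e : c2.vectors.entrySet()) {
      // Condition 2: the columns in each input must be distinct.
      if (merged.containsKey(e.getKey())) {
        throw new IllegalArgumentException("Duplicate column: " + e.getKey());
      }
      merged.put(e.getKey(), e.getValue());
    }
    return new Container(c1.rowCount, merged);
  }

  public static void main(String[] args) {
    Map<String, List<String>> v1 = new LinkedHashMap<>();
    v1.put("a", List.of("a1", "a2"));
    v1.put("b", List.of("b1", "b2"));
    Map<String, List<String>> v2 = new LinkedHashMap<>();
    v2.put("c", List.of("c1", "c2"));

    Container c3 = merge(new Container(2, v1), new Container(2, v2));
    System.out.println(c3.vectors.keySet()); // [a, b, c]
  }
}
```

Sharing the vectors instead of copying them keeps the merge cheap; whether the real implementation transfers or shares buffers is not stated in this ticket.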
Because this feature is also useful for tests, add the merge to the "row set" test tools as well.
Issue Links
- is part of: DRILL-5211 Queries fail due to direct memory fragmentation (Open)
- links to
GitHub user paul-rogers opened a pull request:
https://github.com/apache/drill/pull/837
DRILL-5514: Enhance VectorContainer to merge two row sets
Adds ability to merge two schemas and to merge two vector containers, in each case producing a new, merged result. See DRILL-5514 for details.
Also provides a handy constructor to create a vector container given a pre-defined schema.
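The "constructor given a pre-defined schema" idea can be illustrated as allocating one empty vector per column, in schema order. This is a minimal sketch, assuming a schema is a list of column names and plain Java lists stand in for vectors; `SchemaContainer`, `columnNames`, and `vector` are hypothetical names, not the constructor added by the pull request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: build an empty container from a schema, one empty
// vector per column in schema order. Plain Java collections stand in for
// Drill's classes.
class SchemaContainer {
  private final Map<String, List<String>> vectors = new LinkedHashMap<>();

  // Builds an empty container whose layout is fixed by the schema.
  SchemaContainer(List<String> schema) {
    for (String column : schema) {
      if (vectors.put(column, new ArrayList<>()) != null) {
        throw new IllegalArgumentException("Duplicate column: " + column);
      }
    }
  }

  List<String> columnNames() {
    return new ArrayList<>(vectors.keySet());
  }

  List<String> vector(String column) {
    return vectors.get(column);
  }

  public static void main(String[] args) {
    SchemaContainer container = new SchemaContainer(List.of("a", "b", "c"));
    System.out.println(container.columnNames()); // [a, b, c]
  }
}
```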
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/paul-rogers/drill DRILL-5514
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/837.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #837
commit 5b2ceccd7d002b56b93abbff769bfb96b9ff0ff6
Author: Paul Rogers <progers@maprtech.com>
Date: 2017-05-15T22:59:35Z
DRILL-5514: Enhance VectorContainer to merge two row sets
Adds ability to merge two schemas and to merge two vector containers, in each case producing a new, merged result. See DRILL-5514 for details.
Also provides a handy constructor to create a vector container given a pre-defined schema.