[DRILL-5514] Enhance VectorContainer to merge two row sets - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.10.0
Fix Version/s: 1.11.0
Component/s: None
Labels:
- ready-to-commit

Description

Consider the concept of a "record batch" in Drill. On the one hand, one can envision a record batch as a stack of records:

| a1 | b1 | c1 |
----------------
| a2 | b2 | c2 |

But Drill is columnar, so a record batch is really a "bundle" of vectors:

| a1 |    | b1 |    | c1 |
| a2 |    | b2 |    | c2 |

There are times when it is handy to build up a record batch as a merge of two different vector bundles:

-- bundle 1 --    -- bundle 2 --
| a1 |    | b1 |        | c1 |
| a2 |    | b2 |        | c2 |

For example, consider a reader. The reader implementation might read columns (a, b) from a file. The "ScanBatch" might then add (c) as an implicit vector, such as the file name. The merged set of vectors comprises the final schema: (a, b, c).

This ticket asks for the code to do the merge:

- Merge two schemas A = (a, b), B = (c) to create schema C = (a, b, c).
- Merge two vector containers C1 and C2 to create a new container, C3, that holds the merger of the vectors from the first two.

Clearly, the merge only makes sense if:

- The two input containers have the same row count, and
- The columns in each input container are distinct.
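
The merge and its two preconditions can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `BundleSketch` and all of its methods are hypothetical names invented for this example, plain Java lists stand in for Drill value vectors, and nothing here is taken from the actual VectorContainer or row set code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a vector container: named columns of equal length. */
final class BundleSketch {
    // Column name -> column values; LinkedHashMap preserves schema order.
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();
    private final int rowCount;

    BundleSketch(int rowCount) {
        this.rowCount = rowCount;
    }

    /** Adds a column; it must have a new name and exactly rowCount values. */
    BundleSketch add(String name, Object... values) {
        if (values.length != rowCount) {
            throw new IllegalArgumentException("Column " + name + " has wrong length");
        }
        if (columns.containsKey(name)) {
            throw new IllegalArgumentException("Duplicate column: " + name);
        }
        columns.put(name, Arrays.asList(values));
        return this;
    }

    int rowCount() {
        return rowCount;
    }

    /** Column names in schema order. */
    List<String> schema() {
        return new ArrayList<>(columns.keySet());
    }

    Object get(String name, int row) {
        return columns.get(name).get(row);
    }

    /**
     * Returns a new bundle holding this bundle's columns followed by the
     * other's. The column lists are shared, not copied.
     */
    BundleSketch merge(BundleSketch other) {
        // Precondition 1: both inputs have the same row count.
        if (rowCount != other.rowCount) {
            throw new IllegalArgumentException(
                "Row counts differ: " + rowCount + " vs " + other.rowCount);
        }
        BundleSketch result = new BundleSketch(rowCount);
        result.columns.putAll(columns);
        for (Map.Entry<String, List<Object>> e : other.columns.entrySet()) {
            // Precondition 2: column names are distinct across the inputs.
            if (result.columns.containsKey(e.getKey())) {
                throw new IllegalArgumentException("Duplicate column: " + e.getKey());
            }
            result.columns.put(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

With bundles (a, b) and (c) of two rows each, `merge` yields a bundle whose `schema()` is (a, b, c). A mismatched row count or a repeated column name is rejected with an `IllegalArgumentException` rather than silently producing a ragged or ambiguous batch.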

Because this feature is also useful for tests, add the merge to the "row set" tools as well.

Issue Links

is part of

DRILL-5211 Queries fail due to direct memory fragmentation

Open

links to

GitHub Pull Request #837

Sub-Tasks

Review changes for DRILL 5514

Resolved

Karthikeyan Manivannan

People

Assignee: Paul Rogers

Reporter: Paul Rogers

Reviewer: Karthikeyan Manivannan

Votes: 0

Watchers: 3

Dates

Created: 15/May/17 22:12

Updated: 19/Jun/17 10:51

Resolved: 19/Jun/17 10:51
