Apache Drill / DRILL-5514

Enhance VectorContainer to merge two row sets

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.10.0
    • Fix Version/s: 1.11.0
    • Component/s: None

      Description

      Consider the concept of a "record batch" in Drill. On the one hand, one can envision a record batch as a stack of records:

      | a1 | b1 | c1 |
      ----------------
      | a2 | b2 | c2 |
      

      But Drill is columnar, so a record batch is really a "bundle" of vectors, one per column:

      | a1 |    | b1 |    | c1 |
      | a2 |    | b2 |    | c2 |
      

      There are times when it is handy to build up a record batch as a merge of two different vector bundles:

      -- bundle 1 --    -- bundle 2 --
      | a1 |    | b1 |        | c1 |
      | a2 |    | b2 |        | c2 |
      

      For example, consider a reader. The reader implementation might read columns (a, b) from a file. The "ScanBatch" might then add (c) as an implicit vector (the file name, say). The merged set of vectors comprises the final schema: (a, b, c).

      This ticket asks for the code to do the merge:

      • Merge two schemas A = (a, b), B = (c) to create schema C = (a, b, c).
      • Merge two vector containers, C1 and C2, to create a new container, C3, that holds the vectors from both.
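
      The two merge steps above can be sketched roughly as follows. This is only an illustration: ColumnBundle is a hypothetical stand-in that maps column names to plain Java lists, not Drill's VectorContainer or BatchSchema, and its method names are assumptions made for the example. The merged bundle shares the column lists of its inputs rather than copying them, in the same spirit as a merged container holding the vectors of the first two.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a vector container: column name -> column values. */
class ColumnBundle {
    // LinkedHashMap keeps columns in insertion order, so the schema order is stable.
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();

    /** Adds one column (one "vector") and returns this bundle for chaining. */
    ColumnBundle add(String name, Object... values) {
        columns.put(name, new ArrayList<>(Arrays.asList(values)));
        return this;
    }

    /** The schema is simply the ordered list of column names. */
    List<String> schema() {
        return new ArrayList<>(columns.keySet());
    }

    List<Object> column(String name) {
        return columns.get(name);
    }

    /**
     * Returns a new bundle holding this bundle's columns followed by the
     * other bundle's columns. The column lists are shared, not copied.
     */
    ColumnBundle merge(ColumnBundle other) {
        ColumnBundle merged = new ColumnBundle();
        merged.columns.putAll(this.columns);
        merged.columns.putAll(other.columns);
        return merged;
    }
}
```

      With C1 = (a, b) and C2 = (c), calling c1.merge(c2) on this stand-in yields a bundle whose schema is (a, b, c). The sketch performs no validation of its inputs.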

      Clearly, the merge only makes sense if:

      • The two input containers have the same row count, and
      • The columns in each input container are distinct.
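
      As a rough illustration of these two preconditions, a merge could validate its inputs before combining them. The helper below is hypothetical (MergeChecks and checkMergeable are names invented for this sketch, not part of Drill), and it compares column names with a simple case-sensitive match, which is an assumption of the example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical precondition checks for merging two column bundles. */
class MergeChecks {

    /**
     * Throws IllegalArgumentException unless both inputs have the same row
     * count and no column name appears in both inputs.
     */
    static void checkMergeable(int rowCount1, List<String> columns1,
                               int rowCount2, List<String> columns2) {
        if (rowCount1 != rowCount2) {
            throw new IllegalArgumentException(
                "Row counts differ: " + rowCount1 + " vs. " + rowCount2);
        }
        // Names from the first input; any repeat in the second is a conflict.
        Set<String> seen = new HashSet<>(columns1);
        for (String name : columns2) {
            if (seen.contains(name)) {
                throw new IllegalArgumentException("Duplicate column: " + name);
            }
        }
    }
}
```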

      Because this feature is also useful for tests, add the merge to the "row set" tools as well.

              People

              • Assignee: Paul Rogers (paul-rogers)
              • Reporter: Paul Rogers (paul-rogers)
              • Reviewer: Karthikeyan Manivannan

                  Time Tracking

                  • Original Estimate: 2h
                  • Remaining Estimate: 2h
                  • Time Spent: Not Specified