Apache Drill / DRILL-5514

Enhance VectorContainer to merge two row sets


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.10.0
    • Fix Version/s: 1.11.0
    • None

    Description

      Consider the concept of a "record batch" in Drill. Conceptually, one can picture a record batch as a stack of records (rows):

      | a1 | b1 | c1 |
      ----------------
      | a2 | b2 | c2 |
      

      But Drill is columnar, so a record batch is really a "bundle" of vectors, one per column:

      | a1 |    | b1 |    | c1 |
      | a2 |    | b2 |    | c2 |
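      Both views hold the same data; only the layout differs. As an illustration only, the hypothetical Java class below transposes the stack of rows into one list per column. Plain `List<String>`s stand in for Drill's value vectors, and `RowsToColumns` and `toColumns` are made-up names, not Drill APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration: the same batch viewed row-wise and column-wise. */
final class RowsToColumns {

    /**
     * Transposes a row-oriented batch into one list ("vector") per column.
     * Every row must supply exactly one value per column name.
     */
    static Map<String, List<String>> toColumns(List<String> names, List<List<String>> rows) {
        // LinkedHashMap keeps the columns in schema order.
        Map<String, List<String>> columns = new LinkedHashMap<>();
        for (String name : names) {
            columns.put(name, new ArrayList<>());
        }
        for (List<String> row : rows) {
            if (row.size() != names.size()) {
                throw new IllegalArgumentException("Row width does not match schema: " + row);
            }
            // Append each value of the row to the vector for its column.
            for (int i = 0; i < names.size(); i++) {
                columns.get(names.get(i)).add(row.get(i));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("a1", "b1", "c1"),
                List.of("a2", "b2", "c2"));
        Map<String, List<String>> columns = toColumns(List.of("a", "b", "c"), rows);
        System.out.println(columns); // {a=[a1, a2], b=[b1, b2], c=[c1, c2]}
    }
}
```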
      

      Sometimes it is handy to build a record batch by merging two different vector bundles:

      -- bundle 1 --    -- bundle 2 --
      | a1 |    | b1 |        | c1 |
      | a2 |    | b2 |        | c2 |
      

      For example, consider a reader. The reader implementation might read columns (a, b) from a file. Then the "ScanBatch" might add (c) as an implicit vector (for example, the file name). The merged set of vectors forms the final schema: (a, b, c).
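      A minimal sketch of that split, assuming plain Java maps and lists in place of Drill's vectors: `readFile` plays the reader, `implicitColumns` plays the scan operator, and the file name `data.csv` is made up. None of these names are Drill's `ScanBatch` API. The point it illustrates is that the implicit column must be built with the same row count as the reader's batch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the reader/scan split; not Drill's ScanBatch API. */
final class ImplicitColumnExample {

    /** Stands in for a reader: returns the columns (a, b) it read from a file. */
    static Map<String, List<String>> readFile() {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        columns.put("a", List.of("a1", "a2"));
        columns.put("b", List.of("b1", "b2"));
        return columns;
    }

    /**
     * Stands in for the scan operator: builds implicit column c, whose value
     * (here, the file name) is repeated once per row read by the reader.
     */
    static Map<String, List<String>> implicitColumns(String fileName, int rowCount) {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        columns.put("c", Collections.nCopies(rowCount, fileName));
        return columns;
    }

    public static void main(String[] args) {
        Map<String, List<String>> bundle1 = readFile();
        int rowCount = bundle1.get("a").size();
        Map<String, List<String>> bundle2 = implicitColumns("data.csv", rowCount);
        System.out.println(bundle1); // {a=[a1, a2], b=[b1, b2]}
        System.out.println(bundle2); // {c=[data.csv, data.csv]}
    }
}
```

      Keeping the two bundles separate lets the reader stay unaware of implicit columns; only the merge has to know about both.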

      This ticket asks for the code to do the merge:

      • Merge two schemas A = (a, b), B = (c) to create schema C = (a, b, c).
      • Merge two vector containers C1 and C2 to create a new container, C3, that holds the vectors from both inputs.
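      The two operations can be sketched as follows. This is a hypothetical illustration, not Drill's `VectorContainer` or `BatchSchema` code: a schema is a list of column names, a container is an insertion-ordered map from column name to values, and `BundleMerge`, `mergeSchemas`, and `mergeContainers` are invented names. The sketch assumes the inputs already meet the merge conditions (same row count, distinct columns).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical merge helpers. A "container" here is an ordered map from
 * column name to column values; this is not Drill's VectorContainer API.
 */
final class BundleMerge {

    /** Schema merge: C = columns of A followed by columns of B. */
    static List<String> mergeSchemas(List<String> a, List<String> b) {
        List<String> merged = new ArrayList<>(a);
        merged.addAll(b);
        return merged;
    }

    /**
     * Container merge: returns a new container C3 holding the columns of C1
     * followed by the columns of C2. Neither input map is modified; the
     * column lists themselves are shared with the inputs, not copied.
     * Assumes equal row counts and distinct column names.
     */
    static Map<String, List<String>> mergeContainers(
            Map<String, List<String>> c1, Map<String, List<String>> c2) {
        Map<String, List<String>> c3 = new LinkedHashMap<>(c1);
        c3.putAll(c2);
        return c3;
    }

    public static void main(String[] args) {
        System.out.println(mergeSchemas(List.of("a", "b"), List.of("c"))); // [a, b, c]

        Map<String, List<String>> c1 = new LinkedHashMap<>();
        c1.put("a", List.of("a1", "a2"));
        c1.put("b", List.of("b1", "b2"));
        Map<String, List<String>> c2 = Map.of("c", List.of("c1", "c2"));
        System.out.println(mergeContainers(c1, c2)); // {a=[a1, a2], b=[b1, b2], c=[c1, c2]}
    }
}
```

      Because `mergeContainers` shares the input column lists rather than copying their values, its cost grows with the number of columns, not the number of rows.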

      The merge makes sense only if:

      • The two input containers have the same row count, and
      • The columns in each input container are distinct.
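      Both conditions are cheap to check before merging. The hypothetical helper below (again plain Java collections, not Drill code; `MergeCheck` and `checkMergeable` are invented names) rejects inputs whose columns disagree on row count or that share a column name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical precondition check for merging two column bundles. */
final class MergeCheck {

    /** Throws IllegalArgumentException if c1 and c2 cannot be merged. */
    static void checkMergeable(Map<String, List<String>> c1, Map<String, List<String>> c2) {
        // Condition 1: every column in both inputs has the same row count.
        // A bundle with no columns imposes no row count in this sketch.
        int rowCount = -1;
        for (Map<String, List<String>> c : List.of(c1, c2)) {
            for (Map.Entry<String, List<String>> e : c.entrySet()) {
                int n = e.getValue().size();
                if (rowCount == -1) {
                    rowCount = n;
                } else if (n != rowCount) {
                    throw new IllegalArgumentException(
                            "Row count mismatch at column " + e.getKey() + ": " + n + " != " + rowCount);
                }
            }
        }
        // Condition 2: no column name appears in both inputs.
        Set<String> shared = new HashSet<>(c1.keySet());
        shared.retainAll(c2.keySet());
        if (!shared.isEmpty()) {
            throw new IllegalArgumentException("Duplicate columns: " + shared);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> ab = Map.of("a", List.of("a1", "a2"), "b", List.of("b1", "b2"));
        checkMergeable(ab, Map.of("c", List.of("c1", "c2"))); // passes silently
        try {
            checkMergeable(ab, Map.of("c", List.of("c1"))); // one row instead of two
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```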

      Because the merge is also useful in tests, add it to the "row set" test tools as well.


            People

              Assignee: Paul Rogers
              Reporter: Paul Rogers
              Reviewer: Karthikeyan Manivannan
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 2h
                  Remaining: 2h
                  Logged: Not Specified