Apache Arrow / ARROW-11981

[C++][Dataset][Compute] Replace UnionDataset with Union ExecNode

Details

    Description

      UnionDataset allows Fragments with differing schemas and file formats to be scanned together as a single Dataset. This is useful functionality, but it makes the Dataset interface somewhat difficult to reason about, since the interface must be general enough to accommodate UnionDataset.

      After ARROW-11928, it will probably be more natural to support unioning of datasets through a subclass of ExecNode. Reconciliation of differing schemas can then be handled trivially by a full ProjectNode.

      Note that this would obviate both ARROW-11001 and ARROW-11749. In addition, Dataset could be simplified to a concrete class containing a set of compatibly typed and formatted Fragments.
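
      The proposal above (project each input to a common schema, then union the results) can be illustrated with a minimal, self-contained sketch. Everything here is a toy stand-in and an assumption made for illustration: `Batch`, `Schema`, `ProjectTo`, `Union`, and `ProjectAndUnion` are hypothetical names, not Arrow types or the ExecNode API, and columns are reduced to nullable 64-bit integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT Arrow types.
using Column = std::vector<std::optional<std::int64_t>>;
using Schema = std::vector<std::string>;  // field names only, no types

struct Batch {
  std::size_t num_rows = 0;
  std::map<std::string, Column> columns;
};

// Hypothetical projection step: select the fields named in `target`,
// filling any field missing from the input with nulls.
Batch ProjectTo(const Schema& target, const Batch& in) {
  Batch out;
  out.num_rows = in.num_rows;
  for (const auto& name : target) {
    auto it = in.columns.find(name);
    out.columns[name] = (it != in.columns.end())
                            ? it->second
                            : Column(in.num_rows, std::nullopt);
  }
  return out;
}

// Hypothetical union step: forwards the batches of every input unchanged.
// It assumes all inputs already share one schema, so it does no reconciliation.
std::vector<Batch> Union(const std::vector<std::vector<Batch>>& inputs) {
  assert(!inputs.empty() && "a union needs at least one input");
  std::vector<Batch> out;
  for (const auto& input : inputs) {
    out.insert(out.end(), input.begin(), input.end());
  }
  return out;
}

// Composition: a projection in front of each input reconciles the schemas,
// which keeps the union step itself trivial.
std::vector<Batch> ProjectAndUnion(const Schema& target,
                                   const std::vector<std::vector<Batch>>& inputs) {
  std::vector<std::vector<Batch>> projected;
  for (const auto& input : inputs) {
    std::vector<Batch> batches;
    for (const auto& batch : input) {
      batches.push_back(ProjectTo(target, batch));
    }
    projected.push_back(std::move(batches));
  }
  return Union(projected);
}
```

      In this sketch the union step never inspects schemas. All reconciliation lives in the projection, which mirrors the division of labor suggested above between a union node and a ProjectNode.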

            People

              aocsa Alexander Ocsa
              bkietz Ben Kietzman
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 5.5h