Apache Arrow / ARROW-8164

[C++][Dataset] Let datasets be viewable with non-identical schema

Details

    Description

      It would be useful to allow some schema unification after discovery has completed. For example, if a FileSystemDataset is being wrapped into a UnionDataset with another dataset and their schemas are unifiable, there is no reason we can't create the UnionDataset rather than emitting an error because the schemas are not identical.
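
      To make "unifiable" concrete, here is a minimal sketch using toy types rather than Arrow's own classes. Field, Schema, and UnifyToySchemas are hypothetical names invented for this illustration, and the rule it assumes (match fields by name, keep fields that appear on only one side, reject a shared name with different types) is only one plausible reading of unification.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are not Arrow's classes.
struct Field {
  std::string name;
  std::string type;  // e.g. "int32" or "utf8"
};
using Schema = std::vector<Field>;

// Merges `right` into `left`, matching fields by name. Fields present on only
// one side are kept; a shared name with different types is a conflict.
// Returns true and fills *out on success, or returns false and fills *error.
bool UnifyToySchemas(const Schema& left, const Schema& right, Schema* out,
                     std::string* error) {
  assert(out != nullptr && error != nullptr);
  Schema merged = left;
  for (const Field& candidate : right) {
    bool found = false;
    for (const Field& existing : merged) {
      if (existing.name != candidate.name) continue;
      if (existing.type != candidate.type) {
        *error = "field '" + candidate.name + "' has conflicting types " +
                 existing.type + " and " + candidate.type;
        return false;
      }
      found = true;
      break;
    }
    if (!found) merged.push_back(candidate);
  }
  *out = merged;
  return true;
}
```

      Under this assumed rule, two datasets whose schemas differ only by extra columns could share one unified schema, while a type clash on a shared column would still be reported as an error.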

      I think this behavior will be most naturally expressed in C++ like so:

      // Declared inside class Dataset; returns a new dataset that is viewed with `schema`.
      virtual Result<std::shared_ptr<Dataset>> ReplaceSchema(std::shared_ptr<Schema> schema) const = 0;
      

      which will return an error if the provided schema cannot be unified with the current dataset schema.
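
      The shape of that call can be sketched with self-contained toy classes. This is not Arrow's implementation: ToyDataset, Unifiable, and the convention of returning nullptr plus an error string (standing in for a Result) are assumptions made so the example compiles on its own, and the unifiability check is deliberately simplistic.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are not Arrow's classes.
struct Field {
  std::string name;
  std::string type;
};
using Schema = std::vector<Field>;

// Assumed rule: two schemas are unifiable unless they share a field name
// with different types. On failure, *error describes the first conflict.
bool Unifiable(const Schema& current, const Schema& requested,
               std::string* error) {
  assert(error != nullptr);
  for (const Field& have : current) {
    for (const Field& want : requested) {
      if (have.name == want.name && have.type != want.type) {
        *error = "field '" + have.name + "' cannot change from " + have.type +
                 " to " + want.type;
        return false;
      }
    }
  }
  return true;
}

class Dataset {
 public:
  virtual ~Dataset() = default;
  const Schema& schema() const { return schema_; }

  // Returns a new dataset viewed with `schema`, or nullptr (with *error set)
  // when `schema` is not unifiable with the current schema.
  virtual std::shared_ptr<Dataset> ReplaceSchema(Schema schema,
                                                 std::string* error) const = 0;

 protected:
  explicit Dataset(Schema schema) : schema_(std::move(schema)) {}
  Schema schema_;
};

// Hypothetical concrete dataset that carries only a schema and no data.
class ToyDataset : public Dataset {
 public:
  explicit ToyDataset(Schema schema) : Dataset(std::move(schema)) {}

  std::shared_ptr<Dataset> ReplaceSchema(Schema schema,
                                         std::string* error) const override {
    if (!Unifiable(schema_, schema, error)) return nullptr;
    return std::make_shared<ToyDataset>(std::move(schema));
  }
};
```

      Because the method is const and returns a new object, the original dataset keeps its discovered schema; callers such as a union of datasets could request a common schema from each child without mutating any of them.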

      If this needs to be extended to non-trivial projections, it will probably warrant a separate class, ProjectedDataset or so. That is definitely follow-up material (if desired).

            People

              Assignee: Ben Kietzman (bkietz)
              Reporter: Ben Kietzman (bkietz)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 20m