Apache Arrow / ARROW-7965

[Python] Refine higher level dataset API

Details

    Description

      Provide a more intuitive way to construct a nested dataset:

      ```python
      # instead of using the confusing factory function
      dataset([
          factory("s3://old-taxi-data", format="parquet"),
          factory("local/path/to/new/data", format="csv")
      ])

      # let the user construct a new dataset directly from dataset objects
      dataset([
          dataset("s3://old-taxi-data", format="parquet"),
          dataset("local/path/to/new/data", format="csv")
      ])
      ```
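
To make the proposed calling convention concrete, here is a toy model of a `dataset()` function that accepts either a path/URI string or a list of dataset objects. It is only a sketch: `ToyDataset` and its `leaves()` method are hypothetical names invented for illustration, no files are opened, and this is not how pyarrow implements the function.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Union


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset object; not a pyarrow class."""
    source: Optional[str] = None
    format: Optional[str] = None
    children: List["ToyDataset"] = field(default_factory=list)

    def leaves(self) -> List["ToyDataset"]:
        # A dataset with a source is a leaf; otherwise flatten its children.
        if self.source is not None:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def dataset(source: Union[str, Sequence[ToyDataset]],
            format: Optional[str] = None) -> ToyDataset:
    # A single path or URI yields a leaf dataset.
    if isinstance(source, str):
        return ToyDataset(source=source, format=format)
    # A sequence of dataset objects yields a nested (union) dataset.
    children = list(source)
    if not all(isinstance(child, ToyDataset) for child in children):
        raise TypeError("expected a path/URI string or a sequence of datasets")
    return ToyDataset(children=children)


combined = dataset([
    dataset("s3://old-taxi-data", format="parquet"),
    dataset("local/path/to/new/data", format="csv"),
])
print([(leaf.source, leaf.format) for leaf in combined.leaves()])
```

Because the same function handles both cases, nested datasets compose without a separate factory entry point: a `combined` dataset can itself be passed inside another list.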

      In the future we might want to introduce a new Dataset class that wraps the functionality of both the dataset factory and the materialized dataset, enabling optimizations that avoid rediscovering already materialized datasets.
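
One way such a combined class could avoid rediscovery is to defer discovery until first use and cache the result. The sketch below assumes that approach; `LazyDataset`, `fake_discovery`, and the file names are hypothetical and exist only for illustration, not as part of any existing Arrow API.

```python
from typing import Callable, List, Optional


class LazyDataset:
    """Hypothetical wrapper: acts as a factory until first use, then caches."""

    def __init__(self, discover: Callable[[], List[str]]):
        # Factory role: knows how to find the files.
        self._discover = discover
        # Materialized role: holds the cached listing once discovered.
        self._files: Optional[List[str]] = None

    @property
    def files(self) -> List[str]:
        # Run discovery only once; later accesses reuse the cached listing.
        if self._files is None:
            self._files = self._discover()
        return self._files


calls = []


def fake_discovery() -> List[str]:
    # Stand-in for an expensive filesystem or object-store listing.
    calls.append(1)
    return ["part-0.parquet", "part-1.parquet"]


lazy = LazyDataset(fake_discovery)
first = lazy.files   # triggers discovery
second = lazy.files  # served from the cache
print(len(calls))
```

The second access does not call `fake_discovery` again, which is the kind of saving the issue alludes to for datasets that have already been materialized.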

            People

              kszucs Krisztian Szucs

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 16h 40m