Apache Arrow / ARROW-8290

[Python][Dataset] Improve ergonomy of the FileSystemDataset constructor


Details

    Description

      Currently, to manually create a FileSystemDataset, you can do something like:

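      import pyarrow as pa
      import pyarrow.fs
      import pyarrow.dataset as ds
      # `schema` is assumed to be an existing pyarrow.Schema describing the files
      dataset = ds.FileSystemDataset(
          schema, None, ds.ParquetFileFormat(), pa.fs.LocalFileSystem(),
          ["data_file1.parquet", "data_file2.parquet"],
          [ds.field('file') == 1, ds.field('file') == 2])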
      

      There are some usability improvements we could make, though:

      • Allow passing the arguments by name to improve the readability of the calling code (currently they all have to be passed positionally, because of how they are declared in Cython as not None)
      • I would consider changing the order of the arguments (e.g. start with the paths); we don't need to match the order of the C++ constructor
      • Potentially allow partitions to be optional, in which case they need to default to a list of ScalarExpression(True) values.
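
      To make the three points above concrete, here is a minimal sketch of what a friendlier calling convention could look like. It is not the actual pyarrow implementation: build_dataset_args is a hypothetical helper, ALWAYS_TRUE is a plain stand-in for ScalarExpression(True) so the snippet runs without pyarrow, and the "one partition expression per path" default is an assumption based on the example call above.

```python
from typing import Any, Optional, Sequence, Tuple

# Hypothetical stand-in for ds.ScalarExpression(True), used so that this
# sketch does not depend on pyarrow being installed.
ALWAYS_TRUE = "ScalarExpression(True)"


def build_dataset_args(
    paths: Sequence[str],
    *,
    schema: Any,
    file_format: Any,
    filesystem: Any,
    partitions: Optional[Sequence[Any]] = None,
    root_partition: Any = None,
) -> Tuple[Any, Any, Any, Any, list, list]:
    """Hypothetical helper: paths first, everything else keyword-only.

    Returns the arguments in the positional order used by the current
    constructor call shown in the description.
    """
    paths = list(paths)
    if partitions is None:
        # Assumption: an omitted `partitions` means one always-true
        # expression per path.
        partitions = [ALWAYS_TRUE] * len(paths)
    else:
        partitions = list(partitions)
    if len(partitions) != len(paths):
        raise ValueError(
            f"got {len(partitions)} partition expressions for {len(paths)} paths"
        )
    return (schema, root_partition, file_format, filesystem, paths, partitions)


args = build_dataset_args(
    ["data_file1.parquet", "data_file2.parquet"],
    schema="<schema>",          # placeholder values for illustration only
    file_format="<format>",
    filesystem="<filesystem>",
)
```

      With real pyarrow objects in place of the placeholders, the resulting tuple could in principle be unpacked into the existing positional call, e.g. ds.FileSystemDataset(*args), though the better fix is to change the constructor signature itself.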

            People

              jorisvandenbossche Joris Van den Bossche

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2h 40m