Apache Arrow / ARROW-5436

[Python] expose filters argument in parquet.read_table


Details

    Description

      Currently, the parquet.read_table function can be used both for reading a single file (as an interface to ParquetFile) and for reading a directory (as an interface to ParquetDataset).

      ParquetDataset has some extra keywords such as filters that would be nice to expose through read_table as well.

      One can always use ParquetDataset directly when its extra features are needed. For pandas, which wraps pyarrow, it would be simpler to pass keywords straight through to parquet.read_table than to choose between calling read_table and ParquetDataset. Context: https://github.com/pandas-dev/pandas/issues/26551

            People

              Assignee: Joris Van den Bossche (jorisvandenbossche)
              Reporter: Joris Van den Bossche (jorisvandenbossche)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m