  Apache Arrow / ARROW-10574

[Python][Parquet] Allow collections for 'in' / 'not in' filter (in addition to sets)


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: Python

    Description

      I would like to enhance the partition filters accepted by methods such as:

      pyarrow.parquet.ParquetDataset(path, filters)
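      For context, the pyarrow documentation describes `filters` in disjunctive normal form (DNF): each inner `(column, op, value)` tuple is a single-column predicate, a list of such tuples is combined with AND, and an outer list of those lists is combined with OR. The sketch below is a hypothetical pure-Python evaluator (`matches_dnf` is not a pyarrow function) that applies such a filter to one partition's key/value mapping. It deliberately handles only the plain comparison operators so the AND/OR structure stays visible.

```python
import operator
from typing import Any, Mapping, Sequence

# Comparison operators handled by this sketch (a subset of what pyarrow accepts).
_OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def matches_dnf(partition: Mapping[str, Any], filters: Sequence) -> bool:
    """Return True if the partition keys satisfy a DNF filter (hypothetical helper)."""
    if not filters:
        # No predicates: every partition is kept.
        return True
    # A flat list of tuples is a single conjunction; normalize to a list of lists.
    if isinstance(filters[0], tuple):
        filters = [filters]
    # OR across the inner lists, AND within each inner list.
    return any(
        all(_OPS[op](partition[col], val) for col, op, val in conjunction)
        for conjunction in filters
    )
```

      For example, `[("year", ">=", 2020), ("month", "<", 6)]` keeps only partitions matching both predicates, while `[[("year", "=", 2019)], [("month", "=", 6)]]` keeps partitions matching either one.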

      I propose the following enhancements:

      1. For the "in" and "not in" operators, the value should be allowed to be any typing.Iterable (or container). Currently only a set is supported; other iterables, such as a list or a tuple, do not work correctly. I would like to change this so that any iterable is accepted.
      2. Improve the documentation of partition filters.
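      One way to implement the first proposal is to normalize the operand of "in" / "not in" into a set before the membership test, so the result no longer depends on the operand's type. The following is a minimal sketch under that assumption; `matches_term` is a hypothetical name and is not pyarrow's actual code, and the real fix may differ. Rejecting `str`/`bytes` is an additional design choice of this sketch, since a string is iterable but iterating it yields single characters.

```python
from collections.abc import Iterable
from typing import Any


def matches_term(value: Any, op: str, operand: Any) -> bool:
    """Evaluate a single (column, op, operand) term on a partition value.

    Hypothetical helper: for "in" / "not in" the operand may be any
    non-string iterable (set, frozenset, list, tuple, generator, ...).
    """
    if op in ("in", "not in"):
        # str/bytes are iterable, but iterating them yields characters,
        # which is almost certainly not what the caller meant.
        if isinstance(operand, (str, bytes)) or not isinstance(operand, Iterable):
            raise TypeError(f"operand of {op!r} must be a non-string iterable")
        # Materialize once: this works for one-shot iterators such as
        # generators and gives hash-based membership lookups.
        members = frozenset(operand)
        return (value in members) if op == "in" else (value not in members)
    if op in ("=", "=="):
        return value == operand
    if op == "!=":
        return value != operand
    raise ValueError(f"unsupported operator: {op!r}")
```

      Under this sketch, `("year", "in", ["2019", "2020"])` and `("year", "in", {"2019", "2020"})` behave identically. Building a frozenset requires hashable elements, which holds for typical partition values such as strings and integers.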

      A newer implementation, _ParquetDatasetV2, already accepts any iterable, so the documentation update applies to the new version as well.


            People

              Assignee: Weiyang Zhao (wyzhao)
              Reporter: Weiyang Zhao (wyzhao)
              Votes: 0
              Watchers: 3


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 10m