ARROW-5572

[Python] Raise an error when an invalid filter is passed while reading Parquet

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.13.0
    • Fix Version/s: None
    • Component/s: Python
    • Labels:

      Description

      From https://stackoverflow.com/questions/56522977/using-predicates-to-filter-rows-from-pyarrow-parquet-parquetdataset

      For example, when a filter refers to a normal column that is not a key in the partitioned folder hierarchy, the filter is silently ignored. It would be nice to raise an error in this case.
      Reproducible example:

      import pandas as pd
      import pyarrow as pa
      import pyarrow.parquet as pq

      df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 1, 0, 1], 'c': [1, 2, 3, 4]})
      table = pa.Table.from_pandas(df)
      # write a dataset partitioned on column 'a'
      pq.write_to_dataset(table, 'test_parquet_row_filters', partition_cols=['a'])
      # filter on 'a' (partition column) -> works
      pq.read_table('test_parquet_row_filters', filters=[('a', '=', 1)]).to_pandas()
      # filter on a normal column (row group filtering could be supported in the future) -> silently does nothing
      pq.read_table('test_parquet_row_filters', filters=[('b', '=', 1)]).to_pandas()
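
A minimal sketch of the kind of check requested here, assuming the set of partition keys is known up front. The helper name `check_filter_columns` and its signature are hypothetical and not part of pyarrow; the sketch only illustrates rejecting a filter on a non-partition column instead of ignoring it.

```python
def check_filter_columns(filters, partition_cols):
    """Raise ValueError if a filter refers to a column that is not a partition key.

    Hypothetical helper for illustration only; it is not a pyarrow API.
    """
    # Accept a flat list of (column, op, value) tuples as well as a
    # list of lists of such tuples.
    if filters and isinstance(filters[0], tuple):
        conjunctions = [filters]
    else:
        conjunctions = filters or []

    allowed = set(partition_cols)
    for conjunction in conjunctions:
        for column, _op, _value in conjunction:
            if column not in allowed:
                raise ValueError(
                    "Cannot filter on column {!r}: it is not a partition key "
                    "(partition keys: {})".format(column, sorted(allowed))
                )
```

With the dataset from the reproducible example, `check_filter_columns([('a', '=', 1)], ['a'])` would pass silently, while `check_filter_columns([('b', '=', 1)], ['a'])` would raise a `ValueError` naming column `'b'`.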
      

            People

            • Assignee: Unassigned
            • Reporter: Joris Van den Bossche (jorisvandenbossche)