Apache Arrow / ARROW-11538

[Python] Segfault reading Parquet dataset with Timestamp filter

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 4.0.0
    • Component/s: Python
    • Labels: None
    • Environment: platform: Linux 64bit
      conda env:
      conda create -n pya python=3.8 pyarrow=3.0.0 pandas=1.2.1 pytest -c conda-forge

    Description

      The first two tests pass, but the third fails with: Fatal Python error: Segmentation fault

      All three pass with pyarrow=2.0.0.

      import pandas
      import pyarrow as pa
      import pyarrow.dataset as ds
      import pyarrow.parquet as pq
      import pytest
      
      
      @pytest.fixture
      def data_path(tmp_path):
          # Write a one-row table (string column "name", timestamp column "date")
          # to a Parquet file in pytest's temporary directory.
          path = tmp_path / "data.parquet"
          df = pandas.DataFrame(
              [
                  ["A", pandas.Timestamp("2020-11-04")],
              ],
              columns=["name", "date"],
          )
          table = pa.Table.from_pandas(df)
          pq.write_table(table, path, version="2.0")
          return df, path
      
      
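      @pytest.mark.parametrize(
          "filter",
          [
              None,  # no filter: passes
              ds.field("date") == "2020-11-04",  # compare with a string: passes
              # Compare with a pandas.Timestamp: segfaults on pyarrow 3.0.0,
              # passes on pyarrow 2.0.0.
              ds.field("date") == pandas.Timestamp("2020-11-04"),
          ],
      )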
      def test_dataset_filter(filter, data_path):
          data, path = data_path
      
          dataset = ds.dataset(path, format="parquet")
          assert data.equals(dataset.to_table(filter=filter).to_pandas())
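A segfault is awkward to triage under pytest because the crash takes the whole test process down with it. One way to keep the three cases separable, not part of the original report, is to run each filter in a fresh interpreter and classify how that interpreter exits. `run_isolated`, `REPRO`, `FILTERS` and `check_filters` are hypothetical helpers. The sketch assumes a POSIX platform (such as the Linux environment above), where a child killed by signal N returns -N, and that pandas and pyarrow are installed in the interpreter that runs it.

```python
import signal
import subprocess
import sys

# Hypothetical template: the reproducer from the report as a standalone
# script. {filter_expr} is replaced with one filter expression (as source text).
REPRO = """
import os
import tempfile

import pandas
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

df = pandas.DataFrame([["A", pandas.Timestamp("2020-11-04")]], columns=["name", "date"])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.parquet")
    pq.write_table(pa.Table.from_pandas(df), path, version="2.0")
    dataset = ds.dataset(path, format="parquet")
    assert df.equals(dataset.to_table(filter={filter_expr}).to_pandas())
"""

# The three filters parametrized in the original test.
FILTERS = [
    "None",
    'ds.field("date") == "2020-11-04"',
    'ds.field("date") == pandas.Timestamp("2020-11-04")',
]


def run_isolated(snippet, timeout=120.0):
    """Run `snippet` in a fresh interpreter and return (status, stderr).

    status is "pass" (exit code 0), "segfault" (killed by SIGSEGV) or
    "fail" (any other exit). A crash in the child cannot kill the caller.
    """
    proc = subprocess.run(
        # -X faulthandler makes the child dump a Python traceback on a crash.
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Assumption (POSIX): a negative return code -N means killed by signal N.
    if proc.returncode == -signal.SIGSEGV:
        status = "segfault"
    elif proc.returncode == 0:
        status = "pass"
    else:
        status = "fail"
    return status, proc.stderr


def check_filters():
    """Map each filter expression to the outcome of reading the dataset with it."""
    return {expr: run_isolated(REPRO.format(filter_expr=expr))[0] for expr in FILTERS}
```

Going by the report, `check_filters()` on pyarrow 3.0.0 would be expected to mark only the `pandas.Timestamp` case as "segfault", while on pyarrow 2.0.0 all three should pass. Because the child runs with `-X faulthandler`, the stderr returned by `run_isolated` should contain the Python-level traceback leading up to the crash.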
      
      

            People

              Assignee: Joris Van den Bossche (jorisvandenbossche)
              Reporter: Josh (josham)
              Votes: 0
              Watchers: 4
