Apache Arrow / ARROW-11379

[C++][Dataset] Reading dataset with filtering on timestamp partition field crashes

Details

    Description

      In [1]: df = pd.DataFrame({"dates": list(pd.date_range("2012-01-01", periods=2, freq="D")) * 5, "col": range(10)})
      
      In [2]: df.to_parquet("test_partition_timestamps", partition_cols=["dates"])
      
      In [3]: !ls test_partition_timestamps/
      'dates=2012-01-01 00:00:00'  'dates=2012-01-02 00:00:00'
      
      In [4]: import pyarrow.dataset as ds
      
      In [6]: part = ds.partitioning(pa.schema([("dates", pa.timestamp("s"))]), flavor="hive")
      
      In [7]: dataset = ds.dataset("test_partition_timestamps/", format="parquet", partitioning=part)
      

      Reading the dataset works and gives the correct types:

      In [10]: dataset.to_table()
      Out[10]: 
      pyarrow.Table
      col: int64
      dates: timestamp[s]
      

      but filtering on the timestamp column crashes with a failed check in the temporal cast kernel:

      In [11]: dataset.to_table(filter=ds.field("dates") > pd.Timestamp("2012-01-01"))
      ../src/arrow/compute/kernels/scalar_cast_temporal.cc:129:  Check failed: (batch[0].kind()) == (Datum::ARRAY) 
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xc2224a)[0x7f68d2ccf24a]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xc221c8)[0x7f68d2ccf1c8]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xc221ea)[0x7f68d2ccf1ea]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZN5arrow4util8ArrowLogD1Ev+0x47)[0x7f68d2ccf549]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xf0252a)[0x7f68d2faf52a]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZNSt17_Function_handlerIFvPN5arrow7compute13KernelContextERKNS1_9ExecBatchEPNS0_5DatumEEPS9_E9_M_invokeERKSt9_Any_dataOS3_S6_OS8_+0x69)[0x7f68d2e8ab86]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZNKSt8functionIFvPN5arrow7compute13KernelContextERKNS1_9ExecBatchEPNS0_5DatumEEEclES3_S6_S8_+0x7a)[0x7f68d2deec04]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xd3d6f9)[0x7f68d2dea6f9]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xd3cd5b)[0x7f68d2de9d5b]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZNK5arrow7compute8Function7ExecuteERKSt6vectorINS_5DatumESaIS3_EEPKNS0_15FunctionOptionsEPNS0_11ExecContextE+0x8c7)[0x7f68d2df9963]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xd2eed2)[0x7f68d2ddbed2]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZNK5arrow7compute12MetaFunction7ExecuteERKSt6vectorINS_5DatumESaIS3_EEPKNS0_15FunctionOptionsEPNS0_11ExecContextE+0x15d)[0x7f68d2dfac8f]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZN5arrow7compute12CallFunctionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorINS_5DatumESaISA_EEPKNS0_15FunctionOptionsEPNS0_11ExecContextE+0x26c)[0x7f68d2dedc6f]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZN5arrow7compute12CallFunctionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorINS_5DatumESaISA_EEPKNS0_15FunctionOptionsEPNS0_11ExecContextE+0x93)[0x7f68d2deda96]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZN5arrow7compute4CastERKNS_5DatumERKNS0_11CastOptionsEPNS0_11ExecContextE+0xf7)[0x7f68d2ddd493]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZN5arrow7compute4CastERKNS_5DatumESt10shared_ptrINS_8DataTypeEERKNS0_11CastOptionsEPNS0_11ExecContextE+0x77)[0x7f68d2ddd6e2]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_dataset.so.300(+0x1c5c21)[0x7f68b30cfc21]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_dataset.so.300(+0x1c6789)[0x7f68b30d0789]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_dataset.so.300(+0x1c5097)[0x7f68b30cf097]
      /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_dataset.so.300(_ZNK5arrow7dataset10Expression4BindENS_10ValueDescrEPNS_7compute11ExecContextE+0x732)[0x7f68b30d22e8]
      ...
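
For reference, the result the filter above is expected to produce can be sketched in plain Python. This is only an illustration of the intended semantics, not how Arrow's dataset code evaluates filters: the helper names `parse_hive_segment` and `select_partitions` are hypothetical, and the timestamp format is assumed from the directory names listed by `ls` earlier in this report.

```python
from datetime import datetime


def parse_hive_segment(segment):
    """Split a hive-style directory name 'key=value' and parse the value as a timestamp.

    The format string is an assumption based on the directory names in this report.
    """
    key, _, raw = segment.partition("=")
    return key, datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")


def select_partitions(segments, field, lower_bound):
    """Return the segments whose `field` value is strictly greater than `lower_bound`."""
    selected = []
    for segment in segments:
        key, value = parse_hive_segment(segment)
        if key == field and value > lower_bound:
            selected.append(segment)
    return selected


# Directory names as written by df.to_parquet(..., partition_cols=["dates"])
segments = ["dates=2012-01-01 00:00:00", "dates=2012-01-02 00:00:00"]

# Mirrors ds.field("dates") > pd.Timestamp("2012-01-01")
kept = select_partitions(segments, "dates", datetime(2012, 1, 1))
print(kept)  # ['dates=2012-01-02 00:00:00']
```

Under these assumptions, only the `dates=2012-01-02 00:00:00` partition should survive the filter, so the filtered table would be expected to contain just the rows from that directory.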
      

            People

              bkietz Ben Kietzman
              jorisvandenbossche Joris Van den Bossche

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 20m