Apache Arrow / ARROW-2020

[Python] Parquet segfaults if coercing ns timestamps and writing 96-bit timestamps


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version: 0.8.0
    • Fix Version: 0.10.0
    • Component: Python
    • Environment:
      OS: Mac OS X 10.13.2
      Python: 3.6.4
      PyArrow: 0.8.0

    Description

      If you try to write a PyArrow table containing nanosecond-resolution timestamps to Parquet using `coerce_timestamps` and `use_deprecated_int96_timestamps=True`, the Arrow library will segfault.

      The crash doesn't happen if you don't coerce the timestamp resolution or if you don't use 96-bit timestamps.

      To Reproduce:

      import datetime
      
      import pyarrow
      from pyarrow import parquet
      
      schema = pyarrow.schema([
          pyarrow.field('last_updated', pyarrow.timestamp('ns')),
      ])
      
      data = [
          pyarrow.array([datetime.datetime.now()], pyarrow.timestamp('ns')),
      ]
      
      table = pyarrow.Table.from_arrays(data, ['last_updated'])
      
      with open('test_file.parquet', 'wb') as fdesc:
          parquet.write_table(table, fdesc,
                              coerce_timestamps='us',  # 'ms' works too
                              use_deprecated_int96_timestamps=True)


      See attached file for the crash report.


      Attachments

        1. crash-report.txt (53 kB, Diego Argueta)

        People

          Assignee: Joshua Storck (joshuastorck)
          Reporter: Diego Argueta (yiannisliodakis)
          Votes: 0
          Watchers: 6
