Apache Arrow / ARROW-6573

[Python] Segfault when writing to parquet

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.14.1
    • Fix Version/s: 0.15.0
    • Component/s: C++, Python
    • Environment:
      Ubuntu 16.04. pyarrow 0.14.1 installed via pip, using the Anaconda distribution of Python 3.7.

      Description

      When writing a pyarrow Table to Parquet, I observe a segfault if the column data does not match the type declared in the schema (in the example below, integers are supplied for a column declared as pa.string()).

      Here is a reproducible example:

      import pyarrow as pa
      import pyarrow.parquet as pq

      data = dict()
      data["key"] = [0, 1, 2, 3]  # ints for a string column: segfault
      # data["key"] = ["0", "1", "2", "3"]  # strings: no segfault

      # The schema declares "key" as a string column
      schema = pa.schema({"key": pa.string()})

      table = pa.Table.from_pydict(data, schema=schema)
      print("now writing out test file")
      pq.write_table(table, "test.parquet")  # crashes here

      This results in a segfault when writing the table. Running the script under gdb:

      gdb -ex r --args python test.py

      yields:

      Program received signal SIGSEGV, Segmentation fault.
      0x00007fffe8173917 in virtual thunk to parquet::DictEncoderImpl<parquet::DataType<(parquet::Type::type)6> >::Put(parquet::ByteArray const*, int) () from /net/fantasia/home/jweinstk/anaconda3/lib/python3.7/site-packages/pyarrow/libparquet.so.14
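      As a complement to gdb, Python's standard-library faulthandler module can print the Python-level traceback (which Python call was running) when the process receives SIGSEGV. It can be turned on without editing the script via `python -X faulthandler test.py` or the PYTHONFAULTHANDLER environment variable, or from code as in this minimal sketch (placing it at the top of test.py is an assumption, not part of the original report):

```python
import faulthandler

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL that dump
# the Python traceback of all threads to sys.stderr when the signal arrives.
faulthandler.enable(all_threads=True)
```

      With this enabled, the dumped traceback for the reproducer should point at the pq.write_table line, while gdb shows the C++ frame inside libparquet.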
      
      Thanks for all of your Arrow work,

      Josh

    People

    • Assignee: Wes McKinney (wesm)
    • Reporter: Josh Weinstock (weinstockj)
    • Votes: 0
    • Watchers: 2

    Dates

    • Created:
    • Updated:
    • Resolved:

    Time Tracking

    • Estimated: Not Specified
    • Remaining: 0h
    • Logged: 0.5h