  1. Apache Arrow
  2. ARROW-6573

[Python] Segfault when writing to parquet

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.14.1
    • Fix Version/s: 0.15.0
    • Component/s: C++, Python
    • Environment: Ubuntu 16.04. pyarrow 0.14.1 installed through pip. Using the Anaconda distribution of Python 3.7.

    Description

      When writing a pyarrow table to Parquet, I observe a segfault if the types of the data do not match the schema (here, integers in a column declared as string).

      Here is a reproducible example:

       

      import pyarrow as pa
      import pyarrow.parquet as pq
      
      data = dict()
      data["key"] = [0, 1, 2, 3] # segfault
      #data["key"] = ["0", "1", "2", "3"] # no segfault
      
      schema = pa.schema({"key" : pa.string()})
      
      table = pa.Table.from_pydict(data, schema = schema)
      print("now writing out test file")
      pq.write_table(table, "test.parquet")

      This results in a segfault when writing the table. Running 

       

      gdb -ex r --args python test.py 
      

      Yields

       

       

      Program received signal SIGSEGV, Segmentation fault.
      0x00007fffe8173917 in virtual thunk to parquet::DictEncoderImpl<parquet::DataType<(parquet::Type::type)6> >::Put(parquet::ByteArray const*, int) () from /net/fantasia/home/jweinstk/anaconda3/lib/python3.7/site-packages/pyarrow/libparquet.so.14
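
      As a possible guard on affected versions, the mismatch could be caught in plain Python before the data reaches pyarrow. The following is only a minimal sketch: the helper name find_type_mismatches and the column-to-Python-type mapping are hypothetical and stand in for the Arrow schema (str in place of pa.string()); nothing here is part of pyarrow.

```python
def find_type_mismatches(data, expected_types):
    """Return (column, row_index, value) for each value whose Python type
    differs from the type expected for its column. None is treated as null
    and is never reported."""
    mismatches = []
    for column, expected in expected_types.items():
        for index, value in enumerate(data.get(column, [])):
            if value is not None and not isinstance(value, expected):
                mismatches.append((column, index, value))
    return mismatches


# The data from the report: integers in a column declared as string.
data = {"key": [0, 1, 2, 3]}
expected_types = {"key": str}  # hypothetical stand-in for pa.string()

problems = find_type_mismatches(data, expected_types)
if problems:
    print("type mismatches:", problems)
```

      In the example above, such a check would run before pa.Table.from_pydict(data, schema = schema), so that the table is only built and written when the list of mismatches is empty.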
      

       

       

      Thanks for all of your arrow work,

      Josh

            People

              Assignee: Wes McKinney (wesm)
              Reporter: Josh Weinstock (weinstockj)
