Apache Arrow / ARROW-2450

[Python] Saving to parquet fails for empty lists


Details

    Description

      When writing a table to Parquet through pandas, the process crashes with a segmentation fault if any column contains an empty list.

      Minimal example:

      import pyarrow as pa
      import pyarrow.parquet as pq
      import pandas as pd
      
      def save(rows):
          # Round trip: pandas -> Arrow -> Parquet file -> Arrow, then print each stage
          table1 = pa.Table.from_pandas(pd.DataFrame(rows))
          pq.write_table(table1, 'test-foo.pq')
          table2 = pq.read_table('test-foo.pq')
      
          print('ROWS:', rows)
          print('TABLE1:', table1.to_pandas(), sep='\n')
          print('TABLE2:', table2.to_pandas(), sep='\n')
      
      save([{'val': ['something']}])
      print('---')
      save([{'val': []}])  # empty
      

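      Output (the process dies after TABLE1 is printed; because write_table and read_table both run before the first print, the segfault appears to come from the table2.to_pandas() call):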

      ROWS: [{'val': ['something']}]
      TABLE1:
                 val
      0  [something]
      TABLE2:
                 val
      0  [something]
      ---
      ROWS: [{'val': []}]
      TABLE1:
        val
      0  []
      [1]    13472 segmentation fault (core dumped)  python3 test.py
      

      Versions:

      $ pip3 list | grep pyarrow
      pyarrow (0.9.0)
      $ python3 --version
      Python 3.5.2
      


            People

              Antoine Pitrou (apitrou)
              Uwe Korn (uwe)
