Apache Arrow / ARROW-6642

[Python] chained access of ParquetDataset's metadata segfaults


    Details

      Description

      Creating and reading a Parquet dataset:

      import pyarrow as pa
      import pyarrow.parquet as pq

      table = pa.table({'a': [1, 2, 3]})
      pq.write_table(table, '__test_statistics_segfault.parquet')
      dataset = pq.ParquetDataset('__test_statistics_segfault.parquet')
      dataset_piece = dataset.pieces[0]
      

      Accessing the metadata down to a column (e.g. to read its statistics) in separate steps works fine:

      meta = dataset_piece.get_metadata()
      row = meta.row_group(0)
      col = row.column(0)
      

      but chaining the same calls in a single expression segfaults:

      dataset_piece.get_metadata().row_group(0).column(0)
      

      dataset_piece.get_metadata().row_group(0) on its own still works; the segfault only happens once .column(0) is added to the chain.
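
      The report does not say why only the chained form crashes. One plausible explanation, offered here purely as an assumption, is an object-lifetime bug: the temporary metadata object returned by get_metadata() is freed as soon as row_group(0) returns, while the row-group object still points into it, so .column(0) reads freed memory. The pure-Python model below imitates that situation with a weak reference. FileMeta, RowGroupMeta and the keep_parent flag are hypothetical names, not pyarrow API; in Python the freed parent surfaces as a RuntimeError instead of a segfault.

```python
import weakref


class FileMeta:
    """Hypothetical stand-in for file-level metadata that owns the data."""

    def __init__(self, row_groups):
        # One list of column names per row group.
        self._row_groups = row_groups

    def row_group(self, i, keep_parent=False):
        return RowGroupMeta(self, i, keep_parent)


class RowGroupMeta:
    """Hypothetical stand-in for a row-group view into its parent's data."""

    def __init__(self, parent, index, keep_parent):
        self._index = index
        self._keep_parent = keep_parent
        # keep_parent=False models the assumed bug: only a weak reference,
        # so nothing keeps the parent alive. keep_parent=True models the
        # usual fix: a strong reference ties the parent's lifetime to this view.
        self._parent = parent if keep_parent else weakref.ref(parent)

    def _resolve_parent(self):
        parent = self._parent if self._keep_parent else self._parent()
        if parent is None:
            # In C/C++ this would be a dangling pointer and possibly a segfault.
            raise RuntimeError("parent metadata was already freed")
        return parent

    def column(self, j):
        return self._resolve_parent()._row_groups[self._index][j]


if __name__ == "__main__":
    meta = FileMeta([["a"]])
    row = meta.row_group(0)
    print(row.column(0))  # step by step: `meta` keeps the parent alive

    try:
        # On CPython the temporary FileMeta is freed once row_group() returns.
        FileMeta([["a"]]).row_group(0).column(0)
    except RuntimeError as exc:
        print("chained access failed:", exc)

    print(FileMeta([["a"]]).row_group(0, keep_parent=True).column(0))
```

      Holding a strong reference from the child to its parent (keep_parent=True in the sketch) makes chained and step-by-step access behave the same, which is the usual remedy for this class of bug; whether the actual pyarrow fix takes this form is not stated in this issue.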


              People

              • Assignee:
                jorisvandenbossche Joris Van den Bossche
                Reporter:
                jorisvandenbossche Joris Van den Bossche
              • Votes: 0
                Watchers: 2


                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m