Apache Arrow / ARROW-3933

[Python] Segfault reading Parquet files from GNOMAD

Details

    Description

      I am getting a segfault when running a basic program on an Ubuntu 18.04 VM (AWS). The error also occurs out of the box on Mac OS X.

      $ sudo snap install --classic google-cloud-sdk
      $ gsutil cp gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet .
      $ conda install pyarrow
      $ python test.py
      Segmentation fault (core dumped)

      test.py:

      import pyarrow.parquet as pq
      path = "part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"
      pq.read_table(path)

      gdb output:

      Thread 3 "python" received signal SIGSEGV, Segmentation fault.
      [Switching to Thread 0x7fffdf199700 (LWP 13703)]
      0x00007fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const*, unsigned long) () from /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11

      I also tested fastparquet; it reads the same file just fine.
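
To narrow down a crash like this, it can help to run the read in a child process so the segfault is reported as a signal instead of killing the calling interpreter. The following is a minimal sketch, not part of the original report. It assumes Python 3.7+ on a POSIX system (the report itself used Python 2.7), and the helper names `run_isolated` and `read_snippet` are hypothetical. Whether a single-column read reproduces the crash is also an assumption, and the column names would have to come from the file's schema.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=300):
    """Run a Python snippet in a child interpreter and report how it exited.

    Returns (status, detail), where status is "ok", "signal" or "error".
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok", proc.stdout.strip()
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return "signal", signal.Signals(-proc.returncode).name
    return "error", proc.stderr.strip()


def read_snippet(path, column=None):
    """Build child-process code that reads the whole file or a single column."""
    columns = None if column is None else [column]
    return (
        "import pyarrow.parquet as pq\n"
        f"t = pq.read_table({path!r}, columns={columns!r})\n"
        "print(t.num_rows)\n"
    )


# Example use (the column name below is a placeholder, not taken from the file):
#   path = "part-r-00000-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"
#   print(run_isolated(read_snippet(path)))
#   print(run_isolated(read_snippet(path, "some_column")))
```

A result of `("signal", "SIGSEGV")` for the full read but not for some single-column reads would point at the columns that trigger the crash. That would be consistent with the `StructImpl::GetDefLevels` frame in the gdb output, though the sketch itself does not confirm the cause.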

            People

              wesm Wes McKinney
              dekapache David Konerding