  1. Apache Arrow
  2. ARROW-9229

[Python] Pyarrow.Parquet.read_table Silently Crashes Python


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: None
    • Component/s: Python
    • Labels: None
    • Environment: Windows 10 Pro 1909

    Description

      Reading a Parquet file with PyArrow crashes Python silently: the interpreter exits with no error message or traceback. I've narrowed the reproduction down to the following:

      conda create -n pa36 python=3.6 pyarrow=0.17 -c conda-forge -y
      

      Then, in Python:

      import pyarrow.parquet
      tbl = pyarrow.parquet.read_table("some_file.snappy.parquet")
      

      Result: Python crashes. Possibly because of the Snappy compression? FWIW, the files were generated by Spark on Azure Databricks, not pandas.
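Since the crash leaves no traceback, one way to gather more detail is to run the reproduction in a child interpreter with the standard-library `faulthandler` module enabled, then inspect the child's exit code and stderr. The sketch below is only a diagnostic aid, not a fix. The helper name `run_isolated` and the `REPRO_CODE` string are hypothetical, the file name is the placeholder from the report, and whether `faulthandler` manages to print anything depends on how the native code fails.

```python
import subprocess
import sys

# Hypothetical child script: the two lines from the report, preceded by
# faulthandler.enable() so that a fatal native error can dump the Python
# traceback to stderr before the process dies.
REPRO_CODE = """
import faulthandler
faulthandler.enable()
import pyarrow.parquet
tbl = pyarrow.parquet.read_table("some_file.snappy.parquet")
print(tbl.num_rows)
"""


def run_isolated(code):
    """Run `code` in a fresh interpreter; return (exit code, stderr text)."""
    # stdout/stderr=PIPE and universal_newlines are used instead of
    # capture_output/text so the helper also works on Python 3.6.
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    returncode, stderr = run_isolated(REPRO_CODE)
    # A nonzero exit code means the child did not finish normally.
    print("child exit code:", returncode)
    print(stderr)
```

Running the read in a separate process keeps the calling session alive, so the exit code and any `faulthandler` output can be pasted into the issue.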
       

      Here's the conda environment:

      (base) > conda env export
       name: pa36
       channels: 
       - conda-forge
       - defaults
       dependencies: 
       - abseil-cpp=20200225.2=h33f27b4_0
       - arrow-cpp=0.17.1=py36h1234567_8_cpu
       - aws-sdk-cpp=1.7.164=vc14h867dc94_1
       - boost-cpp=1.72.0=h2ba7cf6_1
       - brotli=1.0.7=h33f27b4_1002
       - bzip2=1.0.8=hfa6e2cd_2
       - c-ares=1.15.0=h2fa13f4_1001
       - ca-certificates=2020.6.20=hecda079_0
       - certifi=2020.6.20=py36h9f0ad1d_0
       - curl=7.71.0=h4b64cdc_0
       - gflags=2.2.2=he025d50_1002
       - glog=0.4.0=h0174b99_3
       - grpc-cpp=1.30.0=hfae5148_0
       - intel-openmp=2020.0=166
       - krb5=1.17.1=hc04afaa_1
       - libblas=3.8.0=15_mkl
       - libcblas=3.8.0=15_mkl
       - libcurl=7.71.0=h4b64cdc_0
       - liblapack=3.8.0=15_mkl
       - libprotobuf=3.12.3=h7bd577a_0
       - libssh2=1.9.0=h3235a2c_2
       - lz4-c=1.9.2=h62dcd97_1
       - mkl=2020.0=166
       - numpy=1.18.5=py36h4d86e3b_0
       - openssl=1.1.1g=he774522_0
       - pandas=1.0.5=py36hcc50265_0
       - parquet-cpp=1.5.1=2
       - pip=20.1.1=py_1
       - pyarrow=0.17.1=py36h1234567_8_cpu
       - python=3.6.10=he025d50_1009_cpython
       - python-dateutil=2.8.1=py_0
       - python_abi=3.6=1_cp36m
       - pytz=2020.1=pyh9f0ad1d_0
       - re2=2020.06.01=h33f27b4_0
       - setuptools=47.3.1=py36h9f0ad1d_0
       - six=1.15.0=pyh9f0ad1d_0
       - snappy=1.1.8=ha925a31_2
       - thrift-cpp=0.13.0=h1907cbf_2
       - tk=8.6.10=hfa6e2cd_0
       - vc=14.1=h869be7e_1
       - vs2015_runtime=14.16.27012=h30e32a0_2
       - wheel=0.34.2=py_1
       - wincertstore=0.2=py36_1003
       - xz=5.2.5=h2fa13f4_0
       - zlib=1.2.11=h2fa13f4_1006
       - zstd=1.4.4=h9f78265_3
      


            People

              Assignee: Unassigned
              Reporter: Josh Dimarsky (ydimarsky@gmail.com)
              Votes: 0
              Watchers: 4
