Apache Arrow / ARROW-7939

[Python] crashes when reading parquet file compressed with snappy


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 1.0.0
    • Component/s: Python
    • Labels: None
    • Environment: Windows 7
      python 3.6.9
      pyarrow 0.16 from conda-forge

    Description

      After I installed pyarrow 0.16, reading some parquet files created with pyarrow 0.15.1 made Python crash. I narrowed it down to the simplest example I could find.

      It turns out that some parquet files created with pyarrow 0.16 cannot be read back either. The example below works fine with arrays_ok, but Python crashes with arrays_nok (apparently as soon as the column contains at least three distinct values).

      It also works fine with 'none', 'gzip' and 'brotli' compression. The problem seems to occur only with snappy.

      import pyarrow as pa
      import pyarrow.parquet as pq

      # Either of these reads back fine:
      arrays_ok = [[0, 1]]
      arrays_ok = [[0, 1, 1]]
      # Three distinct values make Python crash on read:
      arrays_nok = [[0, 1, 2]]

      table = pa.Table.from_arrays(arrays_nok, names=['a'])
      pq.write_table(table, 'foo.parquet', compression='snappy')
      pq.read_table('foo.parquet')  # crashes here
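
To compare the codecs mentioned above without losing the interpreter to a crash, one option is to run each write/read round trip in its own child process and look at the exit code. This is only a sketch: the helper name check_codecs and the CHILD script are made up for illustration and are not part of pyarrow, and a non-zero exit code can also mean that pyarrow or a given codec is simply not available in the environment.

```python
import os
import subprocess
import sys
import tempfile

# Child script (hypothetical helper): write a one-column table with the
# given codec, read it back, and fail if the round trip does not match.
CHILD = """
import sys
import pyarrow as pa
import pyarrow.parquet as pq

codec, path = sys.argv[1], sys.argv[2]
table = pa.Table.from_arrays([pa.array([0, 1, 2])], names=['a'])
pq.write_table(table, path, compression=codec)
assert pq.read_table(path).equals(table)
"""


def check_codecs(codecs=("none", "gzip", "brotli", "snappy")):
    """Return {codec: exit code of the child process that tested it}."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for codec in codecs:
            path = os.path.join(tmp, "foo_{}.parquet".format(codec))
            # A crash in the child only affects the child; the parent
            # just records the exit code.
            proc = subprocess.run(
                [sys.executable, "-c", CHILD, codec, path],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
            )
            results[codec] = proc.returncode
    return results


if __name__ == "__main__":
    for codec, code in check_codecs().items():
        status = "ok" if code == 0 else "failed (exit code {})".format(code)
        print(codec, status)
```

An exit code of 0 means the file was written and read back intact; any other value means the child failed or crashed for that codec.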
      

            People

              Assignee: Uwe Korn
              Reporter: Marc Bernot