Apache Arrow / ARROW-3514

[Python] zlib deflate exception when writing Parquet file


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0
    • Fix Version/s: 0.11.1
    • Component/s: C++, Python
    • Environment:
      Amazon Linux, CentOS 7, Ubuntu 16.04, zlib 1.2.7/1.2.8, CPython 3.6.

    Description

      The Python code below raises an exception in 0.11.0, but not in 0.10.0.

      I was able to reproduce the issue on Amazon Linux, CentOS 7, and Ubuntu 16.04, but not on Windows 7.

      The Amazon Linux and CentOS machines both run zlib 1.2.7; the Ubuntu machine runs zlib 1.2.8.

      Tested with CPython 3.6 in all cases.

      import io
      import pyarrow
      from pyarrow import parquet
      
      tbl = pyarrow.Table.from_arrays([pyarrow.array(['abc', 'def'])], ['some_col'])
      
      f = io.BytesIO()
      parquet.write_table(tbl, f, compression='gzip')
      

      The exception is:

      Traceback (most recent call last):
        File "test_pyarrow.py", line 8, in <module>
          parquet.write_table(tbl, f, compression='gzip')
        File "/home/adam/anaconda3/lib/python3.6/site-packages/pyarrow/parquet.py", line 1125, in write_table
          writer.write_table(table, row_group_size=row_group_size)
        File "/home/adam/anaconda3/lib/python3.6/site-packages/pyarrow/parquet.py", line 376, in write_table
          self.writer.write_table(table, row_group_size=row_group_size)
        File "pyarrow/_parquet.pyx", line 934, in pyarrow._parquet.ParquetWriter.write_table
        File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
      pyarrow.lib.ArrowIOError: Arrow error: IOError: zlib deflate failed, output buffer too small
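      The error says the output buffer handed to zlib's deflate was too small. One general way such a buffer ends up undersized is that gzip output for very small inputs is larger than the input itself, because of the fixed gzip header and trailer. The sketch below illustrates this with Python's standard `zlib` module only, not with Arrow's C++ codec; the helper name `gzip_compress` is hypothetical, and this is an illustration under that assumption, not a diagnosis of Arrow's actual code path.

```python
import zlib


def gzip_compress(data: bytes, level: int = 6) -> bytes:
    """Hypothetical helper: compress `data` into a complete gzip stream."""
    # wbits=31 (16 + 15) asks zlib for the gzip container: a 10-byte header
    # and an 8-byte trailer (CRC-32 + input size) around the deflate data.
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)
    return comp.compress(data) + comp.flush()


# A tiny payload, comparable in size to the two short strings in the reproduction.
payload = b"abcdef"
out = gzip_compress(payload)

# For inputs this small, the compressed stream is larger than the input,
# so a buffer sized from the input length alone would be too small.
print(f"input: {len(payload)} bytes, gzip output: {len(out)} bytes")

# Sanity check: the stream round-trips (wbits=31 expects a gzip header).
assert zlib.decompress(out, 31) == payload
```

      For sizing output buffers, zlib's C API provides `deflateBound()`, which returns an upper bound on the compressed size for a given stream configuration; this report does not say how Arrow's gzip codec computes its buffer size.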
      


    People

    • Assignee: Antoine Pitrou (apitrou)
    • Reporter: Adam Machanic (amachanic)
    • Votes: 0
    • Watchers: 5


    Time Tracking

    • Estimated: Not Specified
    • Remaining: 0h
    • Logged: 2h 20m