Beam / BEAM-10022

[Python] Error with `WriteToParquet` with empty buffer

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.20.0
    • Fix Version/s: 2.22.0
    • Component/s: io-py-parquet
    • Labels: None

    Description

      While using `WriteToParquet`, I encounter the following error:

      File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1066, in finish_bundle
        self.writer.close(),
      File "/usr/local/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", line 423, in close
        self.sink.close(self.temp_handle)
      File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 538, in close
        self._flush_buffer()
      File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 570, in _flush_buffer
        size = size + b.size
      AttributeError: 'NoneType' object has no attribute 'size'

      This happens because an empty array, e.g. `array = pa.array([])`, has no allocated buffers: `array.buffers()` returns `[None]`. However, `_flush_buffer` currently assumes that every entry returned by `buffers()` is a real buffer and reads `b.size` on it when incrementing `size`, which fails on `None`.

      A simple fix would be to add an `if b is not None:` check before incrementing `size`.
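
The proposed guard can be sketched as follows. This is a minimal illustration, not the actual `parquetio.py` code: `FakeBuffer`, `FakeArray`, and `total_buffer_size` are hypothetical names introduced here so the snippet runs without pyarrow installed. They only mimic the behaviour described above, where `buffers()` may contain `None`.

```python
from typing import Iterable, List, Optional


class FakeBuffer:
    """Hypothetical stand-in for a pyarrow buffer; only exposes `size`."""

    def __init__(self, size: int) -> None:
        self.size = size


class FakeArray:
    """Hypothetical stand-in for a pyarrow array; only exposes `buffers()`."""

    def __init__(self, buffers: List[Optional[FakeBuffer]]) -> None:
        self._buffers = buffers

    def buffers(self) -> List[Optional[FakeBuffer]]:
        return self._buffers


def total_buffer_size(arrays: Iterable[FakeArray]) -> int:
    """Sum the sizes of all buffers, skipping entries that are None."""
    size = 0
    for array in arrays:
        for b in array.buffers():
            if b is not None:  # the guard proposed in this issue
                size = size + b.size
    return size


# An empty array is modelled as reporting `[None]`, as described in the issue.
empty = FakeArray([None])
# A non-empty array is modelled with one missing buffer and one 24-byte buffer.
filled = FakeArray([None, FakeBuffer(24)])

print(total_buffer_size([empty, filled]))
```

Without the `if b is not None:` line, the loop would raise the same `AttributeError` as in the traceback as soon as it reached the `None` entry.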

          People

            Assignee: Unassigned
            Reporter: quentin lhoest (lhoestq)