Apache Arrow / ARROW-17991

[Python] pyarrow.dataset IPC format does not support compression

Details

    Description

      When writing an IPC dataset with pyarrow.dataset.write_dataset, it is not possible to pass a compression argument.

      Passing a pyarrow.ipc.IpcWriteOptions object as file_options fails:

      >>> ds.write_dataset(f, "./thing.arrow", format=ds.IpcFileFormat(), file_options=ipc.IpcWriteOptions(compression='lz4'))
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/home/joost/.cache/pypoetry/virtualenvs/datalogistik-rL_l_suP-py3.8/lib/python3.8/site-packages/pyarrow/dataset.py", line 940, in write_dataset
          if format != file_options.format:
      AttributeError: 'pyarrow.lib.IpcWriteOptions' object has no attribute 'format'


      Alternatively, pyarrow.dataset.IpcFileFormat().make_write_options() does not accept a compression parameter, so there is no supported way to request compression through the dataset API.

          People

            joosthooz Joost Hoozemans
            Votes: 0
            Watchers: 4

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 4h