[ARROW-10482] [Python] Specifying compression type on a column basis when writing Parquet not working - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0
Component/s: Python
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/26458

Description

From https://stackoverflow.com/questions/64666270/using-per-column-compression-codec-in-parquet-write-table

According to the docs, the compression argument of pq.write_table can be a dict that specifies the codec on a column-by-column basis, but passing such a dict currently raises a TypeError:

In [5]: table = pa.table([[1, 2], [3, 4], [5, 6]], names=["foo", "bar", "baz"])

In [6]: pq.write_table(table, 'test1.parquet', compression=dict(foo='zstd',bar='snappy',baz='brotli'))
...
~/scipy/repos/arrow/python/pyarrow/_parquet.cpython-37m-x86_64-linux-gnu.so in string.from_py.__pyx_convert_string_from_py_std__in_string()

TypeError: expected bytes, str found

Issue Links

links to GitHub Pull Request #8580

People

Assignee: Joris Van den Bossche

Reporter: Joris Van den Bossche

Votes: 0

Watchers: 4

Dates

Created: 03/Nov/20 16:30

Updated: 11/Jan/23 08:13

Resolved: 06/Nov/20 14:38

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 0.5h