PARQUET-1273

[Python] Error writing to partitioned Parquet dataset



    Description

      I receive the following error after upgrading to pyarrow 0.8.0 when writing to a dataset:

      • ArrowIOError: Column 3 had 187374 while previous column had 10000

      The command was:

      write_table_values = {'row_group_size': 10000}
      pq.write_to_dataset(pa.Table.from_pandas(df, preserve_index=True),
                          '/logs/parsed/test',
                          partition_cols=['Product', 'year', 'month', 'day', 'hour'],
                          **write_table_values)

      I've also tried write_table_values = {'chunk_size': 10000} and received the same error.

      This same command works in pyarrow 0.7.1. I am trying to troubleshoot the problem myself but wanted to file a ticket in the meantime.
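
      For anyone trying to reproduce this, a minimal self-contained sketch of the failing call is below. The DataFrame here is made up (the real data is in the attached ARROW-1938-test-data.csv.gz) and the output path is a placeholder; only the keyword arguments and partition_cols match the report.

      import pandas as pd
      import pyarrow as pa
      import pyarrow.parquet as pq

      # Stand-in frame; the real data is the attached ARROW-1938-test-data.csv.gz.
      # Column names match the partition_cols from the failing command.
      df = pd.DataFrame({
          'Product': ['A', 'B'] * 50000,
          'year': 2018,
          'month': 2,
          'day': 14,
          'hour': 12,
          'value': range(100000),
      })

      write_table_values = {'row_group_size': 10000}

      # Same call shape as in the report; '/tmp/parsed/test' is a placeholder path.
      pq.write_to_dataset(
          pa.Table.from_pandas(df, preserve_index=True),
          '/tmp/parsed/test',
          partition_cols=['Product', 'year', 'month', 'day', 'hour'],
          **write_table_values)

      Since the identical call succeeds on pyarrow 0.7.1, the regression appears to be in how 0.8.0 forwards row_group_size through pq.write_to_dataset to the underlying write_table calls.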

      Attachments

        1. ARROW-1938.py
          0.7 kB
          Robert Dailey
        2. ARROW-1938-test-data.csv.gz
          2.28 MB
          Robert Dailey
        3. pyarrow_dataset_error.png
          263 kB
          Robert Dailey


            People

              Assignee: Joshua Storck
              Reporter: Robert Dailey
              Votes: 1
              Watchers: 5
