Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Fix Version: 9.0.0
Description
When writing a large parquet file (e.g. 5GB) using pyarrow.dataset, it throws an exception:
Traceback (most recent call last):
File "pyarrow/_dataset_parquet.pyx", line 165, in pyarrow._dataset_parquet.ParquetFileFormat._finish_write
File "pyarrow/dataset.pyx", line 2695, in pyarrow._dataset.WrittenFile.init_
OverflowError: value too large to convert to int
Exception ignored in: 'pyarrow._dataset._filesystemdataset_write_visitor'
The file is written successfully, though. It seems related to this issue: https://issues.apache.org/jira/browse/ARROW-16761.
I would guess the problem is that the Python field is declared as an int while the C++ code returns an int64_t: https://github.com/apache/arrow/pull/13338/files#diff-4f2eb12337651b45bab2b03abe2552dd7fc9958b1fbbeb09a2a488804b097109R164