Apache Arrow / ARROW-17583

[Python] File write visitor throws exception on large parquet file

    Description

      When writing a large parquet file (e.g. 5 GB) using pyarrow.dataset, pyarrow throws an exception:

      Traceback (most recent call last):
        File "pyarrow/_dataset_parquet.pyx", line 165, in pyarrow._dataset_parquet.ParquetFileFormat._finish_write
        File "pyarrow/_dataset.pyx", line 2695, in pyarrow._dataset.WrittenFile.__init__
      OverflowError: value too large to convert to int
      Exception ignored in: 'pyarrow._dataset._filesystemdataset_write_visitor'

      The file is written successfully, though. It seems related to this issue: https://issues.apache.org/jira/browse/ARROW-16761.
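
      A minimal sketch that should reproduce this, assuming only the documented pyarrow.dataset.write_dataset API (the column contents, size, and output path below are arbitrary; the only requirement is that the written parquet file exceeds INT32_MAX bytes, and even a no-op file_visitor is enough because, per the traceback, the error occurs while pyarrow constructs the WrittenFile handed to the callback):

        import numpy as np
        import pyarrow as pa
        import pyarrow.dataset as ds

        # ~5.6 GB of effectively incompressible int64 data, so the single
        # written parquet file ends up well above INT32_MAX bytes.
        # (Building the table needs roughly 6 GB of RAM.)
        rng = np.random.default_rng()
        table = pa.table(
            {"x": rng.integers(0, 2**63 - 1, size=700_000_000, dtype=np.int64)}
        )

        def visitor(written_file):
            # The body is irrelevant here; the OverflowError is raised while
            # the WrittenFile is constructed, before this callback runs.
            print(written_file.path)

        ds.write_dataset(table, "/tmp/arrow-17583-repro", format="parquet",
                         file_visitor=visitor)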

      I would guess the problem is that the Python field is declared as an int while the C++ code returns an int64_t: https://github.com/apache/arrow/pull/13338/files#diff-4f2eb12337651b45bab2b03abe2552dd7fc9958b1fbbeb09a2a488804b097109R164
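
      If that guess is right, the failure is just the 32-bit int range: any file larger than 2_147_483_647 bytes cannot be stored in a C int. A quick standalone illustration in plain Python, using struct as a stand-in for the implicit Cython int conversion (an assumption about where the conversion happens):

        import struct

        INT32_MAX = 2**31 - 1        # 2_147_483_647, the largest value a C "int" can hold
        file_size = 5 * 1024**3      # ~5.4e9 bytes for a 5 GB parquet file

        struct.pack("q", file_size)  # "q" = 64-bit signed (int64_t): packs fine
        try:
            struct.pack("i", file_size)  # "i" = 32-bit signed int: overflows
        except struct.error as exc:
            # e.g. "'i' format requires -2147483648 <= number <= 2147483647"
            print(exc)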


    People

      Assignee: Joost Hoozemans (joosthooz)
      Reporter: Joost Hoozemans (joosthooz)
      Votes: 0
      Watchers: 4


    Time Tracking

      Estimated: Not Specified
      Remaining: 0h
      Logged: 1h 50m