Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
9.0.0
-
None
-
None
Description
We have AWS Lambdas that were updated to run with PyArrow 9. We were calling the pyarrow.parquet.read_table function with the columns parameter to specify which columns to read. With PyArrow version prior to 9 this worked fine with just 500MB of memory for the Lambda. With PyArrow 9 we started to get these exceptions:
AWS Error [code 99]: curlCode: 28, Timeout was reached
We did some experimenting and found two options to get around the issue.
Option 1: Increase the memory to 1.7 MB so the Lamba gets a full CPU for the container
Option 2: Continue to allocate just 500 MB of memory, but stop using the columns parameter.
Looking for some insight as to what the issue might be. Thanks.