Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Fix Version: 5.0.0
Description
We are using RecordBatchFileWriter to write Arrow data directly to S3 via the S3FileSystem, then RecordBatchFileReader to read it back. The write is efficient: writing a 50 MB file finishes within 0.2 s. But reading that file back takes 30 s, which is far too long. I then ran several tests (a sketch of the read paths follows this list):
- Using S3FileSystem alone to read the file into bytes takes only 1 s, which suggests the problem is in RecordBatchFileReader rather than in the S3 transfer itself.
- At half the size (around 25 MB), reading with RecordBatchFileReader took 17 s; without it, 0.28 s.
- At double the size (around 100 MB), reading with RecordBatchFileReader took 61 s; without it, 2.3 s.
- Fetching all the bytes with S3FileSystem first, then creating a reader over those bytes and reading all contents from it, takes only 0.1 s.
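For reference, a minimal sketch of the two read paths in pyarrow. The bucket/key, region, and sample table are hypothetical stand-ins, and the timings in the comments are the ones reported above for the ~50 MB file, not guarantees:

```python
import pyarrow as pa
import pyarrow.fs as fs
import pyarrow.ipc as ipc

# Hypothetical bucket/key and region, for illustration only.
s3 = fs.S3FileSystem(region="us-east-1")
path = "my-bucket/data.arrow"

# Small sample table standing in for the ~50 MB payload.
table = pa.table({"x": range(1000)})

# Write path: fast (~0.2 s for 50 MB as reported above).
with s3.open_output_stream(path) as sink:
    with ipc.new_file(sink, table.schema) as writer:
        writer.write_table(table)

# Slow read path: the reader pulls directly from S3 and ends up
# issuing many small reads against the object (~30 s for 50 MB).
with s3.open_input_file(path) as source:
    result = ipc.open_file(source).read_all()

# Workaround: download the whole object in one GET (~1 s), then
# read from the in-memory buffer (~0.1 s).
with s3.open_input_stream(path) as source:
    buf = source.read()
result = ipc.open_file(pa.BufferReader(buf)).read_all()
```

Reading through pa.BufferReader avoids the reader's many small range requests against S3, which appears to be what the linked ARROW-14577 (fine grained IO for the IPC reader) addresses.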
Issue Links
- duplicates: ARROW-14577 [C++] Enable fine grained IO for async IPC reader (Resolved)