Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- None
Description
Currently, in `GetObjectRange` of s3fs, the `GetObjectRequest` has no `ResponseStreamFactory` assigned. This means that the bytes returned by the S3 API are first written to a `std::basic_stringbuf`. To my understanding, this has two performance impacts:
- `std::basic_stringbuf` buffers the response in a growing array, so it performs many reallocations;
- on top of that, the data is copied a second time, out of the `std::basic_stringbuf`, when it is read into the Arrow buffer.
This seems to be a bit costly.
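As a rough, standalone illustration of the two costs listed above, the sketch below imitates the default path with plain standard-library streams. It does not call the AWS SDK: the function name `ReadViaStringBuf` and its `chunks` input (standing in for the response body as it arrives from the network) are hypothetical, and a `std::vector<char>` stands in for the Arrow buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Standalone imitation of the default path (no ResponseStreamFactory set).
// `chunks` is a hypothetical stand-in for the body bytes received from S3;
// nothing here calls the AWS SDK itself.
std::vector<char> ReadViaStringBuf(const std::vector<std::string>& chunks) {
  // Copy 1: each chunk is appended to the std::stringbuf behind this stream,
  // whose internal array is reallocated whenever it outgrows its capacity.
  std::stringstream body;
  for (const std::string& chunk : chunks) {
    body.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));
  }

  // Copy 2: the reader pulls the bytes back out of the stringbuf into its own
  // destination buffer (an Arrow buffer in s3fs; a std::vector here).
  const auto size =
      static_cast<std::size_t>(static_cast<std::streamoff>(body.tellp()));
  std::vector<char> out(size);
  body.read(out.data(), static_cast<std::streamsize>(size));
  assert(body.gcount() == static_cast<std::streamsize>(size));
  return out;
}
```

The bytes end up correct, but every byte passes through the intermediate `std::stringbuf` before reaching its final destination.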
By setting a `ResponseStreamFactory`, we might be able to write the data directly into the Arrow buffer.
I can give it a try, but I would need some advice. Is there an existing utility to stream data into an Arrow buffer (if there is one, it is well hidden!)? Or should I stream the data into a plain array and then transfer ownership to Arrow?
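One possible shape for the direct path, sketched under assumptions: a stream buffer that writes into caller-owned memory of fixed size, wrapped in a `std::iostream` and returned from a factory. To my understanding, the SDK's response stream factory is a `std::function` returning an `Aws::IOStream*` (an alias of `std::iostream`) that the SDK then takes ownership of, so the sketch mirrors that shape with plain standard types. `FixedBufferStreamBuf`, `FixedBufferStream`, `MakeFixedBufferFactory` and `WriteChunks` are hypothetical names, and the code does not call the SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <streambuf>
#include <string>
#include <vector>

// Write-only stream buffer over caller-owned memory: bytes written to it land
// directly in `data`, with no intermediate allocation. Once `capacity` bytes
// are written, the inherited overflow() returns eof, so further writes fail
// instead of growing the buffer.
class FixedBufferStreamBuf : public std::streambuf {
 public:
  FixedBufferStreamBuf(char* data, std::size_t capacity) {
    assert(data != nullptr || capacity == 0);
    setp(data, data + capacity);
  }
  // Number of bytes written so far.
  std::size_t written() const {
    return static_cast<std::size_t>(pptr() - pbase());
  }
};

// An iostream that owns its stream buffer, so a single heap object can be
// handed to a caller that takes ownership of the returned stream.
class FixedBufferStream : public std::iostream {
 public:
  FixedBufferStream(char* data, std::size_t capacity)
      : std::iostream(nullptr), buf_(data, capacity) {
    rdbuf(&buf_);  // also clears the badbit set by the null-buffer init
  }
  std::size_t written() const { return buf_.written(); }

 private:
  FixedBufferStreamBuf buf_;
};

// Hypothetical factory with the shape of a response-stream factory.
std::function<std::iostream*()> MakeFixedBufferFactory(char* data,
                                                       std::size_t capacity) {
  return [data, capacity]() -> std::iostream* {
    return new FixedBufferStream(data, capacity);
  };
}

// Stand-in for the SDK copying network data into the stream.
bool WriteChunks(std::iostream& out, const std::vector<std::string>& chunks) {
  for (const std::string& chunk : chunks) {
    out.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));
  }
  return out.good();
}
```

If the destination is preallocated with the exact length of the requested range, the stream never reallocates, and a body longer than expected sets `badbit` rather than writing past the end. In s3fs the destination could be the memory of an Arrow buffer allocated up front, which would remove the second copy; how ownership is then handed to Arrow remains the open question above. The AWS SDK also ships a preallocated stream buffer class (`Aws::Utils::Stream::PreallocatedStreamBuf`), which may serve the same purpose; its constructor should be checked against the SDK version in use.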
Issue Links
- is superseded by ARROW-8692 [C++] Avoid memory copies when downloading from S3 (Resolved)