IMPALA-5169: Add support for async pins in buffer pool
Makes Pin() start reads asynchronously behind the scenes,
instead of blocking until the read completes. The client
now blocks only when it tries to access the buffer via
PageHandle::GetBuffer() or ExtractBuffer().
This is implemented with a new sub-state of "pinned"
where the page has a buffer and consumes reservation
but the buffer does not contain valid data.
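The sub-state can be pictured with the minimal sketch below. It is
an illustration only, not the BufferPool implementation: SketchPage,
data_valid() and the simulated 20 ms read are hypothetical, and
std::async stands in for the real I/O path.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a spilled page. Not Impala's BufferPool API.
class SketchPage {
 public:
  explicit SketchPage(std::vector<uint8_t> on_disk)
      : on_disk_(std::move(on_disk)) {}
  // The in-flight read captures `this`, so forbid copies and moves.
  SketchPage(const SketchPage&) = delete;
  SketchPage& operator=(const SketchPage&) = delete;

  // Allocates the buffer and starts the read, then returns immediately.
  // The page is pinned, but the buffer contents are not valid yet.
  void Pin() {
    if (pinned_) return;
    buffer_.resize(on_disk_.size());
    pinned_ = true;
    pending_read_ = std::async(std::launch::async, [this] {
      // Simulated I/O latency.
      std::this_thread::sleep_for(std::chrono::milliseconds(20));
      std::copy(on_disk_.begin(), on_disk_.end(), buffer_.begin());
    });
  }

  bool pinned() const { return pinned_; }

  // True only after a caller has waited for the read to finish.
  bool data_valid() const { return pinned_ && !pending_read_.valid(); }

  // Blocks until the read started by Pin() completes, then returns the data.
  const std::vector<uint8_t>& GetBuffer() {
    assert(pinned_);
    if (pending_read_.valid()) pending_read_.get();
    return buffer_;
  }

 private:
  std::vector<uint8_t> on_disk_;
  std::vector<uint8_t> buffer_;
  bool pinned_ = false;
  // Declared last so it is destroyed first, which waits for any read
  // still in flight before the buffers above go away.
  std::future<void> pending_read_;
};
```

In this sketch Pin() never waits on the read. The wait is deferred
to the first GetBuffer() call, mirroring the behaviour described
above.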
This unlocks various opportunities to overlap read I/Os
with other work:
- Reads to different disks can execute in parallel.
- I/O and computation can be overlapped.
This initially benefits BufferedTupleStream::PinStream(),
where many pages are pinned at once. With this change the
reads for those pages are issued together and run
asynchronously, which can lead to large speedups when
spilling. E.g. if the pages for a Hash Join's partition are
spread across 10 disks, we could get up to 10x the read
throughput, plus overlap the I/O with the hash table build.
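The pin-everything-then-consume pattern can be sketched roughly as
follows. This is a toy model under stated assumptions, not Impala
code: ReadPageFromDisk() and PinAllThenConsume() are hypothetical
names, each "disk" is simulated with a sleep, and std::async stands
in for the I/O layer. It makes no claim about actual speedups.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: simulates reading one page from the given disk.
// The page holds rows_per_page values starting at disk_id * 1000.
std::vector<int64_t> ReadPageFromDisk(int disk_id, int rows_per_page) {
  std::this_thread::sleep_for(std::chrono::milliseconds(20));
  std::vector<int64_t> rows(rows_per_page);
  std::iota(rows.begin(), rows.end(), static_cast<int64_t>(disk_id) * 1000);
  return rows;
}

// Starts one read per disk up front, does work that needs no page data
// while the reads are in flight, then consumes the pages in order.
std::vector<int64_t> PinAllThenConsume(int num_disks, int rows_per_page) {
  assert(num_disks >= 0 && rows_per_page >= 0);
  std::vector<std::future<std::vector<int64_t>>> reads;
  for (int d = 0; d < num_disks; ++d) {
    reads.push_back(
        std::async(std::launch::async, ReadPageFromDisk, d, rows_per_page));
  }

  // Overlapped work: size the output while the reads are running.
  std::vector<int64_t> all_rows;
  all_rows.reserve(static_cast<std::size_t>(num_disks) * rows_per_page);

  // Each get() blocks only if that particular read has not finished yet.
  for (auto& read : reads) {
    std::vector<int64_t> page = read.get();
    all_rows.insert(all_rows.end(), page.begin(), page.end());
  }
  return all_rows;
}
```

Because every read is started before any result is consumed, the
simulated reads can proceed concurrently instead of one after
another.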
In future we can use this to do read-ahead over unpinned
BufferedTupleStreams or for unpinned Runs in Sorter, but
this requires changes to the client code to Pin() pages
in advance.
Testing:
- BufferedTupleStreamV2 already exercises this.
- Various BufferPool tests already exercise this.
- Added a basic test to cover edge cases made possible by the
  new state transitions.
- Extended the randomised test to cover this.
Reviewed-by: Tim Armstrong <email@example.com>
Tested-by: Impala Public Jenkins