
Parallelise read I/O of BufferPool::Pin()

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:
      None

      Description

      Currently read I/O in BufferPool is synchronous. In some cases this can lead to poor resource utilisation and I/O throughput, because:

      • We don't dispatch parallel reads to multiple scratch disks or high-throughput SSDs
      • Issuing reads of contiguous scratch ranges at the same time improves the odds that the second read can be served without a disk seek or from the disk's internal cache.

      Two alternatives could address this:

      • Expose a batched Pin() interface that can pin multiple buffers at the same time
      • Expose an asynchronous Pin() interface that can start the read, and allow the client to wait for it.

      The first alternative is probably simplest.
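
To make the difference between the batched and asynchronous interface shapes concrete, here is a minimal sketch. It is not BufferPool code: `ReadPage`, `PinBatch` and `PinAsync` are hypothetical names, pages are plain strings, and `std::async` stands in for whatever I/O mechanism a real implementation would use.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for reading one spilled page back from a scratch file.
std::string ReadPage(int page_id) {
  assert(page_id >= 0);
  std::this_thread::sleep_for(std::chrono::milliseconds(10));  // simulated I/O latency
  return "page-" + std::to_string(page_id);
}

// Batched shape: start every read up front, then return once all of them
// have completed. The caller still blocks, but the reads overlap each other.
std::vector<std::string> PinBatch(const std::vector<int>& page_ids) {
  std::vector<std::future<std::string>> reads;
  for (int id : page_ids) {
    reads.push_back(std::async(std::launch::async, ReadPage, id));
  }
  std::vector<std::string> pages;
  for (auto& read : reads) pages.push_back(read.get());
  return pages;
}

// Asynchronous shape: start the read and return immediately. The caller
// decides when to wait by calling get() on the returned future.
std::future<std::string> PinAsync(int page_id) {
  return std::async(std::launch::async, ReadPage, page_id);
}
```

With the batched shape the caller hands over all page ids at once; with the asynchronous shape the caller can do other work between starting the read and waiting on it.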


          Activity

          tarmstrong Tim Armstrong added a comment -

          IMPALA-5169: Add support for async pins in buffer pool

          Makes Pin() do async reads behind the scenes, instead of
          blocking until the read completes. The blocking is done
          instead when the client tries to access the buffer via
          PageHandle::GetBuffer() or ExtractBuffer().

          This is implemented with a new sub-state of "pinned"
          where the page has a buffer and consumes reservation
          but the buffer does not contain valid data.
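
The sub-state described above can be illustrated with a small sketch. This is an assumption-laden model, not the actual BufferPool implementation: the `Page` class, the `PageState` enum and its value names are hypothetical, and a background `std::thread` copying a string stands in for the scratch-file read. What it shows is the behaviour the commit message describes: Pin() returns at once, and GetBuffer() is where the caller blocks.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical states. PINNED_READ_IN_FLIGHT models the new sub-state: the
// page is pinned and owns a buffer, but the buffer's contents are not valid.
enum class PageState { UNPINNED, PINNED_READ_IN_FLIGHT, PINNED_VALID };

class Page {
 public:
  explicit Page(std::string spilled) : spilled_(std::move(spilled)) {}
  ~Page() {
    if (reader_.joinable()) reader_.join();
  }

  // Starts the read and returns without waiting for it.
  void Pin() {
    {
      std::lock_guard<std::mutex> l(lock_);
      assert(state_ == PageState::UNPINNED);
      state_ = PageState::PINNED_READ_IN_FLIGHT;
    }
    reader_ = std::thread([this] {
      std::string data = spilled_;  // stands in for the scratch-file read
      std::lock_guard<std::mutex> l(lock_);
      buffer_ = std::move(data);
      state_ = PageState::PINNED_VALID;
      read_done_.notify_all();
    });
  }

  // Blocks until the in-flight read (if any) has filled the buffer.
  const std::string& GetBuffer() {
    std::unique_lock<std::mutex> l(lock_);
    assert(state_ != PageState::UNPINNED);
    read_done_.wait(l, [this] { return state_ == PageState::PINNED_VALID; });
    return buffer_;
  }

  PageState State() {
    std::lock_guard<std::mutex> l(lock_);
    return state_;
  }

 private:
  const std::string spilled_;  // the page's contents "on disk"
  std::mutex lock_;
  std::condition_variable read_done_;
  PageState state_ = PageState::UNPINNED;
  std::string buffer_;
  std::thread reader_;
};
```

A caller would construct a `Page`, call `Pin()`, optionally do unrelated work, and only pay for the read latency when it calls `GetBuffer()`.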

          Motivation:
          This unlocks various opportunities to overlap read I/Os
          with other work:

          • Reads to different disks can execute in parallel
          • I/O and computation can be overlapped.

          This initially benefits BufferedTupleStream::PinStream(),
          where many pages are pinned at once. With this change the
          reads run asynchronously. This can potentially lead
          to large speedups when spilling. E.g. if the pages for a Hash
          Join's partition are spread across 10 disks, we could get 10x
          the read throughput, plus overlap the I/O with hash table build.
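
The 10x figure follows from a simple idealised model, sketched here as back-of-the-envelope arithmetic. The function names are hypothetical, and the model assumes every page takes the same time to read, pages are spread evenly across disks, and disks do not contend with each other; real speedups will be lower.

```cpp
#include <cassert>
#include <cstdint>

// Time to read all pages one after another (the synchronous behaviour).
std::int64_t SerialReadMillis(std::int64_t num_pages, std::int64_t per_page_ms) {
  return num_pages * per_page_ms;
}

// Time to read all pages when each disk serves its own pages in parallel
// with the other disks. The busiest disk determines the total time.
std::int64_t ParallelReadMillis(std::int64_t num_pages, std::int64_t num_disks,
                                std::int64_t per_page_ms) {
  assert(num_disks > 0);
  std::int64_t pages_on_busiest_disk = (num_pages + num_disks - 1) / num_disks;
  return pages_on_busiest_disk * per_page_ms;
}
```

For 100 pages at 5 ms each, the serial estimate is 500 ms and the estimate for 10 disks is 50 ms, which is the 10x ratio mentioned above. Any overlap with the hash table build would come on top of that.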

          In future we can use this to do read-ahead over unpinned
          BufferedTupleStreams or for unpinned Runs in Sorter, but
          this requires changes to the client code to Pin() pages
          in advance.
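
A rough sketch of what "Pin() pages in advance" could look like on the client side: keep one read in flight for the next page while the current page is processed. `StartPin` and `TotalBytesWithReadAhead` are hypothetical names, the scratch data is a vector of strings, and `std::async` stands in for an asynchronous Pin(); none of this is taken from BufferedTupleStream or Sorter.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asynchronous Pin(): starts reading page i.
std::future<std::string> StartPin(const std::vector<std::string>& scratch,
                                  std::size_t i) {
  return std::async(std::launch::async, [&scratch, i] { return scratch[i]; });
}

// Walks the pages in order. Before processing page i, the read for page
// i + 1 is already started, so I/O overlaps with processing.
std::size_t TotalBytesWithReadAhead(const std::vector<std::string>& scratch) {
  std::size_t total = 0;
  if (scratch.empty()) return total;
  std::future<std::string> next = StartPin(scratch, 0);
  for (std::size_t i = 0; i < scratch.size(); ++i) {
    std::future<std::string> current = std::move(next);
    if (i + 1 < scratch.size()) next = StartPin(scratch, i + 1);
    total += current.get().size();  // "process" the page: count its bytes
  }
  return total;
}
```

The read-ahead depth here is fixed at one page; a real client would presumably bound it by the reservation it has available, since each pinned page consumes reservation.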

          Testing:

          • BufferedTupleStreamV2 already exercises this.
          • Various BufferPool tests already exercise this.
          • Added a basic test to cover edge cases made possible by the
            new state transitions.
          • Extended the randomised test to cover this.

          Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79
          Reviewed-on: http://gerrit.cloudera.org:8080/6612
          Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
          Tested-by: Impala Public Jenkins


            People

            • Assignee: Tim Armstrong (tarmstrong)
            • Reporter: Tim Armstrong (tarmstrong)
