Apache Arrow / ARROW-12522

[C++] Implement asynchronous/"lazy" variants of ReadRangeCache


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0.0
    • Component/s: C++

    Description

      Currently, ReadRangeCache performs both readahead and coalescing, and it primarily exposes a blocking API. Two improvements would be useful for implementing async-generator versions of file readers:

      • A method to get a Future<> for a set of read ranges, so that you can asynchronously wait for ranges to be read instead of attempting to read and getting blocked
      • A way to make the cache not perform readahead, so that data is fetched only when requested. (Then, consumers could handle readahead by making multiple requests to the cache.)
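
      The two proposed behaviours can be sketched with the standard library alone. This is a minimal illustration, not Arrow's implementation: ReadRange, Coalesce, LazyRangeCache, hole_size_limit and reads_issued are hypothetical names, an in-memory std::string stands in for the file, and std::future stands in for Arrow's Future<>. Cache() only records coalesced ranges; a read is issued the first time a range is requested through WaitFor() or Read().

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <future>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a byte range in a file (not an Arrow type).
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Merge ranges separated by at most `hole_size_limit` bytes into one read.
std::vector<ReadRange> Coalesce(std::vector<ReadRange> ranges, int64_t hole_size_limit) {
  std::sort(ranges.begin(), ranges.end(),
            [](const ReadRange& a, const ReadRange& b) { return a.offset < b.offset; });
  std::vector<ReadRange> out;
  for (const ReadRange& r : ranges) {
    if (!out.empty() &&
        r.offset - (out.back().offset + out.back().length) <= hole_size_limit) {
      int64_t end = std::max(out.back().offset + out.back().length, r.offset + r.length);
      out.back().length = end - out.back().offset;
    } else {
      out.push_back(r);
    }
  }
  return out;
}

// Sketch of a lazy cache. Not thread-safe: call it from a single thread.
class LazyRangeCache {
 public:
  LazyRangeCache(std::string file, int64_t hole_size_limit)
      : file_(std::make_shared<const std::string>(std::move(file))),
        hole_size_limit_(hole_size_limit) {}

  // Record the coalesced ranges; no I/O is started here (no readahead).
  void Cache(const std::vector<ReadRange>& ranges) {
    for (const ReadRange& r : Coalesce(ranges, hole_size_limit_)) {
      entries_.push_back(Entry{r, {}});
    }
  }

  // Start any reads not yet issued and return a future that becomes ready
  // once every requested range is available.
  std::future<void> WaitFor(const std::vector<ReadRange>& ranges) {
    std::vector<std::shared_future<std::string>> pending;
    for (const ReadRange& r : ranges) pending.push_back(Fetch(r));
    return std::async(std::launch::deferred, [pending] {
      for (const auto& f : pending) f.wait();
    });
  }

  // Blocking read of a sub-range of a previously cached range.
  std::string Read(const ReadRange& r) {
    Entry& e = Find(r);
    return Fetch(r).get().substr(r.offset - e.range.offset, r.length);
  }

  int reads_issued() const { return reads_issued_; }

 private:
  struct Entry {
    ReadRange range;
    std::shared_future<std::string> data;  // invalid until first requested
  };

  Entry& Find(const ReadRange& r) {
    for (Entry& e : entries_) {
      if (e.range.offset <= r.offset &&
          r.offset + r.length <= e.range.offset + e.range.length) {
        return e;
      }
    }
    throw std::out_of_range("range was never passed to Cache()");
  }

  std::shared_future<std::string> Fetch(const ReadRange& r) {
    Entry& e = Find(r);
    if (!e.data.valid()) {  // first request for this entry: issue the read now
      ++reads_issued_;
      auto file = file_;
      ReadRange range = e.range;
      e.data = std::async(std::launch::async, [file, range] {
                 return file->substr(range.offset, range.length);
               }).share();
    }
    return e.data;
  }

  std::shared_ptr<const std::string> file_;
  int64_t hole_size_limit_;
  std::vector<Entry> entries_;
  int reads_issued_ = 0;
};
```

      In this sketch a consumer that wants readahead simply calls WaitFor() on later ranges before it needs them; ranges that are never requested are never read.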

      The cache would still act as an actual cache and would still coalesce. (A further improvement might be to allow discarding cache entries. For the purpose of getting AsyncGenerator<RecordBatch>, we don't need a range more than once, so the cache is just wasting memory.)

      This makes it straightforward to adapt synchronous readers into asynchronous ones, so long as the read ranges are known up front: cache all the ranges, call WaitFor<>, then hand the buffers to the existing synchronous reader.
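
      The adaptation pattern described here might look roughly like the following. It is a sketch under stated assumptions rather than Arrow code: ParseBatchSync, FetchRangesAsync and ReadBatchesAsync are hypothetical names, the "file" is an in-memory string, FetchRangesAsync stands in for caching the ranges and waiting on them, and std::future stands in for Arrow's Future<>. The synchronous reader is left untouched and only ever sees buffers that are already in memory.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a byte range in a file (not an Arrow type).
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// The existing synchronous reader: parses whitespace-separated integers
// from a buffer that is already in memory.
std::vector<int> ParseBatchSync(const std::string& buffer) {
  std::istringstream in(buffer);
  std::vector<int> values;
  for (int v; in >> v;) values.push_back(v);
  return values;
}

// Stand-in for "cache all the ranges, then wait for them": the ranges are
// known up front and are fetched on a background thread.
std::future<std::vector<std::string>> FetchRangesAsync(
    std::shared_ptr<const std::string> file, std::vector<ReadRange> ranges) {
  return std::async(std::launch::async, [file, ranges] {
    std::vector<std::string> buffers;
    for (const ReadRange& r : ranges) {
      buffers.push_back(file->substr(r.offset, r.length));
    }
    return buffers;
  });
}

// Asynchronous adapter: once the buffers are available, hand each one to
// the unchanged synchronous reader.
std::future<std::vector<std::vector<int>>> ReadBatchesAsync(
    std::shared_ptr<const std::string> file, std::vector<ReadRange> ranges) {
  std::shared_future<std::vector<std::string>> fetched =
      FetchRangesAsync(file, ranges).share();
  return std::async(std::launch::deferred, [fetched] {
    std::vector<std::vector<int>> batches;
    for (const std::string& buffer : fetched.get()) {
      batches.push_back(ParseBatchSync(buffer));
    }
    return batches;
  });
}
```

      Fetching begins as soon as ReadBatchesAsync is called, while parsing is deferred until the caller asks for the result, so the caller can start fetching for several files before consuming any of them.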

            People

              lidavidm David Li

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 10m