Details
Bug
Status: Resolved
Major
Resolution: Fixed
None
Description
Note: in the interest of keeping things simple, this should ideally replace the AsyncTaskGroup. It is needed to simplify the logic in ARROW-17287.
The format and implementation will likely be inspired by the synchronous schedulers, TaskScheduler and TaskGroup, but it will remain a separate implementation. In the future, when we dedicate time to improving our synchronous scheduler, we can decide whether it makes sense to merge the synchronous and asynchronous schedulers.
/// A utility which keeps track of, and schedules, asynchronous tasks
///
/// An asynchronous task has a synchronous component and an asynchronous component.
/// The synchronous component typically schedules some kind of work on an external
/// resource (e.g. the I/O thread pool or some kind of kernel-based asynchronous
/// resource like io_uring). The asynchronous part represents the work
/// done on that external resource. Executing the synchronous part will be referred
/// to as "submitting the task" since this usually includes submitting the asynchronous
/// portion to the external thread pool.
///
/// By default the scheduler will submit the task (execute the synchronous part) as
/// soon as it is added, assuming the underlying thread pool hasn't terminated or the
/// scheduler hasn't aborted. In this mode the scheduler is simply acting as
/// a task group, keeping track of the ongoing work.
///
/// This can be used to provide structured concurrency for asynchronous development.
/// A task group created at a high level can be distributed amongst low level components
/// which register work to be completed. The high level job can then wait for all work
/// to be completed before cleaning up.
///
/// A task scheduler must eventually be ended when all tasks have been added. Once the
/// scheduler has been ended it is an error to add further tasks. Note, it is not an
/// error to add additional tasks after a scheduler has aborted (though these tasks
/// will be ignored and never submitted). The scheduler has a future which will complete
/// once the scheduler has been ended AND all remaining tasks have finished executing.
/// Ending a scheduler will NOT cause the scheduler to flush existing tasks.
///
/// Task failure (either the synchronous portion or the asynchronous portion) will cause
/// the scheduler to enter an aborted state. The first such failure will be reported in
/// the final task future.
///
/// The scheduler can also be manually aborted. A cancellation status will be reported as
/// the final task future.
///
/// It is also possible to limit the number of concurrent tasks the scheduler will
/// execute. This is done by setting a task limit. The task limit initially assumes all
/// tasks are equal but a custom cost can be supplied when scheduling a task (e.g. based
/// on the total I/O cost of the task, or the expected RAM utilization of the task).
///
/// When the total number of running tasks is limited then scheduler priority may also
/// become a consideration. By default the scheduler runs with a FIFO queue but a custom
/// task queue can be provided. One could, for example, use a priority queue to control
/// the order in which tasks are executed.
///
/// It is common to have multiple stages of execution. For example, when scanning, we
/// first inspect each fragment (the inspect stage) to figure out the row groups and then
/// we scan row groups (the scan stage) to read in the data. This sort of multi-stage
/// execution should be represented as two separate task groups. The first task group can
/// then have a custom finish callback which ends the second task group.
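To make the rules in the comment concrete, here is a minimal single-threaded sketch of the add / submit / end / abort lifecycle with a task limit, per-task cost and FIFO ordering. It is not the proposed Arrow implementation: `SketchScheduler`, `OnTaskFinished`, `Finished` and `RunDemo` are hypothetical names, the asynchronous part is modelled by the caller reporting completion, and two behaviours are assumptions rather than anything stated above (aborting drops queued tasks, and a task costlier than the limit may run alone).

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the scheduler described above; none of these names are Arrow's API.
class SketchScheduler {
 public:
  // The synchronous part of a task ("submitting the task"); false means it failed.
  using SubmitFn = std::function<bool()>;

  explicit SketchScheduler(int capacity) : capacity_(capacity) {}

  // Returns a task id, or -1 when called after End() (an error in the design).
  int AddTask(SubmitFn submit, int cost = 1) {
    if (ended_) return -1;
    int id = next_id_++;
    if (aborted_) return id;  // accepted, but ignored and never submitted
    queue_.push_back(Pending{id, cost, std::move(submit)});
    SubmitWhileRoom();
    return id;
  }

  // Models completion of the asynchronous part of a previously submitted task.
  void OnTaskFinished(int id, bool ok) {
    auto it = running_.find(id);
    if (it == running_.end()) return;
    used_ -= it->second;
    running_.erase(it);
    if (!ok) {
      Abort();  // a task failure puts the scheduler into the aborted state
      return;
    }
    SubmitWhileRoom();
  }

  void End() { ended_ = true; }  // does not flush: queued tasks still run

  void Abort() {
    aborted_ = true;
    queue_.clear();  // assumption: tasks not yet submitted are dropped
  }

  bool aborted() const { return aborted_; }

  // Stands in for the scheduler's future: ended AND no remaining work.
  bool Finished() const { return ended_ && running_.empty() && queue_.empty(); }

 private:
  struct Pending {
    int id;
    int cost;
    SubmitFn submit;
  };

  // Submits queued tasks in FIFO order while their cost fits under the limit.
  void SubmitWhileRoom() {
    while (!aborted_ && !queue_.empty()) {
      Pending& next = queue_.front();
      // Assumption: an oversized task may still run when nothing else is running.
      if (!running_.empty() && used_ + next.cost > capacity_) break;
      Pending task = std::move(next);
      queue_.pop_front();
      used_ += task.cost;
      running_[task.id] = task.cost;
      if (!task.submit()) {  // failure of the synchronous part also aborts
        used_ -= task.cost;
        running_.erase(task.id);
        Abort();
      }
    }
  }

  int capacity_;
  int used_ = 0;
  int next_id_ = 0;
  bool ended_ = false;
  bool aborted_ = false;
  std::deque<Pending> queue_;
  std::map<int, int> running_;  // task id -> cost
};

// Usage: with a limit of 2, the third unit-cost task waits until a slot frees up.
std::vector<std::string> RunDemo() {
  std::vector<std::string> log;
  SketchScheduler scheduler(2);
  auto task = [&log](std::string name) {
    return [&log, name] {
      log.push_back("submit " + name);
      return true;
    };
  };
  int a = scheduler.AddTask(task("a"));
  scheduler.AddTask(task("b"));
  scheduler.AddTask(task("c"));  // queued: the limit is reached
  scheduler.End();
  log.push_back("finish a");
  scheduler.OnTaskFinished(a, true);  // frees a slot, so "c" is submitted
  return log;
}
```

A real implementation would return futures instead of relying on an explicit completion call and would need to be thread-safe; swapping the `std::deque` for a priority queue would give the custom ordering the comment mentions.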