[ARROW-17350] [C++] Create a scheduler for asynchronous work - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 10.0.0
Component/s: C++
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/32624

Description

Note, in the interest of keeping things simple, this ideally replaces the AsyncTaskGroup. This is needed to simplify the logic in ~~ARROW-17287~~.

The format and implementation will likely be inspired by the synchronous schedulers, TaskScheduler and TaskGroup, but it will remain a separate implementation. In the future, when we dedicate time to improving our synchronous scheduler, we can decide whether it makes sense to merge these two types.

/// A utility which keeps track of, and schedules, asynchronous tasks
///
/// An asynchronous task has a synchronous component and an asynchronous component.
/// The synchronous component typically schedules some kind of work on an external
/// resource (e.g. the I/O thread pool or some kind of kernel-based asynchronous
/// resource like io_uring). The asynchronous part represents the work
/// done on that external resource. Executing the synchronous part will be referred
/// to as "submitting the task" since this usually includes submitting the asynchronous
/// portion to the external thread pool.
///
/// By default the scheduler will submit the task (execute the synchronous part) as
/// soon as it is added, assuming the underlying thread pool hasn't terminated or the
/// scheduler hasn't aborted. In this mode the scheduler is simply acting as
/// a task group, keeping track of the ongoing work.
///
/// This can be used to provide structured concurrency for asynchronous development.
/// A task group created at a high level can be distributed amongst low level components
/// which register work to be completed. The high level job can then wait for all work
/// to be completed before cleaning up.
///
/// A task scheduler must eventually be ended when all tasks have been added. Once the
/// scheduler has been ended it is an error to add further tasks. Note, it is not an
/// error to add additional tasks after a scheduler has aborted (though these tasks
/// will be ignored and never submitted). The scheduler has a future which will complete
/// once the scheduler has been ended AND all remaining tasks have finished executing.
/// Ending a scheduler will NOT cause the scheduler to flush existing tasks.
///
/// Task failure (either the synchronous portion or the asynchronous portion) will cause
/// the scheduler to enter an aborted state. The first such failure will be reported in
/// the final task future.
///
/// The scheduler can also be manually aborted. A cancellation status will be reported in
/// the final task future.
///
/// It is also possible to limit the number of concurrent tasks the scheduler will
/// execute. This is done by setting a task limit. The task limit initially assumes all
/// tasks are equal but a custom cost can be supplied when scheduling a task (e.g. based
/// on the total I/O cost of the task, or the expected RAM utilization of the task).
///
/// When the total number of running tasks is limited then scheduler priority may also
/// become a consideration. By default the scheduler runs with a FIFO queue but a custom
/// task queue can be provided. One could, for example, use a priority queue to control
/// the order in which tasks are executed.
///
/// It is common to have multiple stages of execution. For example, when scanning, we
/// first inspect each fragment (the inspect stage) to figure out the row groups and then
/// we scan row groups (the scan stage) to read in the data. This sort of multi-stage
/// execution should be represented as two separate task groups. The first task group can
/// then have a custom finish callback which ends the second task group.
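
The semantics described above (immediate submission, a task limit with a FIFO queue, ending, and aborting) can be illustrated with a small single-threaded toy model. This is only a sketch under stated assumptions, not the scheduler implemented for this issue: `ToyScheduler`, `SubmitFn`, `OnTaskFinished`, and `RunDemo` are hypothetical names, the completion future is replaced by a `Finished()` check, every task has a cost of one, and dropping queued tasks on abort is an assumption about behaviour the description does not spell out.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy, single-threaded model of the described scheduler. NOT the Arrow API:
// every name here is hypothetical.
class ToyScheduler {
 public:
  // The synchronous part of a task ("submitting the task"). Returns false on
  // failure. The asynchronous part is modelled by the caller invoking
  // OnTaskFinished() at some later point.
  using SubmitFn = std::function<bool()>;

  explicit ToyScheduler(int task_limit) : task_limit_(task_limit) {}

  // Returns false only when the scheduler has already been ended.
  bool AddTask(SubmitFn submit) {
    if (ended_) return false;   // adding after End() is an error
    if (aborted_) return true;  // not an error, but the task is never submitted
    queue_.push_back(std::move(submit));
    SubmitPending();
    return true;
  }

  // Reports that the asynchronous part of one running task has completed.
  void OnTaskFinished(bool ok) {
    assert(running_ > 0);
    --running_;
    if (!ok) Abort("task failed");
    SubmitPending();
  }

  void End() { ended_ = true; }

  // Only the first abort reason is kept; queued tasks are dropped (assumption).
  void Abort(const std::string& reason) {
    if (!aborted_) {
      aborted_ = true;
      status_ = reason;
    }
    queue_.clear();
  }

  // Stands in for the scheduler's completion future.
  bool Finished() const { return ended_ && running_ == 0 && queue_.empty(); }
  // An empty string means no failure or abort was recorded.
  const std::string& status() const { return status_; }

 private:
  // Submits queued tasks in FIFO order while there is spare capacity.
  void SubmitPending() {
    while (!aborted_ && running_ < task_limit_ && !queue_.empty()) {
      SubmitFn submit = std::move(queue_.front());
      queue_.pop_front();
      ++running_;
      if (!submit()) {
        --running_;
        Abort("submit failed");
      }
    }
  }

  const int task_limit_;
  int running_ = 0;
  bool ended_ = false;
  bool aborted_ = false;
  std::string status_;
  std::deque<SubmitFn> queue_;
};

// Usage: three tasks with a limit of two concurrent tasks.
// Returns the order in which the tasks were submitted.
std::vector<int> RunDemo() {
  std::vector<int> submitted;
  ToyScheduler scheduler(/*task_limit=*/2);
  for (int id = 0; id < 3; ++id) {
    scheduler.AddTask([&submitted, id] {
      submitted.push_back(id);
      return true;
    });
  }
  // Tasks 0 and 1 are running; task 2 waits in the FIFO queue.
  scheduler.End();
  scheduler.OnTaskFinished(true);  // frees a slot, so task 2 is submitted
  scheduler.OnTaskFinished(true);
  scheduler.OnTaskFinished(true);
  assert(scheduler.Finished());
  return submitted;
}
```

In this model, ending the scheduler only sets a flag: tasks that are still queued are submitted as capacity frees up, and `Finished()` becomes true once the scheduler has been ended and nothing is running or queued. A real implementation would need thread safety and an actual future type, both of which are omitted here for brevity.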

Issue Links

links to

GitHub Pull Request #13912

People

Assignee: Unassigned

Reporter: Weston Pace

Votes: 0

Watchers: 2

Dates

Created: 09/Aug/22 00:38

Updated: 11/Jan/23 11:50

Resolved: 02/Sep/22 00:33
