Uploaded image for project: 'Apache Arrow'
  1. Apache Arrow
  2. ARROW-17350

[C++] Create a scheduler for asynchronous work

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 10.0.0
    • C++

    Description

      Note, in the interest of keeping things simple, this ideally replaces the AsyncTaskGroup. This is needed to simplify the logic in ARROW-17287.

      The format and implementation will likely be inspired by the synchronous schedulers, TaskScheduler and TaskGroup but it will remain a separate implementation. In the future, when we dedicate time to improving our synchronous scheduler, we can decide if it makes sense to merge these two types.

      /// A utility which keeps tracks of, and schedules, asynchronous tasks
      ///
      /// An asynchronous task has a synchronous component and an asynchronous component.
      /// The synchronous component typically schedules some kind of work on an external
      /// resource (e.g. the I/O thread pool or some kind of kernel-based asynchronous
      /// resource like io_uring).  The asynchronous part represents the work
      /// done on that external resource.  Executing the synchronous part will be referred
      /// to as "submitting the task" since this usually includes submitting the asynchronous
      /// portion to the external thread pool.
      ///
      /// By default the scheduler will submit the task (execute the synchronous part) as
      /// soon as it is added, assuming the underlying thread pool hasn't terminated or the
      /// scheduler hasn't aborted.  In this mode the scheduler is simply acting as
      /// a task group, keeping track of the ongoing work.
      ///
      /// This can be used to provide structured concurrency for asynchronous development.
      /// A task group created at a high level can be distributed amongst low level components
      /// which register work to be completed.  The high level job can then wait for all work
      /// to be completed before cleaning up.
      ///
      /// A task scheduler must eventually be ended when all tasks have been added.  Once the
      /// scheduler has been ended it is an error to add further tasks.  Note, it is not an
      /// error to add additional tasks after a scheduler has aborted (though these tasks
      /// will be ignored and never submitted).  The scheduler has a futuer which will complete
      /// once the scheduler has been ended AND all remaining tasks have finished executing.
      /// Ending a scheduler will NOT cause the scheduler to flush existing tasks.
      ///
      /// Task failure (either the synchronous portion or the asynchronous portion) will cause
      /// the scheduler to enter an aborted state.  The first such failure will be reported in
      /// the final task future.
      ///
      /// The scheduler can also be manually aborted.  A cancellation status will be reported as
      /// the final task future.
      ///
      /// It is also possible to limit the number of concurrent tasks the scheduler will
      /// execute. This is done by setting a task limit.  The task limit initially assumes all
      /// tasks are equal but a custom cost can be supplied when scheduling a task (e.g. based
      /// on the total I/O cost of the task, or the expected RAM utilization of the task)
      ///
      /// When the total number of running tasks is limited then scheduler priority may also
      /// become a consideration.  By default the scheduler runs with a FIFO queue but a custom
      /// task queue can be provided.  One could, for example, use a priority queue to control
      /// the order in which tasks are executed.
      ///
      /// It is common to have multiple stages of execution.  For example, when scanning, we
      /// first inspect each fragment (the inspect stage) to figure out the row groups and then
      /// we scan row groups (the scan stage) to read in the data.  This sort of multi-stage
      /// execution should be represented as two seperate task groups.  The first task group can
      /// then have a custom finish callback which ends the second task group.
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              westonpace Weston Pace
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 3h
                  3h